WIP patch for parallel pg_dump

Started by Joachim Wieland · about 15 years ago · 80 messages
#1 Joachim Wieland <joe@mcknight.de>
1 attachment(s)

This is the second patch for parallel pg_dump, now the actual part that
parallelizes the whole thing. More precisely, it adds parallel backup/restore
to pg_dump/pg_restore for the directory archive format and keeps the parallel
restore part of the custom archive format. It applies on top of my previous
directory archive format patch, which also includes a prototype of the liblzf
compression, so you can combine that compression with any of the
backup/restore scenarios just mentioned.
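Both patches apply in the usual way, e.g.:

$ patch -p1 < pg_dump-parallel.diff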

You would start a regular parallel dump with

$ pg_dump -j 4 -Fd -f out.dir dbname

In previous discussions there was a request to add support for multiple
directories, which I have implemented as well, so that you can also run

$ pg_dump -j 4 -Fd -f dir1:dir2:dir3 dbname

to distribute the data equally among those three directories (we can still
discuss the syntax; I am not all that happy with the colon either...).
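For comparison, restoring such an archive in parallel goes through the
existing pg_restore interface; an illustrative invocation (not taken from the
patch) would be:

$ pg_restore -j 4 -d dbname out.dir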

The dump always starts with the largest objects, using the relpages column of
pg_class, which should give a good estimate. The order in which the objects
are restored is determined by the dependencies among them (the same mechanism
already used in the parallel restore of the custom archive format).
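As a rough sketch of where that estimate comes from (this is not necessarily
the exact query the patch issues), you can see the relative table sizes
directly in the catalog:

$ psql -c "SELECT relname, relpages FROM pg_class WHERE relkind = 'r' ORDER BY relpages DESC" dbname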

The file test.sh includes some example commands that I have run here as a kind
of regression test; they should give you an impression of how to call it from
the command line.

One thing that is currently missing is proper support for Windows; that is the
next thing I will be working on. Also, this version still prints quite a bit
of debug information about what the processes are doing, so don't try to pipe
the pg_dump output anywhere (even when not running in parallel); it will
probably just not work...

The missing part that would make parallel pg_dump work with no strings
attached is snapshot synchronization. As long as there are no synchronized
snapshots, you need to stop writing to your database before starting the
parallel pg_dump. However, it turns out that most often when you are
especially concerned about a fast dump, you have shut down your applications
anyway (which is the reason why you are so concerned about speed in the first
place). These cases are typically database migrations from one host/platform
to another, or database upgrades without pg_migrator.

Joachim

Attachments:

pg_dump-parallel.diff (text/x-patch; charset=US-ASCII)
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 5def7a7..984e700 100644
*** a/src/bin/pg_dump/pg_backup.h
--- b/src/bin/pg_dump/pg_backup.h
*************** extern void ArchiveEntry(Archive *AHX,
*** 171,177 ****
  			 CatalogId catalogId, DumpId dumpId,
  			 const char *tag,
  			 const char *namespace, const char *tablespace,
! 			 const char *owner, bool withOids,
  			 const char *desc, teSection section,
  			 const char *defn,
  			 const char *dropStmt, const char *copyStmt,
--- 171,178 ----
  			 CatalogId catalogId, DumpId dumpId,
  			 const char *tag,
  			 const char *namespace, const char *tablespace,
! 			 const char *owner,
! 			 unsigned long int relpages, bool withOids,
  			 const char *desc, teSection section,
  			 const char *defn,
  			 const char *dropStmt, const char *copyStmt,
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index c5b5fcc..e00505e 100644
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 25,30 ****
--- 25,31 ----
  #include "compress_io.h"
  
  #include <ctype.h>
+ #include <fcntl.h>
  #include <unistd.h>
  #include <sys/stat.h>
  #include <sys/types.h>
***************
*** 44,86 ****
  #define WORKER_INHIBIT_DATA		11
  #define WORKER_IGNORED_ERRORS	12
  
- /*
-  * Unix uses exit to return result from worker child, so function is void.
-  * Windows thread result comes via function return.
-  */
- #ifndef WIN32
- #define parallel_restore_result void
- #else
- #define parallel_restore_result DWORD
- #endif
- 
- /* IDs for worker children are either PIDs or thread handles */
- #ifndef WIN32
- #define thandle pid_t
- #else
- #define thandle HANDLE
- #endif
- 
- /* Arguments needed for a worker child */
- typedef struct _restore_args
- {
- 	ArchiveHandle *AH;
- 	TocEntry   *te;
- } RestoreArgs;
- 
- /* State for each parallel activity slot */
- typedef struct _parallel_slot
- {
- 	thandle		child_id;
- 	RestoreArgs *args;
- } ParallelSlot;
- 
- #define NO_SLOT (-1)
- 
  const char *progname;
  
  static const char *modulename = gettext_noop("archiver");
  
  
  static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
  		 const int compression, ArchiveMode mode);
--- 45,56 ----
  #define WORKER_INHIBIT_DATA		11
  #define WORKER_IGNORED_ERRORS	12
  
  const char *progname;
  
  static const char *modulename = gettext_noop("archiver");
  
+ PGconn	  **g_conn_child;
+ PGconn	  *g_conn;
  
  static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
  		 const int compression, ArchiveMode mode);
*************** static void ResetOutput(ArchiveHandle *A
*** 119,139 ****
  
  static int restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
  				  RestoreOptions *ropt, bool is_parallel);
! static void restore_toc_entries_parallel(ArchiveHandle *AH);
! static thandle spawn_restore(RestoreArgs *args);
! static thandle reap_child(ParallelSlot *slots, int n_slots, int *work_status);
! static bool work_in_progress(ParallelSlot *slots, int n_slots);
! static int	get_next_slot(ParallelSlot *slots, int n_slots);
  static void par_list_header_init(TocEntry *l);
  static void par_list_append(TocEntry *l, TocEntry *te);
  static void par_list_remove(TocEntry *te);
  static TocEntry *get_next_work_item(ArchiveHandle *AH,
  				   TocEntry *ready_list,
! 				   ParallelSlot *slots, int n_slots);
! static parallel_restore_result parallel_restore(RestoreArgs *args);
  static void mark_work_done(ArchiveHandle *AH, TocEntry *ready_list,
! 			   thandle worker, int status,
! 			   ParallelSlot *slots, int n_slots);
  static void fix_dependencies(ArchiveHandle *AH);
  static bool has_lock_conflicts(TocEntry *te1, TocEntry *te2);
  static void repoint_table_dependencies(ArchiveHandle *AH,
--- 89,104 ----
  
  static int restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
  				  RestoreOptions *ropt, bool is_parallel);
! static void restore_toc_entries_parallel(ArchiveHandle *AH, ParallelState *pstate);
  static void par_list_header_init(TocEntry *l);
  static void par_list_append(TocEntry *l, TocEntry *te);
  static void par_list_remove(TocEntry *te);
  static TocEntry *get_next_work_item(ArchiveHandle *AH,
  				   TocEntry *ready_list,
! 				   ParallelState *pstate);
  static void mark_work_done(ArchiveHandle *AH, TocEntry *ready_list,
! 			   int worker, int status,
! 			   ParallelState *pstate);
  static void fix_dependencies(ArchiveHandle *AH);
  static bool has_lock_conflicts(TocEntry *te1, TocEntry *te2);
  static void repoint_table_dependencies(ArchiveHandle *AH,
*************** static void reduce_dependencies(ArchiveH
*** 145,153 ****
  					TocEntry *ready_list);
  static void mark_create_done(ArchiveHandle *AH, TocEntry *te);
  static void inhibit_data_for_failed_table(ArchiveHandle *AH, TocEntry *te);
- static ArchiveHandle *CloneArchive(ArchiveHandle *AH);
- static void DeCloneArchive(ArchiveHandle *AH);
  
  
  /*
   *	Wrapper functions.
--- 110,123 ----
  					TocEntry *ready_list);
  static void mark_create_done(ArchiveHandle *AH, TocEntry *te);
  static void inhibit_data_for_failed_table(ArchiveHandle *AH, TocEntry *te);
  
+ static void ListenToChildren(ArchiveHandle *AH, ParallelState *pstate, bool do_wait);
+ static void WaitForCommands(ArchiveHandle *AH, int, int);
+ static void PrintStatus(ParallelState *pstate);
+ static int GetIdleChild(ParallelState *pstate);
+ static int ReapChildStatus(ParallelState *pstate, int *status);
+ static bool HasEveryChildTerminated(ParallelState *pstate);
+ static bool IsEveryChildIdle(ParallelState *pstate);
  
  /*
   *	Wrapper functions.
*************** RestoreArchive(Archive *AHX, RestoreOpti
*** 245,251 ****
  	}
  #endif
  #ifndef HAVE_LIBLZF
- 	/* XXX are these checks correct?? */
  	if (AH->compression == COMPR_LZF_CODE && AH->PrintTocDataPtr !=NULL)
  	{
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
--- 215,220 ----
*************** RestoreArchive(Archive *AHX, RestoreOpti
*** 389,395 ****
  	 * In parallel mode, turn control over to the parallel-restore logic.
  	 */
  	if (ropt->number_of_jobs > 1 && ropt->useDB)
! 		restore_toc_entries_parallel(AH);
  	else
  	{
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
--- 358,370 ----
  	 * In parallel mode, turn control over to the parallel-restore logic.
  	 */
  	if (ropt->number_of_jobs > 1 && ropt->useDB)
! 	{
! 		ParallelState pstate;
! 		/* this will actually fork the processes */
! 		pstate = ParallelBackupStart(AH, ropt->number_of_jobs, ropt);
! 		restore_toc_entries_parallel(AH, &pstate);
! 		ParallelBackupEnd(AH, &pstate);
! 	}
  	else
  	{
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
*************** ArchiveEntry(Archive *AHX,
*** 728,734 ****
  			 const char *tag,
  			 const char *namespace,
  			 const char *tablespace,
! 			 const char *owner, bool withOids,
  			 const char *desc, teSection section,
  			 const char *defn,
  			 const char *dropStmt, const char *copyStmt,
--- 703,710 ----
  			 const char *tag,
  			 const char *namespace,
  			 const char *tablespace,
! 			 const char *owner,
! 			 unsigned long int relpages, bool withOids,
  			 const char *desc, teSection section,
  			 const char *defn,
  			 const char *dropStmt, const char *copyStmt,
*************** _discoverArchiveFormat(ArchiveHandle *AH
*** 1831,1839 ****
  			strcpy(buf, AH->fSpec);
  
  		fh = fopen(buf, PG_BINARY_R);
! 		if (!fh)
! 			die_horribly(AH, modulename, "could not open input file \"%s\": %s\n",
! 						 AH->fSpec, strerror(errno));
  	}
  	else
  	{
--- 1807,1821 ----
  			strcpy(buf, AH->fSpec);
  
  		fh = fopen(buf, PG_BINARY_R);
! 		if (!fh) {
! 			const char* dirhint = "";
! 			if (strchr(buf, ':'))
! 			{
! 				dirhint = _(" (for multiple directories, please use -Fd explicitly)");
! 			}
! 			die_horribly(AH, modulename, "could not open input file \"%s\": %s%s\n",
! 						 AH->fSpec, strerror(errno), dirhint);
! 		}
  	}
  	else
  	{
*************** _allocAH(const char *FileSpec, const Arc
*** 2065,2118 ****
  
  	return AH;
  }
- 
- 
  void
  WriteDataChunks(ArchiveHandle *AH)
  {
! 	TocEntry   *te;
! 	StartDataPtr startPtr;
! 	EndDataPtr	endPtr;
  
  	for (te = AH->toc->next; te != AH->toc; te = te->next)
  	{
! 		if (te->dataDumper != NULL)
  		{
! 			AH->currToc = te;
! 			/* printf("Writing data for %d (%x)\n", te->id, te); */
  
! 			if (strcmp(te->desc, "BLOBS") == 0)
! 			{
! 				startPtr = AH->StartBlobsPtr;
! 				endPtr = AH->EndBlobsPtr;
! 			}
! 			else
  			{
! 				startPtr = AH->StartDataPtr;
! 				endPtr = AH->EndDataPtr;
! 			}
  
! 			if (startPtr != NULL)
! 				(*startPtr) (AH, te);
  
! 			/*
! 			 * printf("Dumper arg for %d is %x\n", te->id, te->dataDumperArg);
! 			 */
  
! 			/*
! 			 * The user-provided DataDumper routine needs to call
! 			 * AH->WriteData
! 			 */
! 			(*te->dataDumper) ((Archive *) AH, te->dataDumperArg);
  
! 			if (endPtr != NULL)
! 				(*endPtr) (AH, te);
! 			AH->currToc = NULL;
  		}
  	}
  }
  
  void
  WriteToc(ArchiveHandle *AH)
  {
  	TocEntry   *te;
--- 2047,2161 ----
  
  	return AH;
  }
  void
  WriteDataChunks(ArchiveHandle *AH)
  {
! 	TocEntry	   *te;
! 	ParallelState  *pstate = NULL;
! 
! 	if (AH->GetParallelStatePtr)
! 		pstate = (AH->GetParallelStatePtr)(AH);
  
  	for (te = AH->toc->next; te != AH->toc; te = te->next)
  	{
! 		if (!te->hadDumper)
! 			continue;
! 
! 		printf("Dumping table %s (%d)\n", te->tag, te->dumpId);
! 		fflush(stdout);
! 		/*
! 		 * If we are in a parallel backup, we are always the master process.
! 		 */
! 		if (pstate)
  		{
! 			int		ret_child;
! 			int		work_status;
  
! 			for (;;)
  			{
! 				int nTerm = 0;
! 				while ((ret_child = ReapChildStatus(pstate, &work_status)) != NO_SLOT)
! 				{
! 					if (work_status != 0)
! 						die_horribly(AH, modulename, "Error processing a parallel work item\n");
  
! 					nTerm++;
! 				}
  
! 				/* We need to make sure that we have an idle child before dispatching 
! 				 * the next item. If nTerm > 0 we already have that (quick check). */
! 				if (nTerm > 0)
! 					break;
  
! 				/* explicit check for an idle child */
! 				if (GetIdleChild(pstate) != NO_SLOT)
! 					break;
  
! 				/*
! 				 * If we have no idle child, read the result of one or more
! 				 * children and loop the loop to call ReapChildStatus() on them
! 				 */
! 				ListenToChildren(AH, pstate, true);
! 			}
! 
! 			Assert(GetIdleChild(pstate) != NO_SLOT);
! 			DispatchJobForTocEntry(AH, pstate, te, ACT_DUMP);
! 		}
! 		else
! 		{
! 			WriteDataChunksForTocEntry(AH, te);
! 		}
! 	}
! 	if (pstate)
! 	{
! 		int		ret_child;
! 		int		work_status;
! 
! 		/* Waiting for the worker processes to finish */
! 		/* XXX "worker" vs "child" */
! 		while (!IsEveryChildIdle(pstate))
! 		{
! 			if ((ret_child = ReapChildStatus(pstate, &work_status)) == NO_SLOT)
! 				ListenToChildren(AH, pstate, true);
  		}
  	}
  }
  
  void
+ WriteDataChunksForTocEntry(ArchiveHandle *AH, TocEntry *te)
+ {
+ 	StartDataPtr startPtr;
+ 	EndDataPtr	endPtr;
+ 
+ 	AH->currToc = te;
+ 
+ 	if (strcmp(te->desc, "BLOBS") == 0)
+ 	{
+ 		startPtr = AH->StartBlobsPtr;
+ 		endPtr = AH->EndBlobsPtr;
+ 	}
+ 	else
+ 	{
+ 		startPtr = AH->StartDataPtr;
+ 		endPtr = AH->EndDataPtr;
+ 	}
+ 
+ 	if (startPtr != NULL)
+ 		(*startPtr) (AH, te);
+ 
+ 	/*
+ 	 * The user-provided DataDumper routine needs to call
+ 	 * AH->WriteData
+ 	 */
+ 	(*te->dataDumper) ((Archive *) AH, te->dataDumperArg);
+ 
+ 	if (endPtr != NULL)
+ 		(*endPtr) (AH, te);
+ 
+ 	AH->currToc = NULL;
+ }
+ 
+ void
  WriteToc(ArchiveHandle *AH)
  {
  	TocEntry   *te;
*************** dumpTimestamp(ArchiveHandle *AH, const c
*** 3269,3281 ****
   * entries in a single connection (that happens back in RestoreArchive).
   */
  static void
! restore_toc_entries_parallel(ArchiveHandle *AH)
  {
  	RestoreOptions *ropt = AH->ropt;
- 	int			n_slots = ropt->number_of_jobs;
- 	ParallelSlot *slots;
  	int			work_status;
- 	int			next_slot;
  	TocEntry	pending_list;
  	TocEntry	ready_list;
  	TocEntry   *next_work_item;
--- 3312,3321 ----
   * entries in a single connection (that happens back in RestoreArchive).
   */
  static void
! restore_toc_entries_parallel(ArchiveHandle *AH, ParallelState *pstate)
  {
  	RestoreOptions *ropt = AH->ropt;
  	int			work_status;
  	TocEntry	pending_list;
  	TocEntry	ready_list;
  	TocEntry   *next_work_item;
*************** restore_toc_entries_parallel(ArchiveHand
*** 3292,3299 ****
  	if (AH->version < K_VERS_1_8)
  		die_horribly(AH, modulename, "parallel restore is not supported with archives made by pre-8.0 pg_dump\n");
  
- 	slots = (ParallelSlot *) calloc(sizeof(ParallelSlot), n_slots);
- 
  	/* Adjust dependency information */
  	fix_dependencies(AH);
  
--- 3332,3337 ----
*************** restore_toc_entries_parallel(ArchiveHand
*** 3362,3368 ****
--- 3400,3409 ----
  			if (next_work_item->depCount > 0)
  				par_list_append(&pending_list, next_work_item);
  			else
+ 			{
+ 				printf("Appending %d to ready_list\n", next_work_item->dumpId);
  				par_list_append(&ready_list, next_work_item);
+ 			}
  		}
  	}
  
*************** restore_toc_entries_parallel(ArchiveHand
*** 3376,3383 ****
  	ahlog(AH, 1, "entering main parallel loop\n");
  
  	while ((next_work_item = get_next_work_item(AH, &ready_list,
! 												slots, n_slots)) != NULL ||
! 		   work_in_progress(slots, n_slots))
  	{
  		if (next_work_item != NULL)
  		{
--- 3417,3424 ----
  	ahlog(AH, 1, "entering main parallel loop\n");
  
  	while ((next_work_item = get_next_work_item(AH, &ready_list,
! 												pstate)) != NULL ||
! 		   !IsEveryChildIdle(pstate))
  	{
  		if (next_work_item != NULL)
  		{
*************** restore_toc_entries_parallel(ArchiveHand
*** 3397,3447 ****
  				continue;
  			}
  
! 			if ((next_slot = get_next_slot(slots, n_slots)) != NO_SLOT)
! 			{
! 				/* There is work still to do and a worker slot available */
! 				thandle		child;
! 				RestoreArgs *args;
! 
! 				ahlog(AH, 1, "launching item %d %s %s\n",
! 					  next_work_item->dumpId,
! 					  next_work_item->desc, next_work_item->tag);
  
! 				par_list_remove(next_work_item);
  
! 				/* this memory is dealloced in mark_work_done() */
! 				args = malloc(sizeof(RestoreArgs));
! 				args->AH = CloneArchive(AH);
! 				args->te = next_work_item;
  
! 				/* run the step in a worker child */
! 				child = spawn_restore(args);
  
! 				slots[next_slot].child_id = child;
! 				slots[next_slot].args = args;
  
! 				continue;
  			}
- 		}
  
! 		/*
! 		 * If we get here there must be work being done.  Either there is no
! 		 * work available to schedule (and work_in_progress returned true) or
! 		 * there are no slots available.  So we wait for a worker to finish,
! 		 * and process the result.
! 		 */
! 		ret_child = reap_child(slots, n_slots, &work_status);
  
! 		if (WIFEXITED(work_status))
! 		{
! 			mark_work_done(AH, &ready_list,
! 						   ret_child, WEXITSTATUS(work_status),
! 						   slots, n_slots);
! 		}
! 		else
! 		{
! 			die_horribly(AH, modulename, "worker process crashed: status %d\n",
! 						 work_status);
  		}
  	}
  
--- 3438,3496 ----
  				continue;
  			}
  
! 			ahlog(AH, 1, "launching item %d %s %s\n",
! 				  next_work_item->dumpId,
! 				  next_work_item->desc, next_work_item->tag);
  
! 			par_list_remove(next_work_item);
  
! 			Assert(GetIdleChild(pstate) != NO_SLOT);
! 			DispatchJobForTocEntry(AH, pstate, next_work_item, ACT_RESTORE);
! 		}
! 		else
! 		{
! 			/* at least one child is working and we have nothing ready. */
! 			Assert(!IsEveryChildIdle(pstate));
! 		}
  
! 		for (;;)
! 		{
! 			int nTerm = 0;
  
! 			/*
! 			 * In order to reduce dependencies as soon as possible and
! 			 * especially to reap the status of children who are working on
! 			 * items that pending items depend on, we do a non-blocking check
! 			 * for ended children first.
! 			 *
! 			 * However, if we do not have any other work items currently that
! 			 * children can work on, we do not busy-loop here but instead
! 			 * really wait for at least one child to terminate. Hence we call
! 			 * ListenToChildren(..., ..., true) in this case.
! 			 */
! 			ListenToChildren(AH, pstate, !next_work_item);
  
! 			while ((ret_child = ReapChildStatus(pstate, &work_status)) != NO_SLOT)
! 			{
! 				nTerm++;
! 				printf("Marking the child's work as done\n");
! 				mark_work_done(AH, &ready_list, ret_child, work_status, pstate);
  			}
  
! 			/* We need to make sure that we have an idle child before re-running the
! 			 * loop. If nTerm > 0 we already have that (quick check). */
! 			if (nTerm > 0)
! 				break;
  
! 			/* explicit check for an idle child */
! 			if (GetIdleChild(pstate) != NO_SLOT)
! 				break;
! 
! 			/*
! 			 * If we have no idle child, read the result of one or more
! 			 * children and loop the loop to call ReapChildStatus() on them
! 			 */
! 			ListenToChildren(AH, pstate, true);
  		}
  	}
  
*************** restore_toc_entries_parallel(ArchiveHand
*** 3474,3499 ****
  /*
   * create a worker child to perform a restore step in parallel
   */
  static thandle
! spawn_restore(RestoreArgs *args)
  {
! 	thandle		child;
! 
! 	/* Ensure stdio state is quiesced before forking */
! 	fflush(NULL);
  
  #ifndef WIN32
  	child = fork();
  	if (child == 0)
  	{
! 		/* in child process */
  		parallel_restore(args);
  		die_horribly(args->AH, modulename,
  					 "parallel_restore should not return\n");
  	}
  	else if (child < 0)
  	{
! 		/* fork failed */
  		die_horribly(args->AH, modulename,
  					 "could not create worker process: %s\n",
  					 strerror(errno));
--- 3523,3546 ----
  /*
   * create a worker child to perform a restore step in parallel
   */
+ /*
  static thandle
! spawn_restore(ParallelArgs *args)
  {
! 	DispatchJobForTocEntry(args->AH, args->te);
  
  #ifndef WIN32
  	child = fork();
  	if (child == 0)
  	{
! 		/+ in child process +/
  		parallel_restore(args);
  		die_horribly(args->AH, modulename,
  					 "parallel_restore should not return\n");
  	}
  	else if (child < 0)
  	{
! 		/+ fork failed +/
  		die_horribly(args->AH, modulename,
  					 "could not create worker process: %s\n",
  					 strerror(errno));
*************** spawn_restore(RestoreArgs *args)
*** 3509,3589 ****
  
  	return child;
  }
! 
! /*
!  *	collect status from a completed worker child
!  */
! static thandle
! reap_child(ParallelSlot *slots, int n_slots, int *work_status)
! {
! #ifndef WIN32
! 	/* Unix is so much easier ... */
! 	return wait(work_status);
! #else
! 	static HANDLE *handles = NULL;
! 	int			hindex,
! 				snum,
! 				tnum;
! 	thandle		ret_child;
! 	DWORD		res;
! 
! 	/* first time around only, make space for handles to listen on */
! 	if (handles == NULL)
! 		handles = (HANDLE *) calloc(sizeof(HANDLE), n_slots);
! 
! 	/* set up list of handles to listen to */
! 	for (snum = 0, tnum = 0; snum < n_slots; snum++)
! 		if (slots[snum].child_id != 0)
! 			handles[tnum++] = slots[snum].child_id;
! 
! 	/* wait for one to finish */
! 	hindex = WaitForMultipleObjects(tnum, handles, false, INFINITE);
! 
! 	/* get handle of finished thread */
! 	ret_child = handles[hindex - WAIT_OBJECT_0];
! 
! 	/* get the result */
! 	GetExitCodeThread(ret_child, &res);
! 	*work_status = res;
! 
! 	/* dispose of handle to stop leaks */
! 	CloseHandle(ret_child);
! 
! 	return ret_child;
! #endif
! }
! 
! /*
!  * are we doing anything now?
!  */
! static bool
! work_in_progress(ParallelSlot *slots, int n_slots)
! {
! 	int			i;
! 
! 	for (i = 0; i < n_slots; i++)
! 	{
! 		if (slots[i].child_id != 0)
! 			return true;
! 	}
! 	return false;
! }
! 
! /*
!  * find the first free parallel slot (if any).
!  */
! static int
! get_next_slot(ParallelSlot *slots, int n_slots)
! {
! 	int			i;
! 
! 	for (i = 0; i < n_slots; i++)
! 	{
! 		if (slots[i].child_id == 0)
! 			return i;
! 	}
! 	return NO_SLOT;
! }
  
  
  /*
--- 3556,3562 ----
  
  	return child;
  }
! */
  
  
  /*
*************** par_list_remove(TocEntry *te)
*** 3659,3665 ****
   */
  static TocEntry *
  get_next_work_item(ArchiveHandle *AH, TocEntry *ready_list,
! 				   ParallelSlot *slots, int n_slots)
  {
  	bool		pref_non_data = false;	/* or get from AH->ropt */
  	TocEntry   *data_te = NULL;
--- 3632,3638 ----
   */
  static TocEntry *
  get_next_work_item(ArchiveHandle *AH, TocEntry *ready_list,
! 				   ParallelState *pstate)
  {
  	bool		pref_non_data = false;	/* or get from AH->ropt */
  	TocEntry   *data_te = NULL;
*************** get_next_work_item(ArchiveHandle *AH, To
*** 3670,3684 ****
  	/*
  	 * Bogus heuristics for pref_non_data
  	 */
  	if (pref_non_data)
  	{
  		int			count = 0;
  
! 		for (k = 0; k < n_slots; k++)
! 			if (slots[k].args->te != NULL &&
! 				slots[k].args->te->section == SECTION_DATA)
  				count++;
! 		if (n_slots == 0 || count * 4 < n_slots)
  			pref_non_data = false;
  	}
  
--- 3643,3658 ----
  	/*
  	 * Bogus heuristics for pref_non_data
  	 */
+ 	/* XXX */
  	if (pref_non_data)
  	{
  		int			count = 0;
  
! 		for (k = 0; k < pstate->numWorkers; k++)
! 			if (pstate->parallelSlot[k].args->te != NULL &&
! 				pstate->parallelSlot[k].args->te->section == SECTION_DATA)
  				count++;
! 		if (pstate->numWorkers == 0 || count * 4 < pstate->numWorkers)
  			pref_non_data = false;
  	}
  
*************** get_next_work_item(ArchiveHandle *AH, To
*** 3694,3710 ****
  		 * that a currently running item also needs lock on, or vice versa. If
  		 * so, we don't want to schedule them together.
  		 */
! 		for (i = 0; i < n_slots && !conflicts; i++)
  		{
  			TocEntry   *running_te;
  
! 			if (slots[i].args == NULL)
  				continue;
! 			running_te = slots[i].args->te;
  
  			if (has_lock_conflicts(te, running_te) ||
  				has_lock_conflicts(running_te, te))
  			{
  				conflicts = true;
  				break;
  			}
--- 3668,3685 ----
  		 * that a currently running item also needs lock on, or vice versa. If
  		 * so, we don't want to schedule them together.
  		 */
! 		for (i = 0; i < pstate->numWorkers && !conflicts; i++)
  		{
  			TocEntry   *running_te;
  
! 			if (pstate->parallelSlot[i].ChildStatus != CS_WORKING)
  				continue;
! 			running_te = pstate->parallelSlot[i].args->te;
  
  			if (has_lock_conflicts(te, running_te) ||
  				has_lock_conflicts(running_te, te))
  			{
+ 				printf("lock conflicts detected. %d (want to schedule) with %d (running). i: %d. status: %d!!!\n", te->dumpId, running_te->dumpId, i, pstate->parallelSlot[i].ChildStatus);
  				conflicts = true;
  				break;
  			}
*************** get_next_work_item(ArchiveHandle *AH, To
*** 3738,3745 ****
   * this is the procedure run as a thread (Windows) or a
   * separate process (everything else).
   */
! static parallel_restore_result
! parallel_restore(RestoreArgs *args)
  {
  	ArchiveHandle *AH = args->AH;
  	TocEntry   *te = args->te;
--- 3713,3720 ----
   * this is the procedure run as a thread (Windows) or a
   * separate process (everything else).
   */
! parallel_restore_result
! parallel_restore(ParallelArgs *args)
  {
  	ArchiveHandle *AH = args->AH;
  	TocEntry   *te = args->te;
*************** parallel_restore(RestoreArgs *args)
*** 3759,3795 ****
  		(AH->ReopenPtr) (AH);
  #ifndef WIN32
  	else
! 		(AH->ClosePtr) (AH);
  #endif
  
- 	/*
- 	 * We need our own database connection, too
- 	 */
- 	ConnectDatabase((Archive *) AH, ropt->dbname,
- 					ropt->pghost, ropt->pgport, ropt->username,
- 					ropt->promptPassword);
- 
  	_doSetFixedOutputState(AH);
  
  	/* Restore the TOC item */
  	retval = restore_toc_entry(AH, te, ropt, true);
  
  	/* And clean up */
- 	PQfinish(AH->connection);
- 	AH->connection = NULL;
  
  	/* If we reopened the file, we are done with it, so close it now */
  	if (te->section == SECTION_DATA)
  		(AH->ClosePtr) (AH);
  
  	if (retval == 0 && AH->public.n_errors)
  		retval = WORKER_IGNORED_ERRORS;
  
- #ifndef WIN32
- 	exit(retval);
- #else
  	return retval;
- #endif
  }
  
  
--- 3734,3764 ----
  		(AH->ReopenPtr) (AH);
  #ifndef WIN32
  	else
! 	{
! 		if (AH->FH)
! 			(AH->ClosePtr) (AH);
! 	}
  #endif
  
  	_doSetFixedOutputState(AH);
  
+ 	Assert(AH->connection != NULL);
+ 
  	/* Restore the TOC item */
  	retval = restore_toc_entry(AH, te, ropt, true);
  
  	/* And clean up */
  
  	/* If we reopened the file, we are done with it, so close it now */
+ 	/* XXX
  	if (te->section == SECTION_DATA)
  		(AH->ClosePtr) (AH);
+ 	*/
  
  	if (retval == 0 && AH->public.n_errors)
  		retval = WORKER_IGNORED_ERRORS;
  
  	return retval;
  }
  
  
*************** parallel_restore(RestoreArgs *args)
*** 3801,3825 ****
   */
  static void
  mark_work_done(ArchiveHandle *AH, TocEntry *ready_list,
! 			   thandle worker, int status,
! 			   ParallelSlot *slots, int n_slots)
  {
  	TocEntry   *te = NULL;
- 	int			i;
- 
- 	for (i = 0; i < n_slots; i++)
- 	{
- 		if (slots[i].child_id == worker)
- 		{
- 			slots[i].child_id = 0;
- 			te = slots[i].args->te;
- 			DeCloneArchive(slots[i].args->AH);
- 			free(slots[i].args);
- 			slots[i].args = NULL;
  
! 			break;
! 		}
! 	}
  
  	if (te == NULL)
  		die_horribly(AH, modulename, "could not find slot of finished worker\n");
--- 3770,3785 ----
   */
  static void
  mark_work_done(ArchiveHandle *AH, TocEntry *ready_list,
! 			   int worker, int status,
! 			   ParallelState *pstate)
  {
  	TocEntry   *te = NULL;
  
! 	te = pstate->parallelSlot[worker].args->te;
! 	/* XXX */
! 	//DeCloneArchive(pstate->parallelSlot[worker].args->AH);
! 	//free(pstate->parallelSlot[worker].args);
! 	//pstate->parallelSlot[worker].args = NULL;
  
  	if (te == NULL)
  		die_horribly(AH, modulename, "could not find slot of finished worker\n");
*************** inhibit_data_for_failed_table(ArchiveHan
*** 4144,4153 ****
   *
   * Enough of the structure is cloned to ensure that there is no
   * conflict between different threads each with their own clone.
-  *
-  * These could be public, but no need at present.
   */
! static ArchiveHandle *
  CloneArchive(ArchiveHandle *AH)
  {
  	ArchiveHandle *clone;
--- 4104,4111 ----
   *
   * Enough of the structure is cloned to ensure that there is no
   * conflict between different threads each with their own clone.
   */
! ArchiveHandle *
  CloneArchive(ArchiveHandle *AH)
  {
  	ArchiveHandle *clone;
*************** CloneArchive(ArchiveHandle *AH)
*** 4188,4194 ****
   *
   * Note: we assume any clone-local connection was already closed.
   */
! static void
  DeCloneArchive(ArchiveHandle *AH)
  {
  	/* Clear format-specific state */
--- 4146,4152 ----
   *
   * Note: we assume any clone-local connection was already closed.
   */
! void
  DeCloneArchive(ArchiveHandle *AH)
  {
  	/* Clear format-specific state */
*************** DeCloneArchive(ArchiveHandle *AH)
*** 4212,4214 ****
--- 4170,4683 ----
  
  	free(AH);
  }
+ 
+ ParallelState
+ ParallelBackupStart(ArchiveHandle *AH, int numWorkers, RestoreOptions *ropt)
+ {
+ 	ParallelState	pstate;
+ 	int				i;
+ 
+ 	/* Ensure stdio state is quiesced before forking */
+ 	fflush(NULL);
+ 
+ 	Assert(numWorkers > 0);
+ 
+ 	memset((void *) &pstate, 0, sizeof(ParallelState));
+ 
+ 	pstate.numWorkers = numWorkers;
+ 
+ 	if (numWorkers == 1)
+ 		return pstate;
+ 
+ 	pstate.pipeWorkerRead = (int *) malloc(numWorkers * sizeof(int));
+ 	pstate.pipeWorkerWrite = (int *) malloc(numWorkers * sizeof(int));
+ 	pstate.parallelSlot = (ParallelSlot *) malloc(numWorkers * sizeof(ParallelSlot));
+ 
+ 	for (i = 0; i < numWorkers; i++)
+ 	{
+ 		int		pipeMW[2], pipeWM[2];
+ 		pid_t	pid;
+ 
+ 		if (pipe(pipeMW) < 0 || pipe(pipeWM) < 0)
+ 			die_horribly(AH, modulename, "Cannot create communication channels: %s",
+ 						 strerror(errno));
+ 		pid = fork();
+ 		if (pid == 0)
+ 		{
+ 			/* we are the worker */
+ 			close(pipeWM[0]);	/* close read end of Worker -> Master */
+ 			close(pipeMW[1]);	/* close write end of Master -> Worker */
+ 
+ 			free(pstate.pipeWorkerRead);
+ 			pstate.pipeWorkerRead = NULL;
+ 			free(pstate.pipeWorkerWrite);
+ 			pstate.pipeWorkerWrite = NULL;
+ 			free(pstate.parallelSlot);
+ 			pstate.parallelSlot = NULL;
+ 
+ 			if (ropt)
+ 			{
+ 				/*
+ 				 * Restore mode - We need our own database connection, too
+ 				 */
+ 				AH->connection = NULL;
+ 				printf("Connecting: Db: %s host %s port %s user %s\n", ropt->dbname,
+ 								ropt->pghost, ropt->pgport, ropt->username);
+ 
+ 				ConnectDatabase((Archive *) AH, ropt->dbname,
+ 								ropt->pghost, ropt->pgport, ropt->username,
+ 								ropt->promptPassword);
+ 
+ 				g_conn = AH->connection;
+ 			}
+ 			else
+ 			{
+ 				/*
+ 				 * Dump mode - The parent has opened our connection
+ 				 */
+ 				if (g_conn_child)
+ 					g_conn = AH->connection = g_conn_child[i];
+ 			}
+ 
+ 			free(g_conn_child);
+ 			g_conn_child = NULL;
+ 
+ 			Assert(AH->connection != NULL);
+ 			Assert(g_conn != NULL);
+ 
+ 			/* the worker will never return from this function */
+ 			WaitForCommands(AH, pipeMW[0], pipeWM[1]);
+ 		}
+ 		else
+ 		{
+ 			/* we are the Master */
+ 			close(pipeWM[1]);	/* close write end of Worker -> Master */
+ 			close(pipeMW[0]);	/* close read end of Master -> Worker */
+ 
+ 			pstate.pipeWorkerRead[i] = pipeWM[0];
+ 			pstate.pipeWorkerWrite[i] = pipeMW[1];
+ 
+ 			pstate.parallelSlot[i].args = (ParallelArgs *) malloc(sizeof(ParallelArgs));
+ 			pstate.parallelSlot[i].args->AH = AH;
+ 			pstate.parallelSlot[i].args->te = NULL;
+ 			pstate.parallelSlot[i].ChildStatus = CS_IDLE;
+ 		}
+ 	}
+ 	return pstate;
+ }
+ 
+ void
+ ParallelBackupEnd(ArchiveHandle *AH, ParallelState *pstate)
+ {
+ 	int i;
+ 
+ 	if (pstate->numWorkers == 1)
+ 		return;
+ 
+ 	Assert(IsEveryChildIdle(pstate));
+ 	printf("Asking children to terminate\n");
+ 
+ 	for (i = 0; i < pstate->numWorkers; i++)
+ 	{
+ 		int ret;
+ 		printf("Asking child %d to terminate\n", i);
+ 		ret = write(pstate->pipeWorkerWrite[i], "TERMINATE", strlen("TERMINATE") + 1);
+ 		pstate->parallelSlot[i].ChildStatus = CS_WORKING;
+ 	}
+ 
+ 	while (!HasEveryChildTerminated(pstate))
+ 	{
+ 		ListenToChildren(AH, pstate, true);
+ 	}
+ 
+ 	PrintStatus(pstate);
+ 
+ 	for (i = 0; i < pstate->numWorkers; i++)
+ 	{
+ 		close(pstate->pipeWorkerRead[i]);
+ 		close(pstate->pipeWorkerWrite[i]);
+ 	}
+ }
+ 
+ 
+ /*
+  * The sequence is the following (for dump, similar for restore):
+  *
+  * Master                                   Worker
+  *
+  *                                          enters WaitForCommands()
+  * DispatchJobForTocEntry(...te...)
+  *
+  * [ Worker is IDLE ]
+  *
+  * arg = (StartMasterParallelPtr)()
+  * send: DUMP arg
+  *                                          receive: DUMP arg
+  *                                          str = (WorkerJobDumpPtr)(arg)
+  * [ Worker is WORKING ]                    ... gets te from arg ...
+  *                                          ... dump te ...
+  *                                          send: OK DUMP info
+  *
+  * In ListenToChildren():
+  *
+  * [ Worker is FINISHED ]
+  * receive: OK DUMP info
+  * status = (EndMasterParallelPtr)(info)
+  *
+  * In ReapChildStatus(&ptr):
+  * *ptr = status;
+  * [ Worker is IDLE ]
+  */
+ 
+ void
+ DispatchJobForTocEntry(ArchiveHandle *AH, ParallelState *pstate, TocEntry *te,
+ 					   T_Action act)
+ {
+ 	int		worker;
+ 	char   *arg;
+ 	int		len;
+ 
+ 	Assert(GetIdleChild(pstate) != NO_SLOT);
+ 
+ 	/* our caller must make sure that at least one child is idle */
+ 	worker = GetIdleChild(pstate);
+ 	Assert(worker != NO_SLOT);
+ 
+ 	arg = (AH->StartMasterParallelPtr)(AH, te, act);
+ 	len = strlen(arg) + 1;
+ 	if (write(pstate->pipeWorkerWrite[worker], arg, len) != len)
+ 		die_horribly(AH, modulename,
+ 					 "Error writing to the communication channel: %s",
+ 					 strerror(errno));
+ 	pstate->parallelSlot[worker].ChildStatus = CS_WORKING;
+ 	pstate->parallelSlot[worker].args->te = te;
+ 	PrintStatus(pstate);
+ }
+ 
+ 
+ static void
+ PrintStatus(ParallelState *pstate)
+ {
+ 	int i;
+ 	printf("------Status------\n");
+ 	for (i = 0; i < pstate->numWorkers; i++)
+ 	{
+ 		printf("Status of child %d: ", i);
+ 		switch (pstate->parallelSlot[i].ChildStatus)
+ 		{
+ 			case CS_IDLE:
+ 				printf("IDLE");
+ 				break;
+ 			case CS_WORKING:
+ 				printf("WORKING");
+ 				break;
+ 			case CS_FINISHED:
+ 				printf("FINISHED");
+ 				break;
+ 			case CS_TERMINATED:
+ 				printf("TERMINATED");
+ 				break;
+ 		}
+ 		printf("\n");
+ 	}
+ 	printf("------------\n");
+ }
+ 
+ 
+ /*
+  * find the first free parallel slot (if any).
+  */
+ static int
+ GetIdleChild(ParallelState *pstate)
+ {
+ 	int i;
+ 	for (i = 0; i < pstate->numWorkers; i++)
+ 	{
+ 		if (pstate->parallelSlot[i].ChildStatus == CS_IDLE)
+ 			return i;
+ 	}
+ 	return NO_SLOT;
+ }
+ 
+ static bool
+ HasEveryChildTerminated(ParallelState *pstate)
+ {
+ 	int i;
+ 	for (i = 0; i < pstate->numWorkers; i++)
+ 	{
+ 		if (pstate->parallelSlot[i].ChildStatus != CS_TERMINATED)
+ 			return false;
+ 	}
+ 	return true;
+ }
+ 
+ static bool
+ IsEveryChildIdle(ParallelState *pstate)
+ {
+ 	int i;
+ 	for (i = 0; i < pstate->numWorkers; i++)
+ 	{
+ 		if (pstate->parallelSlot[i].ChildStatus != CS_IDLE)
+ 			return false;
+ 	}
+ 	return true;
+ }
+ 
+ static char *
+ readMessageFromPipe(int fd, bool allowBlock)
+ {
+ 	static char	   *buf;
+ 	static int		bufsize = 0;
+ 	char		   *msg;
+ 	int				msgsize;
+ 	int				ret;
+ 	int				flags;
+ 
+ 	/*
+ 	 * The problem here is that we need to deal with several possibilities:
+ 	 * we could receive only a partial message or several messages at once.
+ 	 * The caller expects us to return exactly one message however.
+ 	 *
+ 	 * We could either read in as much as we can and keep track of what we
+ 	 * delivered back to the caller or we just read byte by byte. Once we see
+ 	 * (char) 0, we know that it's the message's end. This is quite inefficient
+ 	 * but since we are reading only on the command channel, the performance
+ 	 * loss does not seem worth the trouble of keeping internal states for
+ 	 * different file descriptors.
+ 	 */
+ 
+ 	if (bufsize == 0)
+ 	{
+ 		buf = (char *) malloc(1);
+ 		bufsize = 1;
+ 	}
+ 
+ 	msg = buf;
+ 	msgsize = 0;
+ 
+ 
+ 	for (;;)
+ 	{
+ 		if (msgsize == 0 && !allowBlock)
+ 		{
+ 			flags = fcntl(fd, F_GETFL, 0);
+ 			fcntl(fd, F_SETFL, flags | O_NONBLOCK);
+ 		}
+ 
+ 		ret = read(fd, msg + msgsize, 1);
+ 
+ 		if (msgsize == 0 && !allowBlock)
+ 		{
+ 			int		saved_errno = errno;
+ 			fcntl(fd, F_SETFL, flags);
+ 			if (ret < 0 && saved_errno == EAGAIN)
+ 				return NULL;
+ 		}
+ 
+ 		if (ret == 0)
+ 		{
+ 			/* child has closed the connection */
+ 			write_msg(NULL, "the communication partner died\n");
+ 			exit(1);
+ 		}
+ 		if (ret < 0)
+ 		{
+ 			write_msg(NULL, "error reading from communication partner: %s\n",
+ 					  strerror(errno));
+ 			exit(1);
+ 		}
+ 
+ 		if (msg[msgsize] == '\0')
+ 			return msg;
+ 
+ 		msgsize++;
+ 		if (msgsize == bufsize)
+ 		{
+ 			bufsize += 10;
+ 			buf = (char *) realloc(buf, bufsize);
+ 			msg = buf;
+ 		}
+ 	}
+ }
+ 
+ 
+ #define messageStartsWith(msg, prefix) \
+ 	(strncmp(msg, prefix, strlen(prefix)) == 0)
+ #define messageEquals(msg, pattern) \
+ 	(strcmp(msg, pattern) == 0)
+ static void
+ WaitForCommands(ArchiveHandle *AH, int rfd, int wfd)
+ {
+ 	char   *command;
+ 	char   *str = NULL;
+ 	int		len;
+ 	bool	shouldExit = false;
+ 
+ 	for(;;)
+ 	{
+ 		command = readMessageFromPipe(rfd, true);
+ 		printf("Read command: %s in pid %d\n", command, getpid());
+ 		fflush(stdout);
+ 		if (messageStartsWith(command, "DUMP "))
+ 		{
+ 			Assert(AH->format == archDirectory);
+ 
+ 			str = (AH->WorkerJobDumpPtr)(AH, command + strlen("DUMP "));
+ 		}
+ 		else if (messageStartsWith(command, "RESTORE "))
+ 		{
+ 			Assert(AH->format == archDirectory || AH->format == archCustom);
+ 			Assert(AH->connection != NULL);
+ 
+ 			str = (AH->WorkerJobRestorePtr)(AH, command + strlen("RESTORE "));
+ 
+ 			Assert(AH->connection != NULL);
+ 		}
+ 		else if (messageEquals(command, "TERMINATE"))
+ 		{
+ 			printf("Terminating in %d\n", getpid());
+ 			PQfinish(AH->connection);
+ 			close(rfd);
+ 			str = "TERMINATE OK";
+ 			shouldExit = true;
+ 		}
+ 		else
+ 		{
+ 			die_horribly(AH, modulename,
+ 						 "Unknown command on communication channel: %s", command);
+ 		}
+ 		len = strlen(str) + 1;
+ 		if (write(wfd, str, len) != len)
+ 			die_horribly(AH, modulename,
+ 						 "Error writing to the communication channel: %s",
+ 						 strerror(errno));
+ 		if (shouldExit)
+ 		{
+ 			close(wfd);
+ 			exit(0);
+ 		}
+ 	}
+ }
+ 
+ 
+ /*
+  * Note the status change:
+  *
+  * DispatchJobForTocEntry		CS_IDLE -> CS_WORKING
+  * ListenToChildren				CS_WORKING -> CS_FINISHED / CS_TERMINATED
+  * ReapChildStatus				CS_FINISHED -> CS_IDLE
+  *
+  * Just calling ReapChildStatus() when all children are working might or might
+  * not give you an idle child because you need to call ListenToChildren() in
+  * between and only thereafter ReapChildStatus(). This is necessary in order to
+  * get and deal with the status (=result) of the child's execution.
+  */
+ static void
+ ListenToChildren(ArchiveHandle *AH, ParallelState *pstate, bool do_wait)
+ {
+ 	int			i;
+ 	fd_set		childset;
+ 	int			maxFd = -1;
+ 	struct		timeval nowait = { 0, 0 };
+ 
+ 	FD_ZERO(&childset);
+ 
+ 	for (i = 0; i < pstate->numWorkers; i++)
+ 	{
+ 		if (pstate->parallelSlot[i].ChildStatus == CS_TERMINATED)
+ 			continue;
+ 		FD_SET(pstate->pipeWorkerRead[i], &childset);
+ 		if (pstate->pipeWorkerRead[i] > maxFd)
+ 			maxFd = pstate->pipeWorkerRead[i];
+ 	}
+ 
+ 	if (do_wait)
+ 	{
+ 		i = select(maxFd + 1, &childset, NULL, NULL, NULL);  /* no timeout */
+ 		Assert(i != 0);
+ 	}
+ 	else
+ 	{
+ 		if ((i = select(maxFd + 1, &childset, NULL, NULL, &nowait)) == 0)
+ 			return;
+ 	}
+ 
+ 	if (i < 0)
+ 	{
+ 		/* XXX Could there be a valid signal like SIGINT ? */
+ 		write_msg(NULL, "Error in ListenToChildren(): %s", strerror(errno));
+ 		exit(1);
+ 	}
+ 
+ 	for (i = 0; i < pstate->numWorkers; i++)
+ 	{
+ 		char	   *msg;
+ 
+ 		if (!FD_ISSET(pstate->pipeWorkerRead[i], &childset))
+ 			continue;
+ 
+ 		while ((msg = readMessageFromPipe(pstate->pipeWorkerRead[i], false)))
+ 		{
+ 			if (messageStartsWith(msg, "OK "))
+ 			{
+ 				char *statusString;
+ 				TocEntry *te;
+ 
+ 				printf("Got OK with information from child %d (%s)\n", i, msg);
+ 
+ 				pstate->parallelSlot[i].ChildStatus = CS_FINISHED;
+ 				te = pstate->parallelSlot[i].args->te;
+ 				if (messageStartsWith(msg, "OK RESTORE "))
+ 				{
+ 					statusString = msg + strlen("OK RESTORE ");
+ 					pstate->parallelSlot[i].status =
+ 						(AH->EndMasterParallelPtr)
+ 							(AH, te, statusString, ACT_RESTORE);
+ 				}
+ 				else if (messageStartsWith(msg, "OK DUMP "))
+ 				{
+ 					statusString = msg + strlen("OK DUMP ");
+ 					pstate->parallelSlot[i].status =
+ 						(AH->EndMasterParallelPtr)
+ 							(AH, te, statusString, ACT_DUMP);
+ 				}
+ 				else
+ 					die_horribly(AH, modulename, "Invalid message received from child: %s", msg);
+ 			}
+ 			else if (messageStartsWith(msg, "TERMINATE OK"))
+ 			{
+ 				/* this child is idle again */
+ 				printf("Child %d has terminated\n", i);
+ 				pstate->parallelSlot[i].ChildStatus = CS_TERMINATED;
+ 				pstate->parallelSlot[i].status = 0;
+ 				/* do not read again from this fd, it will fail. */
+ 				break;
+ 			}
+ 			else
+ 			{
+ 				die_horribly(AH, modulename, "Invalid message received from child: %s", msg);
+ 			}
+ 			PrintStatus(pstate);
+ 		}
+ 	}
+ }
+ 
+ static int
+ ReapChildStatus(ParallelState *pstate, int *status)
+ {
+ 	int i;
+ 
+ 	for (i = 0; i < pstate->numWorkers; i++)
+ 	{
+ 		if (pstate->parallelSlot[i].ChildStatus == CS_FINISHED)
+ 		{
+ 			*status = pstate->parallelSlot[i].status;
+ 			pstate->parallelSlot[i].status = 0;
+ 			pstate->parallelSlot[i].ChildStatus = CS_IDLE;
+ 			PrintStatus(pstate);
+ 			return i;
+ 		}
+ 	}
+ 	return NO_SLOT;
+ }
+ 
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 9eb9f6f..62274f6 100644
*** a/src/bin/pg_dump/pg_backup_archiver.h
--- b/src/bin/pg_dump/pg_backup_archiver.h
*************** typedef z_stream *z_streamp;
*** 112,117 ****
--- 112,119 ----
  struct _archiveHandle;
  struct _tocEntry;
  struct _restoreList;
+ enum _teReqs;
+ enum _action;
  
  typedef enum
  {
*************** typedef void (*PrintExtraTocPtr) (struct
*** 144,149 ****
--- 146,155 ----
  typedef void (*PrintTocDataPtr) (struct _archiveHandle * AH, struct _tocEntry * te, RestoreOptions *ropt);
  typedef void (*PrintExtraTocSummaryPtr) (struct _archiveHandle * AH);
  
+ /* XXX order similar to below */
+ typedef char *(*WorkerJobRestorePtr)(struct _archiveHandle * AH, const char *args);
+ typedef char *(*WorkerJobDumpPtr)(struct _archiveHandle * AH, const char *args);
+ 
  typedef void (*ClonePtr) (struct _archiveHandle * AH);
  typedef void (*DeClonePtr) (struct _archiveHandle * AH);
  
*************** typedef bool (*StartCheckArchivePtr)(str
*** 151,156 ****
--- 157,166 ----
  typedef bool (*CheckTocEntryPtr)(struct _archiveHandle * AH, struct _tocEntry * te, teReqs reqs);
  typedef bool (*EndCheckArchivePtr)(struct _archiveHandle * AH);
  
+ typedef struct _parallel_state *(*GetParallelStatePtr)(struct _archiveHandle * AH);
+ typedef char *(*StartMasterParallelPtr)(struct _archiveHandle * AH, struct _tocEntry * te, enum _action act);
+ typedef int (*EndMasterParallelPtr)(struct _archiveHandle * AH, struct _tocEntry * te, const char *str, enum _action act);
+ 
  typedef size_t (*CustomOutPtr) (struct _archiveHandle * AH, const void *buf, size_t len);
  
  typedef struct _outputContext
*************** typedef struct
*** 181,187 ****
  	int			minTagEndPos;	/* first possible end position of $-quote */
  } sqlparseInfo;
  
! typedef enum
  {
  	STAGE_NONE = 0,
  	STAGE_INITIALIZING,
--- 191,197 ----
  	int			minTagEndPos;	/* first possible end position of $-quote */
  } sqlparseInfo;
  
! typedef enum _teReqs
  {
  	STAGE_NONE = 0,
  	STAGE_INITIALIZING,
*************** typedef struct _archiveHandle
*** 251,256 ****
--- 261,273 ----
  	StartBlobPtr StartBlobPtr;
  	EndBlobPtr EndBlobPtr;
  
+ 	StartMasterParallelPtr StartMasterParallelPtr;
+ 	EndMasterParallelPtr EndMasterParallelPtr;
+ 
+ 	GetParallelStatePtr GetParallelStatePtr;
+ 	WorkerJobDumpPtr WorkerJobDumpPtr;
+ 	WorkerJobRestorePtr WorkerJobRestorePtr;
+ 
  	ClonePtr ClonePtr;			/* Clone format-specific fields */
  	DeClonePtr DeClonePtr;		/* Clean up cloned fields */
  
*************** typedef struct _tocEntry
*** 350,355 ****
--- 367,439 ----
  	int			nLockDeps;		/* number of such dependencies */
  } TocEntry;
  
+ /* IDs for worker children are either PIDs or thread handles */
+ #ifndef WIN32
+ #define thandle pid_t
+ #else
+ #define thandle HANDLE
+ #endif
+ 
+ typedef enum
+ {
+ 	/* XXX move */
+    CS_IDLE,
+    CS_WORKING,
+    CS_FINISHED,
+    CS_TERMINATED
+ } T_ChildStatus;
+ 
+ typedef enum _action
+ {
+ 	ACT_DUMP,
+ 	ACT_RESTORE,
+ } T_Action;
+ 
+ /* Arguments needed for a worker child */
+ typedef struct _parallel_args
+ {
+ 	ArchiveHandle	   *AH;
+ 	TocEntry		   *te;
+ } ParallelArgs;
+ 
+ /* State for each parallel activity slot */
+ typedef struct _parallel_slot
+ {
+ 	thandle				child_id;
+ 	ParallelArgs	   *args;
+ 	T_ChildStatus		ChildStatus;
+ 	int					status;
+ } ParallelSlot;
+ 
+ #define NO_SLOT (-1)
+ 
+ typedef struct _parallel_state
+ {
+ 	int numWorkers;
+ 	int *pipeWorkerRead;
+ 	int *pipeWorkerWrite;
+ 	ParallelSlot *parallelSlot;
+ } ParallelState;
+ 
+ /*
+  * Unix uses exit to return result from worker child, so function is void.
+  * Windows thread result comes via function return.
+  */
+ #ifndef WIN32
+ #define parallel_restore_result int
+ #else
+ #define parallel_restore_result DWORD
+ #endif
+ 
+ parallel_restore_result parallel_restore(ParallelArgs *args);
+ 
+ ParallelState ParallelBackupStart(ArchiveHandle *AH, int numWorker, RestoreOptions *ropt);
+ void ParallelBackupEnd(ArchiveHandle *AH, ParallelState *pstate);
+ void DispatchJobForTocEntry(ArchiveHandle *AH, ParallelState *pstate, TocEntry *te, T_Action act);
+ void WaitForAllChildren(ArchiveHandle *AH, ParallelState *pstate);
+ 
+ 
+ 
  /* Used everywhere */
  extern const char *progname;
  
*************** extern void ReadHead(ArchiveHandle *AH);
*** 364,369 ****
--- 448,457 ----
  extern void WriteToc(ArchiveHandle *AH);
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
+ extern void WriteDataChunksForTocEntry(ArchiveHandle *AH, TocEntry *te);
+ 
+ extern ArchiveHandle *CloneArchive(ArchiveHandle *AH);
+ extern void DeCloneArchive(ArchiveHandle *AH);
  
  extern teReqs TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
*************** extern void InitArchiveFmt_Files(Archive
*** 397,402 ****
--- 485,492 ----
  extern void InitArchiveFmt_Null(ArchiveHandle *AH);
  extern void InitArchiveFmt_Tar(ArchiveHandle *AH);
  
+ extern void setupArchDirectory(ArchiveHandle *AH, int numWorkers);
+ 
  extern bool isValidTarHeader(char *header);
  
  extern int	ReconnectToServer(ArchiveHandle *AH, const char *dbname, const char *newUser);
diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index ccc9acb..57aae6d 100644
*** a/src/bin/pg_dump/pg_backup_custom.c
--- b/src/bin/pg_dump/pg_backup_custom.c
*************** static void _DeClone(ArchiveHandle *AH);
*** 62,67 ****
--- 62,73 ----
  static size_t _CustomWriteFunc(ArchiveHandle *AH, const void *buf, size_t len);
  static size_t _CustomReadFunction(ArchiveHandle *AH, void **buf, size_t sizeHint);
  
+ static char *_StartMasterParallel(ArchiveHandle *AH, TocEntry *te, T_Action act);
+ static int _EndMasterParallel(ArchiveHandle *AH, TocEntry *te, const char *str, T_Action act);
+ 
+ char *_WorkerJobRestoreCustom(ArchiveHandle *AH, const char *args);
+ 
+ 
  typedef struct
  {
  	CompressorState *cs;
*************** InitArchiveFmt_Custom(ArchiveHandle *AH)
*** 124,135 ****
  	AH->PrintExtraTocSummaryPtr = NULL;
  
  	AH->StartBlobsPtr = _StartBlobs;
  	AH->StartBlobPtr = _StartBlob;
  	AH->EndBlobPtr = _EndBlob;
! 	AH->EndBlobsPtr = _EndBlobs;
  	AH->ClonePtr = _Clone;
  	AH->DeClonePtr = _DeClone;
  
  	AH->StartCheckArchivePtr = NULL;
  	AH->CheckTocEntryPtr = NULL;
  	AH->EndCheckArchivePtr = NULL;
--- 130,150 ----
  	AH->PrintExtraTocSummaryPtr = NULL;
  
  	AH->StartBlobsPtr = _StartBlobs;
+ 	AH->EndBlobsPtr = _EndBlobs;
  	AH->StartBlobPtr = _StartBlob;
  	AH->EndBlobPtr = _EndBlob;
! 
  	AH->ClonePtr = _Clone;
  	AH->DeClonePtr = _DeClone;
  
+ 	AH->StartMasterParallelPtr = _StartMasterParallel;
+ 	AH->EndMasterParallelPtr = _EndMasterParallel;
+ 
+ 	AH->GetParallelStatePtr = NULL;
+ 	/* no parallel dump in the custom archive */
+ 	AH->WorkerJobDumpPtr = NULL;
+ 	AH->WorkerJobRestorePtr = _WorkerJobRestoreCustom;
+ 
  	AH->StartCheckArchivePtr = NULL;
  	AH->CheckTocEntryPtr = NULL;
  	AH->EndCheckArchivePtr = NULL;
*************** _DeClone(ArchiveHandle *AH)
*** 960,962 ****
--- 975,1049 ----
  	free(ctx);
  }
  
+ char *
+ _WorkerJobRestoreCustom(ArchiveHandle *AH, const char *args)
+ {
+ 	static char		buf[64]; /* short string + some ID so far */
+ 	ParallelArgs	pargs;
+ 	int				ret;
+ 	lclTocEntry	   *tctx;
+ 	TocEntry	   *te;
+ 	DumpId			dumpId = InvalidDumpId;
+ 	int				nBytes, nTok;
+ 
+ 	nTok = sscanf(args, "%d%n", &dumpId, &nBytes);
+ 	Assert(nBytes == strlen(args));
+ 	Assert(nTok == 1);
+ 
+ 	for (te = AH->toc->next; te != AH->toc; te = te->next)
+ 		if (te->dumpId == dumpId)
+ 			break;
+ 
+ 	Assert(dumpId != InvalidDumpId);
+ 
+ 	tctx = (lclTocEntry *) te->formatData;
+ 
+ 	pargs.AH = AH;
+ 	pargs.te = te;
+ 
+ 	/* parallel_restore() will reconnect and establish the restore
+ 	 * connection */
+ 	//AH->connection = NULL;
+ 
+ 	ret = parallel_restore(&pargs);
+ 
+ 	tctx->restore_status = ret;
+ 
+ 	/* XXX handle failure */
+ 	snprintf(buf, sizeof(buf), "OK RESTORE %d", te->dumpId);
+ 
+ 	return buf;
+ }
+ 
+ static char *
+ _StartMasterParallel(ArchiveHandle *AH, TocEntry *te, T_Action act)
+ {
+ 	static char			buf[32]; /* short string + number */
+ 
+ 	/* no parallel dump in the custom archive */
+ 	Assert(act == ACT_RESTORE);
+ 
+ 	snprintf(buf, sizeof(buf), "RESTORE %d", te->dumpId);
+ 
+ 	return buf;
+ }
+ 
+ static int
+ _EndMasterParallel(ArchiveHandle *AH, TocEntry *te, const char *str, T_Action act)
+ {
+ 	DumpId				dumpId;
+ 	int					nBytes;
+ 	int					nTok;
+ 
+ 	/* no parallel dump in the custom archive */
+ 	Assert(act == ACT_RESTORE);
+ 
+ 	nTok = sscanf(str, "%u%n", &dumpId, &nBytes);
+ 
+ 	Assert(nBytes == strlen(str));
+ 	Assert(nTok == 1);
+ 	Assert(dumpId == te->dumpId);
+ 
+ 	return 0;
+ }
+ 
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 1da57b3..b0676b4 100644
*** a/src/bin/pg_dump/pg_backup_directory.c
--- b/src/bin/pg_dump/pg_backup_directory.c
*************** static int	_ReadByte(ArchiveHandle *);
*** 50,55 ****
--- 50,56 ----
  static size_t _WriteBuf(ArchiveHandle *AH, const void *buf, size_t len);
  static size_t _ReadBuf(ArchiveHandle *AH, void *buf, size_t len);
  static void _CloseArchive(ArchiveHandle *AH);
+ static void _ReopenArchive(ArchiveHandle *AH);
  static void _PrintTocData(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  
  static void _WriteExtraToc(ArchiveHandle *AH, TocEntry *te);
*************** static void _StartBlob(ArchiveHandle *AH
*** 68,77 ****
--- 69,90 ----
  static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
  static void _EndBlobs(ArchiveHandle *AH, TocEntry *te);
  static void _LoadBlobs(ArchiveHandle *AH, RestoreOptions *ropt);
+ static void _Clone(ArchiveHandle *AH);
+ static void _DeClone(ArchiveHandle *AH);
  
+ /* XXX Name consistently. Archiveformat at the beginning or end of the name */
  static size_t _DirectoryReadFunction(ArchiveHandle *AH, void **buf, size_t sizeHint);
+ static char *_StartMasterParallel(ArchiveHandle *AH, TocEntry *te, T_Action act);
+ static int _EndMasterParallel(ArchiveHandle *AH, TocEntry *te, const char *str, T_Action act);
+ 
+ static ParallelState *_GetParallelState(ArchiveHandle *AH);
+ 
+ /* XXX order */
+ static char *_WorkerJobRestoreDirectory(ArchiveHandle *AH, const char *args);
+ static char *_WorkerJobDumpDirectory(ArchiveHandle *AH, const char *args);
  
  static bool _StartCheckArchive(ArchiveHandle *AH);
+ static bool _CheckDirectory(ArchiveHandle *AH, const char *dname, bool *tocSeen);
  static bool _CheckTocEntry(ArchiveHandle *AH, TocEntry *te, teReqs reqs);
  static bool _CheckFileContents(ArchiveHandle *AH, const char *fname, const char* idStr, bool terminateOnError);
  static bool _CheckFileSize(ArchiveHandle *AH, const char *fname, pgoff_t pgSize, bool terminateOnError);
*************** static bool _CheckBlob(ArchiveHandle *AH
*** 79,95 ****
  static bool _CheckBlobs(ArchiveHandle *AH, TocEntry *te, teReqs reqs);
  static bool _EndCheckArchive(ArchiveHandle *AH);
  
! static char *prependDirectory(ArchiveHandle *AH, const char *relativeFilename);
! static char *prependBlobsDirectory(ArchiveHandle *AH, Oid oid);
! static void createDirectory(const char *dir, const char *subdir);
  
  static char *getRandomData(char *s, int len);
  
  static void _StartDataCompressor(ArchiveHandle *AH, TocEntry *te);
  static void _EndDataCompressor(ArchiveHandle *AH, TocEntry *te);
  
! static bool isDirectory(const char *fname);
! static bool isRegularFile(const char *fname);
  
  #define K_STD_BUF_SIZE	1024
  #define FILE_SUFFIX		".dat"
--- 92,108 ----
  static bool _CheckBlobs(ArchiveHandle *AH, TocEntry *te, teReqs reqs);
  static bool _EndCheckArchive(ArchiveHandle *AH);
  
! static char *prependDirectory(ArchiveHandle *AH, const char *relativeFilename, int directoryIndex);
! static char *prependBlobsDirectory(ArchiveHandle *AH, Oid oid, int directoryIndex);
! static void createDirectoryGroup(char **dirs, int nDir, const char *subdir);
  
  static char *getRandomData(char *s, int len);
  
  static void _StartDataCompressor(ArchiveHandle *AH, TocEntry *te);
  static void _EndDataCompressor(ArchiveHandle *AH, TocEntry *te);
  
! static bool isDirectory(const char *dname, const char *fname);
! static bool isRegularFile(const char *dname, const char *fname);
  
  #define K_STD_BUF_SIZE	1024
  #define FILE_SUFFIX		".dat"
*************** typedef struct _lclContext
*** 98,106 ****
  {
  	/*
  	 * Our archive location. This is basically what the user specified as his
! 	 * backup file but of course here it is a directory.
  	 */
! 	char			   *directory;
  
  	/*
  	 * As a directory archive contains of several files we want to make sure
--- 111,120 ----
  {
  	/*
  	 * Our archive location. This is basically what the user specified as his
! 	 * backup file but of course here it is one or several director(y|ies).
  	 */
! 	char		  **directories;
! 	int				numDirectories;
  
  	/*
  	 * As a directory archive contains of several files we want to make sure
*************** typedef struct _lclContext
*** 145,150 ****
--- 159,170 ----
  	DumpId			   *chkList;
  	int					chkListSize;
  
+ 	/* this is for a parallel backup or restore */
+ 	int			   *directoryUsage;			/* only used in the master */
+ 	ParallelState	pstate;
+ 	int				numWorkers;
+ 	bool			is_parallel_child;
+ 
  	CompressorState	   *cs;
  } lclContext;
  
*************** typedef struct
*** 152,159 ****
--- 172,187 ----
  {
  	char	   *filename;		/* filename excluding the directory (basename) */
  	pgoff_t		fileSize;
+ 	int			restore_status;
+ 	int			directoryIndex;
  } lclTocEntry;
  
+ static void splitDirectories(const char *spec, lclContext *ctx);
+ static int assignDirectory(lclContext *ctx);
+ static void unassignDirectory(lclContext *ctx, lclTocEntry *tctx);
+ 
+ 
+ 
  typedef struct _lclFileHeader
  {
  	int			version;
*************** InitArchiveFmt_Directory(ArchiveHandle *
*** 188,194 ****
  	AH->WriteBufPtr = _WriteBuf;
  	AH->ReadBufPtr = _ReadBuf;
  	AH->ClosePtr = _CloseArchive;
! 	AH->ReopenPtr = NULL;
  	AH->PrintTocDataPtr = _PrintTocData;
  	AH->ReadExtraTocPtr = _ReadExtraToc;
  	AH->WriteExtraTocPtr = _WriteExtraToc;
--- 216,222 ----
  	AH->WriteBufPtr = _WriteBuf;
  	AH->ReadBufPtr = _ReadBuf;
  	AH->ClosePtr = _CloseArchive;
! 	AH->ReopenPtr = _ReopenArchive;
  	AH->PrintTocDataPtr = _PrintTocData;
  	AH->ReadExtraTocPtr = _ReadExtraToc;
  	AH->WriteExtraTocPtr = _WriteExtraToc;
*************** InitArchiveFmt_Directory(ArchiveHandle *
*** 200,207 ****
  	AH->EndBlobPtr = _EndBlob;
  	AH->EndBlobsPtr = _EndBlobs;
  
! 	AH->ClonePtr = NULL;
! 	AH->DeClonePtr = NULL;
  
  	AH->StartCheckArchivePtr = _StartCheckArchive;
  	AH->CheckTocEntryPtr = _CheckTocEntry;
--- 228,242 ----
  	AH->EndBlobPtr = _EndBlob;
  	AH->EndBlobsPtr = _EndBlobs;
  
! 	AH->ClonePtr = _Clone;
! 	AH->DeClonePtr = _DeClone;
! 
! 	AH->GetParallelStatePtr = _GetParallelState;
! 	AH->WorkerJobRestorePtr = _WorkerJobRestoreDirectory;
! 	AH->WorkerJobDumpPtr = _WorkerJobDumpDirectory;
! 
! 	AH->StartMasterParallelPtr = _StartMasterParallel;
! 	AH->EndMasterParallelPtr = _EndMasterParallel;
  
  	AH->StartCheckArchivePtr = _StartCheckArchive;
  	AH->CheckTocEntryPtr = _CheckTocEntry;
*************** InitArchiveFmt_Directory(ArchiveHandle *
*** 225,230 ****
--- 260,270 ----
  	if (AH->lo_buf == NULL)
  		die_horribly(AH, modulename, "out of memory\n");
  
+ 	ctx->directories = NULL;
+ 	ctx->numDirectories = 0;
+ 	ctx->directoryUsage = NULL;
+ 	ctx->is_parallel_child = false;
+ 
  	/*
  	 * Now open the TOC file
  	 */
*************** InitArchiveFmt_Directory(ArchiveHandle *
*** 232,242 ****
  	if (!AH->fSpec || strcmp(AH->fSpec, "") == 0)
  		die_horribly(AH, modulename, "no directory specified\n");
  
! 	ctx->directory = AH->fSpec;
  
  	if (AH->mode == archModeWrite)
  	{
! 		char   *fname = prependDirectory(AH, "TOC");
  		char   buf[256];
  
  		/*
--- 272,283 ----
  	if (!AH->fSpec || strcmp(AH->fSpec, "") == 0)
  		die_horribly(AH, modulename, "no directory specified\n");
  
! 	/* Create the directory/directories */
! 	splitDirectories(AH->fSpec, ctx);
  
  	if (AH->mode == archModeWrite)
  	{
! 		char   *fname = prependDirectory(AH, "TOC", 0);
  		char   buf[256];
  
  		/*
*************** InitArchiveFmt_Directory(ArchiveHandle *
*** 245,254 ****
  		 */
  		getRandomData(buf, sizeof(buf));
  		if (!pg_md5_hash(buf, strlen(buf), ctx->idStr))
! 			die_horribly(AH, modulename, "Error computing checksum");
  
! 		/* Create the directory, errors are caught there */
! 		createDirectory(ctx->directory, NULL);
  
  		ctx->cs = AllocateCompressorState(AH);
  
--- 286,295 ----
  		 */
  		getRandomData(buf, sizeof(buf));
  		if (!pg_md5_hash(buf, strlen(buf), ctx->idStr))
! 			die_horribly(AH, modulename, "Error computing checksum\n");
  
! 		/* Create the directories, errors are caught there */
! 		createDirectoryGroup(ctx->directories, ctx->numDirectories, NULL);
  
  		ctx->cs = AllocateCompressorState(AH);
  
*************** InitArchiveFmt_Directory(ArchiveHandle *
*** 260,267 ****
  	else
  	{							/* Read Mode */
  		char	   *fname;
  
! 		fname = prependDirectory(AH, "TOC");
  
  		AH->FH = fopen(fname, PG_BINARY_R);
  		if (AH->FH == NULL)
--- 301,324 ----
  	else
  	{							/* Read Mode */
  		char	   *fname;
+ 		int			i;
+ 		struct stat	st;
  
! 		/* Check the directories; as we are in read mode, they must already exist */
! 		for (i = 0; i < ctx->numDirectories; i++)
! 		{
! 			if (stat(ctx->directories[i], &st) != 0)
! 				die_horribly(NULL, modulename,
! 							 "invalid input directory specified, cannot stat \"%s\": %s\n",
! 							 ctx->directories[i], strerror(errno));
! 
! 			if (!S_ISDIR(st.st_mode))
! 				die_horribly(NULL, modulename,
! 							 "invalid input directory specified, \"%s\" is not a directory\n",
! 							 ctx->directories[i]);
! 		}
! 
! 		fname = prependDirectory(AH, "TOC", -1);
  
  		AH->FH = fopen(fname, PG_BINARY_R);
  		if (AH->FH == NULL)
*************** _StartData(ArchiveHandle *AH, TocEntry *
*** 423,429 ****
  	lclContext	   *ctx = (lclContext *) AH->formatData;
  	char		   *fname;
  
! 	fname = prependDirectory(AH, tctx->filename);
  
  	ctx->dataFH = (FILE *) fopen(fname, PG_BINARY_W);
  	if (ctx->dataFH == NULL)
--- 480,500 ----
  	lclContext	   *ctx = (lclContext *) AH->formatData;
  	char		   *fname;
  
! 	/*
! 	 * If we are running in parallel mode, the master controls directory
! 	 * usage and has already assigned this entry's directory. If not (i.e.
! 	 * we are running with only one process), we assign it here.
! 	 */
! 	if (ctx->is_parallel_child)
! 	{
! 		Assert(ctx->directoryUsage == NULL);
! 		Assert(tctx->directoryIndex >= 0);
! 	}
! 	else
! 		tctx->directoryIndex = assignDirectory(ctx);
! 
! 	fname = prependDirectory(AH, tctx->filename, tctx->directoryIndex);
  
  	ctx->dataFH = (FILE *) fopen(fname, PG_BINARY_W);
  	if (ctx->dataFH == NULL)
*************** _EndData(ArchiveHandle *AH, TocEntry *te
*** 544,549 ****
--- 615,622 ----
  	tctx->fileSize = ctx->dataFilePos;
  
  	ctx->dataFH = NULL;
+ 	if (!ctx->is_parallel_child)
+ 		unassignDirectory(ctx, tctx);
  }
  
  /*
*************** _PrintTocData(ArchiveHandle *AH, TocEntr
*** 596,602 ****
  		_LoadBlobs(AH, ropt);
  	else
  	{
! 		char   *fname = prependDirectory(AH, tctx->filename);
  		_PrintFileData(AH, fname, tctx->fileSize, ropt);
  	}
  }
--- 669,675 ----
  		_LoadBlobs(AH, ropt);
  	else
  	{
! 		char   *fname = prependDirectory(AH, tctx->filename, -1);
  		_PrintFileData(AH, fname, tctx->fileSize, ropt);
  	}
  }
*************** _LoadBlobs(ArchiveHandle *AH, RestoreOpt
*** 611,618 ****
  
  	StartRestoreBlobs(AH);
  
! 	fname = prependDirectory(AH, "BLOBS.TOC");
! 
  	ctx->blobsTocFH = fopen(fname, "rb");
  
  	if (ctx->blobsTocFH == NULL)
--- 684,690 ----
  
  	StartRestoreBlobs(AH);
  
! 	fname = prependDirectory(AH, "BLOBS.TOC", -1);
  	ctx->blobsTocFH = fopen(fname, "rb");
  
  	if (ctx->blobsTocFH == NULL)
*************** _LoadBlobs(ArchiveHandle *AH, RestoreOpt
*** 635,641 ****
  		ReadOffset(AH, &blobSize);
  
  		StartRestoreBlob(AH, oid, ropt->dropSchema);
! 		blobFname = prependBlobsDirectory(AH, oid);
  		_PrintFileData(AH, blobFname, blobSize, ropt);
  		EndRestoreBlob(AH, oid);
  	}
--- 707,713 ----
  		ReadOffset(AH, &blobSize);
  
  		StartRestoreBlob(AH, oid, ropt->dropSchema);
! 		blobFname = prependBlobsDirectory(AH, oid, -1);
  		_PrintFileData(AH, blobFname, blobSize, ropt);
  		EndRestoreBlob(AH, oid);
  	}
*************** _CloseArchive(ArchiveHandle *AH)
*** 813,836 ****
  {
  	if (AH->mode == archModeWrite)
  	{
- #ifdef USE_ASSERT_CHECKING
  		lclContext	   *ctx = (lclContext *) AH->formatData;
! #endif
  
  		WriteDataChunks(AH);
  
  		Assert(TOC_FH_ACTIVE);
- 
  		WriteHead(AH);
  		_WriteExtraHead(AH);
  		WriteToc(AH);
  
  		if (fclose(AH->FH) != 0)
  			die_horribly(AH, modulename, "could not close TOC file: %s\n", strerror(errno));
  	}
  	AH->FH = NULL;
  }
  
  
  
  /*
--- 885,921 ----
  {
  	if (AH->mode == archModeWrite)
  	{
  		lclContext	   *ctx = (lclContext *) AH->formatData;
! 
! 		/* this will actually fork the processes */
! 		ctx->pstate = ParallelBackupStart(AH, ctx->numWorkers, NULL);
  
  		WriteDataChunks(AH);
  
  		Assert(TOC_FH_ACTIVE);
  		WriteHead(AH);
  		_WriteExtraHead(AH);
  		WriteToc(AH);
  
+ 		ParallelBackupEnd(AH, &ctx->pstate);
+ 
  		if (fclose(AH->FH) != 0)
  			die_horribly(AH, modulename, "could not close TOC file: %s\n", strerror(errno));
  	}
  	AH->FH = NULL;
  }
  
+ /*
+  * Reopen the archive's file handle.
+  */
+ static void
+ _ReopenArchive(ArchiveHandle *AH)
+ {
+ 	/*
+ 	 * Our TOC is in memory and each child opens its own data files anyway,
+ 	 * as they are all separate. Hence we support reopening the archive by
+ 	 * simply doing nothing.
+ 	 */
+ }
  
  
  /*
*************** _CloseArchive(ArchiveHandle *AH)
*** 849,859 ****
  static void
  _StartBlobs(ArchiveHandle *AH, TocEntry *te)
  {
! 	lclContext	   *ctx = (lclContext *) AH->formatData;
! 	char		   *fname;
  
! 	fname = prependDirectory(AH, "BLOBS.TOC");
! 	createDirectory(ctx->directory, "blobs");
  
  	ctx->blobsTocFH = fopen(fname, "ab");
  	if (ctx->blobsTocFH == NULL)
--- 934,950 ----
  static void
  _StartBlobs(ArchiveHandle *AH, TocEntry *te)
  {
! 	lclContext  *ctx = (lclContext *) AH->formatData;
! 	lclTocEntry *tctx = (lclTocEntry *) te->formatData;
! 	char	    *fname;
  
! 	/* XXX see comment in _StartData */
! 	if (!ctx->is_parallel_child)
! 		tctx->directoryIndex = assignDirectory(ctx);
! 
! 	fname = prependDirectory(AH, "BLOBS.TOC", 0);
! 	/* XXX could also create only one blobs dir */
! 	createDirectoryGroup(ctx->directories, ctx->numDirectories, "blobs");
  
  	ctx->blobsTocFH = fopen(fname, "ab");
  	if (ctx->blobsTocFH == NULL)
*************** static void
*** 878,886 ****
  _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  {
  	lclContext	   *ctx = (lclContext *) AH->formatData;
  	char		   *fname;
  
! 	fname = prependBlobsDirectory(AH, oid);
  	ctx->dataFH = (FILE *) fopen(fname, PG_BINARY_W);
  
  	if (ctx->dataFH == NULL)
--- 969,978 ----
  _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
  {
  	lclContext	   *ctx = (lclContext *) AH->formatData;
+ 	lclTocEntry	   *tctx = (lclTocEntry *) te->formatData;
  	char		   *fname;
  
! 	fname = prependBlobsDirectory(AH, oid, tctx->directoryIndex);
  	ctx->dataFH = (FILE *) fopen(fname, PG_BINARY_W);
  
  	if (ctx->dataFH == NULL)
*************** _EndBlobs(ArchiveHandle *AH, TocEntry *t
*** 943,948 ****
--- 1035,1043 ----
  	ctx->blobsTocFH = NULL;
  
  	tctx->fileSize = ctx->blobsTocFilePos;
+ 
+ 	if (!ctx->is_parallel_child)
+ 		unassignDirectory(ctx, tctx);
  }
  
  /*
*************** _StartCheckArchive(ArchiveHandle *AH)
*** 965,976 ****
  {
  	bool			checkOK = true;
  	lclContext	   *ctx = (lclContext *) AH->formatData;
  	DIR			   *dir;
- 	char		   *dname = ctx->directory;
  	struct dirent  *entry;
  	int				idx = 0;
  	char		   *suffix;
- 	bool			tocSeen = false;
  
  	dir = opendir(dname);
  	if (!dir)
--- 1060,1090 ----
  {
  	bool			checkOK = true;
  	lclContext	   *ctx = (lclContext *) AH->formatData;
+ 	int				i;
+ 	bool			tocSeen = false;
+ 
+ 	for (i = 0; i < ctx->numDirectories; i++)
+ 	{
+ 		Assert(ctx->directories[i] != NULL);
+ 		checkOK &= _CheckDirectory(AH, ctx->directories[i], &tocSeen);
+ 	}
+ 
+ 	if (!tocSeen)
+ 		printf("Could not locate the TOC file of the archive\n");
+ 
+ 	/* also return false if we haven't seen the TOC file */
+ 	return checkOK && tocSeen;
+ }
+ 
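+ /*
+  * Check the contents of a single archive directory: apart from "." and
+  * ".." we only expect the TOC file, the BLOBS.TOC file, the "blobs"
+  * subdirectory and <dumpId>.dat data files.
+  */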
+ static bool
+ _CheckDirectory(ArchiveHandle *AH, const char *dname, bool *tocSeen)
+ {
+ 	bool			checkOK = true;
+ 	lclContext	   *ctx = (lclContext *) AH->formatData;
  	DIR			   *dir;
  	struct dirent  *entry;
  	int				idx = 0;
  	char		   *suffix;
  
  	dir = opendir(dname);
  	if (!dir)
*************** _StartCheckArchive(ArchiveHandle *AH)
*** 1018,1033 ****
  
  		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
  			continue;
! 		if (strcmp(entry->d_name, "blobs") == 0 &&
! 						isDirectory(prependDirectory(AH, entry->d_name)))
  			continue;
! 		if (strcmp(entry->d_name, "BLOBS.TOC") == 0 &&
! 						isRegularFile(prependDirectory(AH, entry->d_name)))
  			continue;
! 		if (strcmp(entry->d_name, "TOC") == 0 &&
! 						isRegularFile(prependDirectory(AH, entry->d_name)))
  		{
! 			tocSeen = true;
  			continue;
  		}
  		/* besides the above we only expect nnnn.dat, with nnnn being our numerical dumpID */
--- 1132,1145 ----
  
  		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
  			continue;
! 		/* unfortunately Solaris doesn't have entry->d_type, so we can't use that */
! 		if (strcmp(entry->d_name, "blobs") == 0 && isDirectory(dname, entry->d_name))
  			continue;
! 		if (strcmp(entry->d_name, "BLOBS.TOC") == 0 && isRegularFile(dname, entry->d_name))
  			continue;
! 		if (strcmp(entry->d_name, "TOC") == 0 && isRegularFile(dname, entry->d_name))
  		{
! 			*tocSeen = true;
  			continue;
  		}
  		/* besides the above we only expect nnnn.dat, with nnnn being our numerical dumpID */
*************** _StartCheckArchive(ArchiveHandle *AH)
*** 1075,1082 ****
  	while (idx < ctx->chkListSize)
  		ctx->chkList[idx++] = InvalidDumpId;
  
! 	/* also return false if we haven't seen the TOC file */
! 	return checkOK && tocSeen;
  }
  
  static bool
--- 1187,1193 ----
  	while (idx < ctx->chkListSize)
  		ctx->chkList[idx++] = InvalidDumpId;
  
! 	return checkOK;
  }
  
  static bool
*************** static bool
*** 1188,1194 ****
  _CheckBlob(ArchiveHandle *AH, Oid oid, pgoff_t size)
  {
  	lclContext	   *ctx = (lclContext *) AH->formatData;
! 	char		   *fname = prependBlobsDirectory(AH, oid);
  	bool			checkOK = true;
  
  	if (!_CheckFileSize(AH, fname, size, false))
--- 1299,1305 ----
  _CheckBlob(ArchiveHandle *AH, Oid oid, pgoff_t size)
  {
  	lclContext	   *ctx = (lclContext *) AH->formatData;
! 	char		   *fname = prependBlobsDirectory(AH, oid, -1);
  	bool			checkOK = true;
  
  	if (!_CheckFileSize(AH, fname, size, false))
*************** _CheckBlobs(ArchiveHandle *AH, TocEntry 
*** 1211,1223 ****
  	Oid				oid;
  
  	/* check the BLOBS.TOC first */
! 	fname = prependDirectory(AH, "BLOBS.TOC");
! 
! 	if (!fname)
! 	{
! 		printf("Could not find BLOBS.TOC. Check the archive!\n");
! 		return false;
! 	}
  
  	if (!_CheckFileSize(AH, fname, tctx->fileSize, false))
  		checkOK = false;
--- 1322,1328 ----
  	Oid				oid;
  
  	/* check the BLOBS.TOC first */
! 	fname = prependDirectory(AH, "BLOBS.TOC", -1);
  
  	if (!_CheckFileSize(AH, fname, tctx->fileSize, false))
  		checkOK = false;
*************** _CheckTocEntry(ArchiveHandle *AH, TocEnt
*** 1291,1303 ****
  	{
  		char		   *fname;
  
! 		fname = prependDirectory(AH, tctx->filename);
! 		if (!fname)
! 		{
! 			printf("Could not find file %s\n", tctx->filename);
! 			checkOK = false;
! 		}
! 		else if (!_CheckFileSize(AH, fname, tctx->fileSize, false))
  			checkOK = false;
  		else if (!_CheckFileContents(AH, fname, ctx->idStr, false))
  			checkOK = false;
--- 1396,1403 ----
  	{
  		char		   *fname;
  
! 		fname = prependDirectory(AH, tctx->filename, -1);
! 		if (!_CheckFileSize(AH, fname, tctx->fileSize, false))
  			checkOK = false;
  		else if (!_CheckFileContents(AH, fname, ctx->idStr, false))
  			checkOK = false;
*************** _EndCheckArchive(ArchiveHandle *AH)
*** 1326,1384 ****
  	return checkOK;
  }
  
- 
- static void
- createDirectory(const char *dir, const char *subdir)
- {
- 	struct stat		st;
- 	char			dirname[MAXPGPATH];
- 
- 	/* the directory must not yet exist, first check if it is existing */
- 	if (subdir && strlen(dir) + 1 + strlen(subdir) + 1 > MAXPGPATH)
- 		die_horribly(NULL, modulename, "directory name %s too long", dir);
- 
- 	strcpy(dirname, dir);
- 
- 	if (subdir)
- 	{
- 		strcat(dirname, "/");
- 		strcat(dirname, subdir);
- 	}
- 
- 	if (stat(dirname, &st) == 0)
- 	{
- 		if (S_ISDIR(st.st_mode))
- 			die_horribly(NULL, modulename,
- 						 "Cannot create directory %s, it exists already\n", dirname);
- 		else
- 			die_horribly(NULL, modulename,
- 						 "Cannot create directory %s, a file with this name exists already\n", dirname);
- 	}
- 
- 	/*
- 	 * Now we create the directory. Note that for some race condition we
- 	 * could also run into the situation that the directory has been created
- 	 * just between our two calls.
- 	 */
- 	if (mkdir(dirname, 0700) < 0)
- 		die_horribly(NULL, modulename, "Could not create directory %s: %s",
- 					 dirname, strerror(errno));
- }
- 
- 
  static char *
! prependDirectory(ArchiveHandle *AH, const char *relativeFilename)
  {
  	lclContext	   *ctx = (lclContext *) AH->formatData;
  	static char		buf[MAXPGPATH];
! 	char		   *dname;
  
! 	dname = ctx->directory;
  
! 	if (strlen(dname) + 1 + strlen(relativeFilename) + 1 > MAXPGPATH)
! 			die_horribly(AH, modulename, "path name too long: %s", dname);
  
! 	strcpy(buf, dname);
  	strcat(buf, "/");
  	strcat(buf, relativeFilename);
  
--- 1426,1455 ----
  	return checkOK;
  }
  
  static char *
! prependDirectory(ArchiveHandle *AH, const char *relativeFilename, int directoryIndex)
  {
  	lclContext	   *ctx = (lclContext *) AH->formatData;
  	static char		buf[MAXPGPATH];
! 	int				i;
  
! 	if (directoryIndex < 0)
! 	{
! 		/* Detect the directory automatically by recursing over all concrete indexes */
! 		for (i = 0; i < ctx->numDirectories; i++)
! 		{
! 			struct stat	st;
! 			char	   *fname = prependDirectory(AH, relativeFilename, i);
! 			if (stat(fname, &st) == 0 && S_ISREG(st.st_mode))
! 				return fname;
! 		}
! 		die_horribly(AH, modulename, "Could not find input file \"%s\" in the archive\n", relativeFilename);
! 	}
  
! 	if (strlen(ctx->directories[directoryIndex]) + 1 + strlen(relativeFilename) + 1 > MAXPGPATH)
! 			die_horribly(AH, modulename, "directory name \"%s\" too long\n", ctx->directories[directoryIndex]);
  
! 	strcpy(buf, ctx->directories[directoryIndex]);
  	strcat(buf, "/");
  	strcat(buf, relativeFilename);
  
*************** prependDirectory(ArchiveHandle *AH, cons
*** 1386,1405 ****
  }
  
  static char *
! prependBlobsDirectory(ArchiveHandle *AH, Oid oid)
  {
  	static char		buf[MAXPGPATH];
  	char		   *dname;
  	lclContext	   *ctx = (lclContext *) AH->formatData;
  	int				r;
  
! 	dname = ctx->directory;
  
  	r = snprintf(buf, MAXPGPATH, "%s/blobs/%d%s",
  				 dname, oid, FILE_SUFFIX);
  
  	if (r < 0 || r >= MAXPGPATH)
! 		die_horribly(AH, modulename, "path name too long: %s", dname);
  
  	return buf;
  }
--- 1457,1491 ----
  }
  
  static char *
! prependBlobsDirectory(ArchiveHandle *AH, Oid oid, int directoryIndex)
  {
  	static char		buf[MAXPGPATH];
  	char		   *dname;
  	lclContext	   *ctx = (lclContext *) AH->formatData;
  	int				r;
  
! 	if (directoryIndex < 0)
! 	{
! 		int i;
! 
! 		for (i = 0; i < ctx->numDirectories; i++)
! 		{
! 			struct stat	st;
! 			char	   *fname = prependBlobsDirectory(AH, oid, i);
! 			if (stat(fname, &st) == 0 && S_ISREG(st.st_mode))
! 				return fname;
! 		}
! 		die_horribly(AH, modulename, "Could not find input file \"%d%s\" in the archive\n",
! 					 oid, FILE_SUFFIX);
! 	}
! 
! 	dname = ctx->directories[directoryIndex];
  
  	r = snprintf(buf, MAXPGPATH, "%s/blobs/%d%s",
  				 dname, oid, FILE_SUFFIX);
  
  	if (r < 0 || r >= MAXPGPATH)
! 		die_horribly(AH, modulename, "directory name \"%s\" too long\n", dname);
  
  	return buf;
  }
*************** getRandomData(char *s, int len)
*** 1473,1496 ****
  }
  
  static bool
! isDirectory(const char *fname)
  {
! 	struct stat st;
  
! 	if (stat(fname, &st))
  		return false;
  
  	return S_ISDIR(st.st_mode);
  }
  
  static bool
! isRegularFile(const char *fname)
  {
! 	struct stat st;
  
! 	if (stat(fname, &st))
  		return false;
  
  	return S_ISREG(st.st_mode);
  }
  
--- 1559,1922 ----
  }
  
  static bool
! isDirectory(const char *dname, const char *fname)
  {
! 	char		buf[MAXPGPATH];
! 	struct stat	st;
  
! 	if (strlen(dname) + 1 + strlen(fname) + 1 > sizeof(buf))
! 		die_horribly(NULL, modulename, "directory name \"%s\" too long\n", dname);
! 
! 	strcpy(buf, dname);
! 	strcat(buf, "/");
! 	strcat(buf, fname);
! 
! 	if (stat(buf, &st))
  		return false;
  
  	return S_ISDIR(st.st_mode);
  }
  
  static bool
! isRegularFile(const char *dname, const char *fname)
  {
! 	char		buf[MAXPGPATH];
! 	struct stat	st;
  
! 	if (strlen(dname) + 1 + strlen(fname) + 1 > sizeof(buf))
! 		die_horribly(NULL, modulename, "directory name \"%s\" too long\n", dname);
! 
! 	strcpy(buf, dname);
! 	strcat(buf, "/");
! 	strcat(buf, fname);
! 
! 	if (stat(buf, &st))
  		return false;
  
  	return S_ISREG(st.st_mode);
  }
  
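+ /*
+  * Executed in a worker process during a parallel dump: parse the "<dumpId>
+  * <directoryIndex>" command sent by _StartMasterParallel, write that TOC
+  * entry's data into the assigned directory and report "OK DUMP <dumpId>
+  * <fileSize>" back to the master.
+  */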
+ static char *
+ _WorkerJobDumpDirectory(ArchiveHandle *AH, const char *args)
+ {
+ 	static char		buf[64]; /* short string + some ID so far */
+ 	lclContext	   *ctx = (lclContext *) AH->formatData;
+ 	TocEntry	   *te;
+ 	lclTocEntry	   *tctx = NULL;
+ 	DumpId			dumpId;
+ 	int				directoryIndex;
+ 	int				nBytes, nTok;
+ 
+ 	nTok = sscanf(args, "%d %d%n", &dumpId, &directoryIndex, &nBytes);
+ 	Assert(nBytes == strlen(args));
+ 	Assert(nTok == 2); /* XXX remove, not safe acc. to manpage */
+ 
+ 	ctx->is_parallel_child = true;
+ 	ctx->directoryUsage = NULL;
+ 
+ 	for (te = AH->toc->next; te != AH->toc; te = te->next)
+ 	{
+ 		if (te->dumpId == dumpId)
+ 		{
+ 			tctx = (lclTocEntry *) te->formatData;
+ 			tctx->directoryIndex = directoryIndex;
+ 			break;
+ 		}
+ 	}
+ 
+ 	Assert(te->dumpId == dumpId);
+ 	Assert(tctx != NULL);
+ 	/* This should never happen */
+ 	if (!tctx)
+ 		die_horribly(AH, modulename, "Error during backup\n");
+ 
+ 	WriteDataChunksForTocEntry(AH, te);
+ 
+ 	/* XXX handle failure */
+ 	snprintf(buf, sizeof(buf), "OK DUMP %d %lu", te->dumpId, (unsigned long int) tctx->fileSize);
+ 
+ 	return buf;
+ }
+ 
+ /*
+  * Clone format-specific fields during parallel restoration.
+  */
+ static void
+ _Clone(ArchiveHandle *AH)
+ {
+ 	lclContext *ctx = (lclContext *) AH->formatData;
+ 
+ 	AH->formatData = (lclContext *) malloc(sizeof(lclContext));
+ 	if (AH->formatData == NULL)
+ 		die_horribly(AH, modulename, "out of memory\n");
+ 	memcpy(AH->formatData, ctx, sizeof(lclContext));
+ 	ctx = (lclContext *) AH->formatData;
+ 
+ 	ctx->cs = AllocateCompressorState(AH);
+ 
+ 	/*
+ 	 * Note: we do not make a local lo_buf because we expect at most one BLOBS
+ 	 * entry per archive, so no parallelism is possible.  Likewise,
+ 	 * TOC-entry-local state isn't an issue because any one TOC entry is
+ 	 * touched by just one worker child.
+ 	 */
+ }
+ 
+ static void
+ _DeClone(ArchiveHandle *AH)
+ {
+ 	lclContext *ctx = (lclContext *) AH->formatData;
+ 	CompressorState	   *cs = ctx->cs;
+ 
+ 	FreeCompressorState(cs);
+ 
+ 	free(ctx);
+ }
+ 
+ /* XXX move this to a better place */
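+ /*
+  * Executed in a worker process during a parallel restore: parse the dumpId
+  * sent by the master, restore that TOC entry via parallel_restore() and
+  * report "OK RESTORE <dumpId>" back.
+  */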
+ static char *
+ _WorkerJobRestoreDirectory(ArchiveHandle *AH, const char *args)
+ {
+ 	static char		buf[64]; /* short string + some ID so far */
+ 	lclContext	   *ctx = (lclContext *) AH->formatData;
+ 	ParallelArgs	pargs;
+ 	int				ret;
+ 	lclTocEntry	   *tctx;
+ 	TocEntry	   *te;
+ 	DumpId			dumpId = InvalidDumpId;
+ 	int				nBytes, nTok;
+ 
+ 	nTok = sscanf(args, "%d%n", &dumpId, &nBytes);
+ 	Assert(nBytes == strlen(args));
+ 	Assert(nTok == 1); /* XXX remove, not safe acc. to manpage */
+ 
+ 	ctx->is_parallel_child = true;
+ 	ctx->directoryUsage = NULL;
+ 
+ 	for (te = AH->toc->next; te != AH->toc; te = te->next)
+ 	{
+ 		if (te->dumpId == dumpId)
+ 		{
+ 			tctx = (lclTocEntry *) te->formatData;
+ 			break;
+ 		}
+ 	}
+ 
+ 	Assert(te->dumpId == dumpId);
+ 	tctx = (lclTocEntry *) te->formatData;
+ 
+ 	pargs.AH = AH;
+ 	pargs.te = te;
+ 
+ 	/*
+ 	 * parallel_restore() will reconnect and establish the restore
+ 	 * connection.
+ 	 */
+ 	/* AH->connection = NULL; */
+ 
+ 	ret = parallel_restore(&pargs);
+ 
+ 	tctx->restore_status = ret;
+ 
+ 	/* XXX handle failure */
+ 	snprintf(buf, sizeof(buf), "OK RESTORE %d", dumpId);
+ 
+ 	return buf;
+ }
+ 
+ static ParallelState *
+ _GetParallelState(ArchiveHandle *AH)
+ {
+ 	lclContext *ctx = (lclContext *) AH->formatData;
+ 
+ 	if (ctx->pstate.numWorkers > 1)
+ 		return &ctx->pstate;
+ 	else
+ 		return NULL;
+ }
+ 
+ /*
+  * XXX if numWorkers is the only piece of information that we pass to the
+  * format this way, consider introducing an AH->number_of_jobs field or the
+  * like.
+  */
+ void
+ setupArchDirectory(ArchiveHandle *AH, int numWorkers)
+ {
+ 	lclContext	   *ctx = (lclContext *) AH->formatData;
+ 
+ 	ctx->numWorkers = numWorkers;
+ }
+ 
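+ /*
+  * Split a colon-separated directory specification (e.g. "dir1:dir2:dir3")
+  * into ctx->directories / ctx->numDirectories. The strdup'ed copy of the
+  * spec is chopped up in place.
+  */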
+ static void
+ splitDirectories(const char *spec, lclContext *ctx)
+ {
+ 	char		   *p;
+ 	const char	   *q;
+ 
+ 	/* count the number of fragments */
+ 	ctx->numDirectories = 1;
+ 	for (q = spec; *q != '\0'; q++)
+ 	{
+ 		if (*q == ':')
+ 			ctx->numDirectories++;
+ 	}
+ 
+ 	ctx->directories = (char **) malloc(ctx->numDirectories * sizeof(char *));
+ 	ctx->directoryUsage = (int *) malloc(ctx->numDirectories * sizeof(int));
+ 	p = strdup(spec);
+ 	if (!ctx->directories || !ctx->directoryUsage || !p)
+ 		die_horribly(NULL, modulename, "out of memory\n");
+ 
+ 	ctx->numDirectories = 1;
+ 	ctx->directories[0] = p;
+ 	ctx->directoryUsage[0] = 0;
+ 	while (*p)
+ 	{
+ 		if (*p == ':')
+ 		{
+ 			*p = '\0';
+ 			p++;
+ 			ctx->numDirectories++;
+ 			ctx->directories[ctx->numDirectories - 1] = p;
+ 			ctx->directoryUsage[ctx->numDirectories - 1] = 0;
+ 		}
+ 		else
+ 			p++;
+ 	}
+ }
+ 
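+ /*
+  * Create each of the nDir directories, or, if subdir is given, that
+  * subdirectory below each of them. All paths are checked up front so that
+  * we fail before having created anything at all.
+  */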
+ static void
+ createDirectoryGroup(char **dirs, int nDir, const char *subdir)
+ {
+ 	/* The directories must not exist yet; first check whether any of them does */
+ 	struct stat		st;
+ 	int				i;
+ 	char			dirname[MAXPGPATH];
+ 
+ 	for (i = 0; i < nDir; i++)
+ 	{
+ 		if (strlen(dirs[i]) + (subdir ? 1 + strlen(subdir) : 0) + 1 > MAXPGPATH)
+ 			die_horribly(NULL, modulename, "directory name \"%s\" too long\n", dirs[i]);
+ 		strcpy(dirname, dirs[i]);
+ 
+ 		if (subdir)
+ 		{
+ 			strcat(dirname, "/");
+ 			strcat(dirname, subdir);
+ 		}
+ 
+ 		/* XXX extend checks - check for base path */
+ 		if (stat(dirname, &st) != 0)
+ 			continue;
+ 		if (S_ISDIR(st.st_mode))
+ 			die_horribly(NULL, modulename, "Cannot create directory \"%s\", it already exists\n", dirname);
+ 		else
+ 			die_horribly(NULL, modulename, "Cannot create directory \"%s\", a file with this name already exists\n", dirname);
+ 	}
+ 
+ 	/*
+ 	 * Now create the directories. We could still fail here because of
+ 	 * insufficient privileges or a race condition.
+ 	 */
+ 
+ 	for (i = 0; i < nDir; i++)
+ 	{
+ 		strcpy(dirname, dirs[i]);
+ 
+ 		if (subdir)
+ 		{
+ 			strcat(dirname, "/");
+ 			strcat(dirname, subdir);
+ 		}
+ 
+ 		if (mkdir(dirname, 0700) < 0)
+ 			die_horribly(NULL, modulename, "Could not create directory %s: %s\n",
+ 						 dirname, strerror(errno));
+ 	}
+ }
+ 
+ static int
+ assignDirectory(lclContext *ctx)
+ {
+ 	/*
+ 	 * With d directories and n parallel worker processes, every directory
+ 	 * receives at most n / d items simultaneously. As long as a directory has
+ 	 * not yet received n / d items, it is our next directory. To distribute
+ 	 * the data even better we do a round-robin with respect to which
+ 	 * directory we check first. (Imagine we have 3 very large tables and the
+ 	 * rest small; we want those 3 tables to end up in different directories.)
+ 	 */
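+ 	/*
+ 	 * For example, with n = 4 workers and d = 2 directories, each directory
+ 	 * accepts up to two items concurrently before the search moves on to
+ 	 * the next one.
+ 	 */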
+ 
+ 	static int		startIdx;
+ 	int				i = startIdx;
+ 
+ 	Assert(ctx->directoryUsage != NULL);
+ 
+ 	do
+ 	{
+ 		if (ctx->directoryUsage[i] == 0 ||
+ 			(float) ctx->directoryUsage[i] <
+ 				(float) ctx->numWorkers / (float) ctx->numDirectories)
+ 		{
+ 			ctx->directoryUsage[i]++;
+ 			startIdx = (i + 1) % ctx->numDirectories;
+ 			return i;
+ 		}
+ 		i = (i + 1) % ctx->numDirectories;
+ 	} while (true);
+ }
+ 
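+ /*
+  * Release the directory slot that assignDirectory() handed out, once the
+  * data file in question has been written completely.
+  */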
+ static void
+ unassignDirectory(lclContext *ctx, lclTocEntry *tctx)
+ {
+ 	Assert(ctx->directoryUsage != NULL);
+ 	Assert(ctx->directoryUsage[tctx->directoryIndex] > 0);
+ 	ctx->directoryUsage[tctx->directoryIndex]--;
+ 	tctx->directoryIndex = -1;
+ }
+ 
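+ /*
+  * Build the command string that the master sends to a worker: "DUMP
+  * <dumpId> <directoryIndex>" for a dump, "RESTORE <dumpId>" for a restore.
+  * The counterparts are _WorkerJobDumpDirectory / _WorkerJobRestoreDirectory
+  * on the worker side and _EndMasterParallel for the reply.
+  */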
+ static char *
+ _StartMasterParallel(ArchiveHandle *AH, TocEntry *te, T_Action act)
+ {
+ 	lclContext		   *ctx = (lclContext *) AH->formatData;
+ 	lclTocEntry		   *tctx = (lclTocEntry *) te->formatData;
+ 	static char			buf[32];
+ 
+ 	if (act == ACT_DUMP)
+ 	{
+ 		tctx->directoryIndex = assignDirectory(ctx);
+ 		snprintf(buf, sizeof(buf), "DUMP %d %d",
+ 				 te->dumpId, tctx->directoryIndex);
+ 	}
+ 	else if (act == ACT_RESTORE)
+ 	{
+ 		snprintf(buf, sizeof(buf), "RESTORE %d", te->dumpId);
+ 	}
+ 
+ 	return buf;
+ }
+ 
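+ /*
+  * Parse a worker's status message in the master. After a dump we read back
+  * the size of the data file just written and release the directory slot;
+  * after a restore we merely verify the dumpId. (The "OK DUMP"/"OK RESTORE"
+  * prefix has presumably been stripped by the dispatching code.)
+  */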
+ static int
+ _EndMasterParallel(ArchiveHandle *AH, TocEntry *te, const char *str, T_Action act)
+ {
+ 	int					nTok, nBytes;
+ 	DumpId				dumpId;
+ 	lclTocEntry		   *tctx = (lclTocEntry *) te->formatData;
+ 	lclContext		   *ctx = (lclContext *) AH->formatData;
+ 
+ 	if (act == ACT_DUMP)
+ 	{
+ 		unsigned long int	size;
+ 		unassignDirectory(ctx, tctx);
+ 
+ 		nTok = sscanf(str, "%d %lu%n", &dumpId, &size, &nBytes);
+ 
+ 		Assert(nTok == 2); /* XXX remove, not safe acc. to manpage */
+ 		Assert(dumpId == te->dumpId);
+ 		Assert(nBytes == strlen(str));
+ 
+ 		tctx->fileSize = size;
+ 	}
+ 	else if (act == ACT_RESTORE)
+ 	{
+ 		nTok = sscanf(str, "%d%n", &dumpId, &nBytes);
+ 
+ 		Assert(nTok == 1); /* XXX remove, not safe acc. to manpage */
+ 		Assert(dumpId == te->dumpId);
+ 		Assert(nBytes == strlen(str));
+ 	}
+ 
+ 	return 0;
+ }
+ 
diff --git a/src/bin/pg_dump/pg_backup_files.c b/src/bin/pg_dump/pg_backup_files.c
index 825c473..87a584b 100644
*** a/src/bin/pg_dump/pg_backup_files.c
--- b/src/bin/pg_dump/pg_backup_files.c
*************** InitArchiveFmt_Files(ArchiveHandle *AH)
*** 101,106 ****
--- 101,113 ----
  	AH->ClonePtr = NULL;
  	AH->DeClonePtr = NULL;
  
+ 	AH->StartMasterParallelPtr = NULL;
+ 	AH->EndMasterParallelPtr = NULL;
+ 
+ 	AH->GetParallelStatePtr = NULL;
+ 	AH->WorkerJobDumpPtr = NULL;
+ 	AH->WorkerJobRestorePtr = NULL;
+ 
  	AH->StartCheckArchivePtr = NULL;
  	AH->CheckTocEntryPtr = NULL;
  	AH->EndCheckArchivePtr = NULL;
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index dcc13ee..229d9fb 100644
*** a/src/bin/pg_dump/pg_backup_tar.c
--- b/src/bin/pg_dump/pg_backup_tar.c
*************** InitArchiveFmt_Tar(ArchiveHandle *AH)
*** 153,158 ****
--- 153,165 ----
  	AH->ClonePtr = NULL;
  	AH->DeClonePtr = NULL;
  
+ 	AH->StartMasterParallelPtr = NULL;
+ 	AH->EndMasterParallelPtr = NULL;
+ 
+ 	AH->GetParallelStatePtr = NULL;
+ 	AH->WorkerJobDumpPtr = NULL;
+ 	AH->WorkerJobRestorePtr = NULL;
+ 
  	AH->StartCheckArchivePtr = NULL;
  	AH->CheckTocEntryPtr = NULL;
  	AH->EndCheckArchivePtr = NULL;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 39e68d9..abd90f5 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** bool		g_verbose;			/* User wants verbose
*** 85,90 ****
--- 85,91 ----
  								 * activities. */
  Archive    *g_fout;				/* the script file */
  PGconn	   *g_conn;				/* the database connection */
+ PGconn	  **g_conn_child;
  
  /* various user-settable parameters */
  bool		schemaOnly;
*************** static void do_sql_command(PGconn *conn,
*** 237,242 ****
--- 238,246 ----
  static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
  				 ExecStatusType expected);
  
+ static ArchiveFormat parseArchiveFormat(const char *format);
+ 
+ void SetupConnection(PGconn *conn, const char *syncId, const char *dumpencoding, const char *use_role);
  
  int
  main(int argc, char **argv)
*************** main(int argc, char **argv)
*** 249,261 ****
  	const char *pgport = NULL;
  	const char *username = NULL;
  	const char *dumpencoding = NULL;
- 	const char *std_strings;
  	bool		oids = false;
  	TableInfo  *tblinfo;
  	int			numTables;
  	DumpableObject **dobjs;
  	int			numObjs;
  	int			i;
  	enum trivalue prompt_password = TRI_DEFAULT;
  	int			compressLevel = COMPRESSION_UNKNOWN;
  	int			plainText = 0;
--- 253,265 ----
  	const char *pgport = NULL;
  	const char *username = NULL;
  	const char *dumpencoding = NULL;
  	bool		oids = false;
  	TableInfo  *tblinfo;
  	int			numTables;
  	DumpableObject **dobjs;
  	int			numObjs;
  	int			i;
+ 	int			numWorkers = 1;
  	enum trivalue prompt_password = TRI_DEFAULT;
  	int			compressLevel = COMPRESSION_UNKNOWN;
  	int			plainText = 0;
*************** main(int argc, char **argv)
*** 356,362 ****
  		}
  	}
  
! 	while ((c = getopt_long(argc, argv, "abcCE:f:F:h:in:N:oOp:RsS:t:T:U:vwWxX:Z:",
  							long_options, &optindex)) != -1)
  	{
  		switch (c)
--- 360,366 ----
  		}
  	}
  
! 	while ((c = getopt_long(argc, argv, "abcCE:f:F:h:ij:n:N:oOp:RsS:t:T:U:vwWxX:Z:",
  							long_options, &optindex)) != -1)
  	{
  		switch (c)
*************** main(int argc, char **argv)
*** 397,402 ****
--- 401,410 ----
  				/* ignored, deprecated option */
  				break;
  
+ 			case 'j':
+ 				numWorkers = atoi(optarg);
+ 				break;
+ 
  			case 'n':			/* include schema(s) */
  				simple_string_list_append(&schema_include_patterns, optarg);
  				include_everything = false;
*************** main(int argc, char **argv)
*** 542,547 ****
--- 550,561 ----
  
  	archiveFormat = parseArchiveFormat(format);
  
+ 	if (archiveFormat != archDirectory && numWorkers > 1)
+ 	{
+ 		write_msg(NULL, "parallel backup only supported by the directory format\n");
+ 		exit(1);
+ 	}
+ 
  	/* archiveFormat specific setup */
  	if (archiveFormat == archNull || archiveFormat == archNullAppend)
  		plainText = 1;
*************** main(int argc, char **argv)
*** 639,742 ****
  	 * Open the database using the Archiver, so it knows about it. Errors mean
  	 * death.
  	 */
- 	g_conn = ConnectDatabase(g_fout, dbname, pghost, pgport,
- 							 username, prompt_password);
  
- 	/* Set the client encoding if requested */
- 	if (dumpencoding)
  	{
! 		if (PQsetClientEncoding(g_conn, dumpencoding) < 0)
! 		{
! 			write_msg(NULL, "invalid client encoding \"%s\" specified\n",
! 					  dumpencoding);
! 			exit(1);
! 		}
! 	}
! 
! 	/*
! 	 * Get the active encoding and the standard_conforming_strings setting, so
! 	 * we know how to escape strings.
! 	 */
! 	g_fout->encoding = PQclientEncoding(g_conn);
! 
! 	std_strings = PQparameterStatus(g_conn, "standard_conforming_strings");
! 	g_fout->std_strings = (std_strings && strcmp(std_strings, "on") == 0);
! 
! 	/* Set the role if requested */
! 	if (use_role && g_fout->remoteVersion >= 80100)
! 	{
! 		PQExpBuffer query = createPQExpBuffer();
! 
! 		appendPQExpBuffer(query, "SET ROLE %s", fmtId(use_role));
! 		do_sql_command(g_conn, query->data);
! 		destroyPQExpBuffer(query);
! 	}
! 
! 	/* Set the datestyle to ISO to ensure the dump's portability */
! 	do_sql_command(g_conn, "SET DATESTYLE = ISO");
! 
! 	/* Likewise, avoid using sql_standard intervalstyle */
! 	if (g_fout->remoteVersion >= 80400)
! 		do_sql_command(g_conn, "SET INTERVALSTYLE = POSTGRES");
! 
! 	/*
! 	 * If supported, set extra_float_digits so that we can dump float data
! 	 * exactly (given correctly implemented float I/O code, anyway)
! 	 */
! 	if (g_fout->remoteVersion >= 90000)
! 		do_sql_command(g_conn, "SET extra_float_digits TO 3");
! 	else if (g_fout->remoteVersion >= 70400)
! 		do_sql_command(g_conn, "SET extra_float_digits TO 2");
! 
! 	/*
! 	 * If synchronized scanning is supported, disable it, to prevent
! 	 * unpredictable changes in row ordering across a dump and reload.
! 	 */
! 	if (g_fout->remoteVersion >= 80300)
! 		do_sql_command(g_conn, "SET synchronize_seqscans TO off");
! 
! 	/*
! 	 * Disable timeouts if supported.
! 	 */
! 	if (g_fout->remoteVersion >= 70300)
! 		do_sql_command(g_conn, "SET statement_timeout = 0");
! 
! 	/*
! 	 * Quote all identifiers, if requested.
! 	 */
! 	if (quote_all_identifiers && g_fout->remoteVersion >= 90100)
! 		do_sql_command(g_conn, "SET quote_all_identifiers = true");
  
! 	/*
! 	 * Disables security label support if server version < v9.1.x
! 	 */
! 	if (!no_security_label && g_fout->remoteVersion < 90100)
! 		no_security_label = 1;
  
! 	/*
! 	 * Start serializable transaction to dump consistent data.
! 	 */
! 	do_sql_command(g_conn, "BEGIN");
  
! 	do_sql_command(g_conn, "SET TRANSACTION READ ONLY ISOLATION LEVEL SERIALIZABLE");
  
! 	/* Select the appropriate subquery to convert user IDs to names */
! 	if (g_fout->remoteVersion >= 80100)
! 		username_subquery = "SELECT rolname FROM pg_catalog.pg_roles WHERE oid =";
! 	else if (g_fout->remoteVersion >= 70300)
! 		username_subquery = "SELECT usename FROM pg_catalog.pg_user WHERE usesysid =";
! 	else
! 		username_subquery = "SELECT usename FROM pg_user WHERE usesysid =";
  
! 	/* Find the last built-in OID, if needed */
! 	if (g_fout->remoteVersion < 70300)
! 	{
! 		if (g_fout->remoteVersion >= 70100)
! 			g_last_builtin_oid = findLastBuiltinOid_V71(PQdb(g_conn));
! 		else
! 			g_last_builtin_oid = findLastBuiltinOid_V70();
! 		if (g_verbose)
! 			write_msg(NULL, "last built-in OID is %u\n", g_last_builtin_oid);
  	}
  
  	/* Expand schema selection patterns into OID lists */
--- 653,694 ----
  	 * Open the database using the Archiver, so it knows about it. Errors mean
  	 * death.
  	 */
  
  	{
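! 		/*
! 		 * Open one connection for the master and one per worker; all of
! 		 * them are prepared identically via SetupConnection(). ("idString"
! 		 * looks like a placeholder for the synchronized-snapshot id that is
! 		 * described above as still missing.)
! 		 */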
! 		ArchiveHandle *AH;
! 		PGconn *backup;
! 		PGconn *temp;
! 		char *idString = "id";
  
! 		AH = (ArchiveHandle *) g_fout;
  
! 		if (archiveFormat == archDirectory)
! 			setupArchDirectory(AH, numWorkers);
  
! 		temp = ConnectDatabase(g_fout, dbname, pghost, pgport,
! 							   username, prompt_password);
! 		PQsetnonblocking(temp, 1);
! 		AH->connection = NULL;
! 		g_conn = ConnectDatabase(g_fout, dbname, pghost, pgport,
! 								 username, prompt_password);
  
! 		AH = (ArchiveHandle *) g_fout;
! 		backup = AH->connection;
! 		g_conn_child = (PGconn**) malloc(numWorkers * sizeof(PGconn *));
! 		for (i = 0; i < numWorkers; i++)
! 		{
! 			AH->connection = NULL;
! 			g_conn_child[i] = ConnectDatabase(g_fout, dbname,
! 												   pghost, pgport,
! 												   username, prompt_password);
! 		}
  
! 		SetupConnection(g_conn, idString, dumpencoding, use_role);
! 		for (i = 0; i < numWorkers; i++)
! 		{
! 			SetupConnection(g_conn_child[i], idString, dumpencoding, use_role);
! 		}
! 		AH->connection = backup;
  	}
  
  	/* Expand schema selection patterns into OID lists */
*************** main(int argc, char **argv)
*** 816,821 ****
--- 768,776 ----
  	else
  		sortDumpableObjectsByTypeOid(dobjs, numObjs);
  
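+ 	/*
+ 	 * For a parallel dump, dump the biggest tables first; their sizes are
+ 	 * estimated from pg_class.relpages, as collected in getTables().
+ 	 */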
+ 	if (archiveFormat == archDirectory && numWorkers > 1)
+ 		sortDataAndIndexObjectsBySize(dobjs, numObjs);
+ 
  	sortDumpableObjects(dobjs, numObjs);
  
  	/*
*************** dumpTableData(Archive *fout, TableDataIn
*** 1531,1537 ****
  
  	ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
  				 tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
! 				 NULL, tbinfo->rolname,
  				 false, "TABLE DATA", SECTION_DATA,
  				 "", "", copyStmt,
  				 tdinfo->dobj.dependencies, tdinfo->dobj.nDeps,
--- 1486,1492 ----
  
  	ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
  				 tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
! 				 NULL, tbinfo->rolname, tbinfo->relpages,
  				 false, "TABLE DATA", SECTION_DATA,
  				 "", "", copyStmt,
  				 tdinfo->dobj.dependencies, tdinfo->dobj.nDeps,
*************** dumpDatabase(Archive *AH)
*** 1899,1904 ****
--- 1854,1860 ----
  				 NULL,			/* Namespace */
  				 NULL,			/* Tablespace */
  				 dba,			/* Owner */
+ 				 0,				/* relpages */
  				 false,			/* with oids */
  				 "DATABASE",	/* Desc */
  				 SECTION_PRE_DATA,		/* Section */
*************** dumpDatabase(Archive *AH)
*** 1944,1950 ****
  						  atoi(PQgetvalue(lo_res, 0, i_relfrozenxid)),
  						  LargeObjectRelationId);
  		ArchiveEntry(AH, nilCatalogId, createDumpId(),
! 					 "pg_largeobject", NULL, NULL, "",
  					 false, "pg_largeobject", SECTION_PRE_DATA,
  					 loOutQry->data, "", NULL,
  					 NULL, 0,
--- 1900,1906 ----
  						  atoi(PQgetvalue(lo_res, 0, i_relfrozenxid)),
  						  LargeObjectRelationId);
  		ArchiveEntry(AH, nilCatalogId, createDumpId(),
! 					 "pg_largeobject", NULL, NULL, "", 0,
  					 false, "pg_largeobject", SECTION_PRE_DATA,
  					 loOutQry->data, "", NULL,
  					 NULL, 0,
*************** dumpDatabase(Archive *AH)
*** 1977,1983 ****
  			appendPQExpBuffer(dbQry, ";\n");
  
  			ArchiveEntry(AH, dbCatId, createDumpId(), datname, NULL, NULL,
! 						 dba, false, "COMMENT", SECTION_NONE,
  						 dbQry->data, "", NULL,
  						 &dbDumpId, 1, NULL, NULL);
  		}
--- 1933,1939 ----
  			appendPQExpBuffer(dbQry, ";\n");
  
  			ArchiveEntry(AH, dbCatId, createDumpId(), datname, NULL, NULL,
! 						 dba, 0, false, "COMMENT", SECTION_NONE,
  						 dbQry->data, "", NULL,
  						 &dbDumpId, 1, NULL, NULL);
  		}
*************** dumpEncoding(Archive *AH)
*** 2015,2021 ****
  	appendPQExpBuffer(qry, ";\n");
  
  	ArchiveEntry(AH, nilCatalogId, createDumpId(),
! 				 "ENCODING", NULL, NULL, "",
  				 false, "ENCODING", SECTION_PRE_DATA,
  				 qry->data, "", NULL,
  				 NULL, 0,
--- 1971,1977 ----
  	appendPQExpBuffer(qry, ";\n");
  
  	ArchiveEntry(AH, nilCatalogId, createDumpId(),
! 				 "ENCODING", NULL, NULL, "", 0,
  				 false, "ENCODING", SECTION_PRE_DATA,
  				 qry->data, "", NULL,
  				 NULL, 0,
*************** dumpStdStrings(Archive *AH)
*** 2042,2048 ****
  					  stdstrings);
  
  	ArchiveEntry(AH, nilCatalogId, createDumpId(),
! 				 "STDSTRINGS", NULL, NULL, "",
  				 false, "STDSTRINGS", SECTION_PRE_DATA,
  				 qry->data, "", NULL,
  				 NULL, 0,
--- 1998,2004 ----
  					  stdstrings);
  
  	ArchiveEntry(AH, nilCatalogId, createDumpId(),
! 				 "STDSTRINGS", NULL, NULL, "", 0,
  				 false, "STDSTRINGS", SECTION_PRE_DATA,
  				 qry->data, "", NULL,
  				 NULL, 0,
*************** dumpBlob(Archive *AH, BlobInfo *binfo)
*** 2154,2160 ****
  	ArchiveEntry(AH, binfo->dobj.catId, binfo->dobj.dumpId,
  				 binfo->dobj.name,
  				 NULL, NULL,
! 				 binfo->rolname, false,
  				 "BLOB", SECTION_PRE_DATA,
  				 cquery->data, dquery->data, NULL,
  				 binfo->dobj.dependencies, binfo->dobj.nDeps,
--- 2110,2116 ----
  	ArchiveEntry(AH, binfo->dobj.catId, binfo->dobj.dumpId,
  				 binfo->dobj.name,
  				 NULL, NULL,
! 				 binfo->rolname, 0, false,
  				 "BLOB", SECTION_PRE_DATA,
  				 cquery->data, dquery->data, NULL,
  				 binfo->dobj.dependencies, binfo->dobj.nDeps,
*************** getTables(int *numTables)
*** 3540,3545 ****
--- 3496,3502 ----
  	int			i_reloptions;
  	int			i_toastreloptions;
  	int			i_reloftype;
+ 	int			i_relpages;
  
  	/* Make sure we are in proper schema */
  	selectSourceSchema("pg_catalog");
*************** getTables(int *numTables)
*** 3572,3582 ****
  		 */
  		appendPQExpBuffer(query,
  						  "SELECT c.tableoid, c.oid, c.relname, "
! 						  "c.relacl, c.relkind, c.relnamespace, "
  						  "(%s c.relowner) AS rolname, "
  						  "c.relchecks, c.relhastriggers, "
  						  "c.relhasindex, c.relhasrules, c.relhasoids, "
! 						  "c.relfrozenxid, "
  						  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
  						  "d.refobjid AS owning_tab, "
  						  "d.refobjsubid AS owning_col, "
--- 3529,3539 ----
  		 */
  		appendPQExpBuffer(query,
  						  "SELECT c.tableoid, c.oid, c.relname, "
! 						  "c.relacl, c.relkind, c.relnamespace, "
  						  "(%s c.relowner) AS rolname, "
  						  "c.relchecks, c.relhastriggers, "
  						  "c.relhasindex, c.relhasrules, c.relhasoids, "
! 						  "c.relfrozenxid, c.relpages, "
  						  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
  						  "d.refobjid AS owning_tab, "
  						  "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3609,3615 ****
  						  "(%s c.relowner) AS rolname, "
  						  "c.relchecks, c.relhastriggers, "
  						  "c.relhasindex, c.relhasrules, c.relhasoids, "
! 						  "c.relfrozenxid, "
  						  "NULL AS reloftype, "
  						  "d.refobjid AS owning_tab, "
  						  "d.refobjsubid AS owning_col, "
--- 3566,3572 ----
  						  "(%s c.relowner) AS rolname, "
  						  "c.relchecks, c.relhastriggers, "
  						  "c.relhasindex, c.relhasrules, c.relhasoids, "
! 						  "c.relfrozenxid, c.relpages, "
  						  "NULL AS reloftype, "
  						  "d.refobjid AS owning_tab, "
  						  "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3642,3648 ****
  						  "(%s relowner) AS rolname, "
  						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
  						  "relhasindex, relhasrules, relhasoids, "
! 						  "relfrozenxid, "
  						  "NULL AS reloftype, "
  						  "d.refobjid AS owning_tab, "
  						  "d.refobjsubid AS owning_col, "
--- 3599,3605 ----
  						  "(%s relowner) AS rolname, "
  						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
  						  "relhasindex, relhasrules, relhasoids, "
! 						  "relfrozenxid, relpages, "
  						  "NULL AS reloftype, "
  						  "d.refobjid AS owning_tab, "
  						  "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3674,3680 ****
  						  "(%s relowner) AS rolname, "
  						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
  						  "relhasindex, relhasrules, relhasoids, "
! 						  "0 AS relfrozenxid, "
  						  "NULL AS reloftype, "
  						  "d.refobjid AS owning_tab, "
  						  "d.refobjsubid AS owning_col, "
--- 3631,3637 ----
  						  "(%s relowner) AS rolname, "
  						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
  						  "relhasindex, relhasrules, relhasoids, "
! 						  "0 AS relfrozenxid, relpages, "
  						  "NULL AS reloftype, "
  						  "d.refobjid AS owning_tab, "
  						  "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3706,3712 ****
  						  "(%s relowner) AS rolname, "
  						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
  						  "relhasindex, relhasrules, relhasoids, "
! 						  "0 AS relfrozenxid, "
  						  "NULL AS reloftype, "
  						  "d.refobjid AS owning_tab, "
  						  "d.refobjsubid AS owning_col, "
--- 3663,3669 ----
  						  "(%s relowner) AS rolname, "
  						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
  						  "relhasindex, relhasrules, relhasoids, "
! 						  "0 AS relfrozenxid, relpages, "
  						  "NULL AS reloftype, "
  						  "d.refobjid AS owning_tab, "
  						  "d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3734,3740 ****
  						  "(%s relowner) AS rolname, "
  						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
  						  "relhasindex, relhasrules, relhasoids, "
! 						  "0 AS relfrozenxid, "
  						  "NULL AS reloftype, "
  						  "NULL::oid AS owning_tab, "
  						  "NULL::int4 AS owning_col, "
--- 3691,3697 ----
  						  "(%s relowner) AS rolname, "
  						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
  						  "relhasindex, relhasrules, relhasoids, "
! 						  "0 AS relfrozenxid, relpages, "
  						  "NULL AS reloftype, "
  						  "NULL::oid AS owning_tab, "
  						  "NULL::int4 AS owning_col, "
*************** getTables(int *numTables)
*** 3757,3763 ****
  						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
  						  "relhasindex, relhasrules, "
  						  "'t'::bool AS relhasoids, "
! 						  "0 AS relfrozenxid, "
  						  "NULL AS reloftype, "
  						  "NULL::oid AS owning_tab, "
  						  "NULL::int4 AS owning_col, "
--- 3714,3720 ----
  						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
  						  "relhasindex, relhasrules, "
  						  "'t'::bool AS relhasoids, "
! 						  "0 AS relfrozenxid, relpages, "
  						  "NULL AS reloftype, "
  						  "NULL::oid AS owning_tab, "
  						  "NULL::int4 AS owning_col, "
*************** getTables(int *numTables)
*** 3842,3847 ****
--- 3799,3805 ----
  	i_reloptions = PQfnumber(res, "reloptions");
  	i_toastreloptions = PQfnumber(res, "toast_reloptions");
  	i_reloftype = PQfnumber(res, "reloftype");
+ 	i_relpages = PQfnumber(res, "relpages");
  
  	if (lockWaitTimeout && g_fout->remoteVersion >= 70300)
  	{
*************** getTables(int *numTables)
*** 3893,3898 ****
--- 3851,3857 ----
  		tblinfo[i].reltablespace = strdup(PQgetvalue(res, i, i_reltablespace));
  		tblinfo[i].reloptions = strdup(PQgetvalue(res, i, i_reloptions));
  		tblinfo[i].toast_reloptions = strdup(PQgetvalue(res, i, i_toastreloptions));
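+ 		/* size estimate (in pages) used to schedule parallel dump jobs */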
+ 		tblinfo[i].relpages = atoi(PQgetvalue(res, i, i_relpages));
  
  		/* other fields were zeroed above */
  
*************** dumpComment(Archive *fout, const char *t
*** 6277,6283 ****
  		 * post-data.
  		 */
  		ArchiveEntry(fout, nilCatalogId, createDumpId(),
! 					 target, namespace, NULL, owner,
  					 false, "COMMENT", SECTION_NONE,
  					 query->data, "", NULL,
  					 &(dumpId), 1,
--- 6236,6242 ----
  		 * post-data.
  		 */
  		ArchiveEntry(fout, nilCatalogId, createDumpId(),
! 					 target, namespace, NULL, owner, 0,
  					 false, "COMMENT", SECTION_NONE,
  					 query->data, "", NULL,
  					 &(dumpId), 1,
*************** dumpTableComment(Archive *fout, TableInf
*** 6338,6344 ****
  			ArchiveEntry(fout, nilCatalogId, createDumpId(),
  						 target->data,
  						 tbinfo->dobj.namespace->dobj.name,
! 						 NULL, tbinfo->rolname,
  						 false, "COMMENT", SECTION_NONE,
  						 query->data, "", NULL,
  						 &(tbinfo->dobj.dumpId), 1,
--- 6297,6303 ----
  			ArchiveEntry(fout, nilCatalogId, createDumpId(),
  						 target->data,
  						 tbinfo->dobj.namespace->dobj.name,
! 						 NULL, tbinfo->rolname, 0,
  						 false, "COMMENT", SECTION_NONE,
  						 query->data, "", NULL,
  						 &(tbinfo->dobj.dumpId), 1,
*************** dumpTableComment(Archive *fout, TableInf
*** 6360,6366 ****
  			ArchiveEntry(fout, nilCatalogId, createDumpId(),
  						 target->data,
  						 tbinfo->dobj.namespace->dobj.name,
! 						 NULL, tbinfo->rolname,
  						 false, "COMMENT", SECTION_NONE,
  						 query->data, "", NULL,
  						 &(tbinfo->dobj.dumpId), 1,
--- 6319,6325 ----
  			ArchiveEntry(fout, nilCatalogId, createDumpId(),
  						 target->data,
  						 tbinfo->dobj.namespace->dobj.name,
! 						 NULL, tbinfo->rolname, 0,
  						 false, "COMMENT", SECTION_NONE,
  						 query->data, "", NULL,
  						 &(tbinfo->dobj.dumpId), 1,
*************** dumpDumpableObject(Archive *fout, Dumpab
*** 6640,6646 ****
  			break;
  		case DO_BLOB_DATA:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL, "",
  						 false, "BLOBS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
--- 6599,6605 ----
  			break;
  		case DO_BLOB_DATA:
  			ArchiveEntry(fout, dobj->catId, dobj->dumpId,
! 						 dobj->name, NULL, NULL, "", 0,
  						 false, "BLOBS", SECTION_DATA,
  						 "", "", NULL,
  						 dobj->dependencies, dobj->nDeps,
*************** dumpNamespace(Archive *fout, NamespaceIn
*** 6680,6686 ****
  	ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
  				 nspinfo->dobj.name,
  				 NULL, NULL,
! 				 nspinfo->rolname,
  				 false, "SCHEMA", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 nspinfo->dobj.dependencies, nspinfo->dobj.nDeps,
--- 6639,6645 ----
  	ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
  				 nspinfo->dobj.name,
  				 NULL, NULL,
! 				 nspinfo->rolname, 0,
  				 false, "SCHEMA", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 nspinfo->dobj.dependencies, nspinfo->dobj.nDeps,
*************** dumpEnumType(Archive *fout, TypeInfo *ty
*** 6822,6828 ****
  				 tyinfo->dobj.name,
  				 tyinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tyinfo->rolname, false,
  				 "TYPE", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 tyinfo->dobj.dependencies, tyinfo->dobj.nDeps,
--- 6781,6787 ----
  				 tyinfo->dobj.name,
  				 tyinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tyinfo->rolname, 0, false,
  				 "TYPE", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 tyinfo->dobj.dependencies, tyinfo->dobj.nDeps,
*************** dumpBaseType(Archive *fout, TypeInfo *ty
*** 7201,7207 ****
  				 tyinfo->dobj.name,
  				 tyinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tyinfo->rolname, false,
  				 "TYPE", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 tyinfo->dobj.dependencies, tyinfo->dobj.nDeps,
--- 7160,7166 ----
  				 tyinfo->dobj.name,
  				 tyinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tyinfo->rolname, 0, false,
  				 "TYPE", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 tyinfo->dobj.dependencies, tyinfo->dobj.nDeps,
*************** dumpDomain(Archive *fout, TypeInfo *tyin
*** 7328,7334 ****
  				 tyinfo->dobj.name,
  				 tyinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tyinfo->rolname, false,
  				 "DOMAIN", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 tyinfo->dobj.dependencies, tyinfo->dobj.nDeps,
--- 7287,7293 ----
  				 tyinfo->dobj.name,
  				 tyinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tyinfo->rolname, 0, false,
  				 "DOMAIN", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 tyinfo->dobj.dependencies, tyinfo->dobj.nDeps,
*************** dumpCompositeType(Archive *fout, TypeInf
*** 7430,7436 ****
  				 tyinfo->dobj.name,
  				 tyinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tyinfo->rolname, false,
  				 "TYPE", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 tyinfo->dobj.dependencies, tyinfo->dobj.nDeps,
--- 7389,7395 ----
  				 tyinfo->dobj.name,
  				 tyinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tyinfo->rolname, 0, false,
  				 "TYPE", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 tyinfo->dobj.dependencies, tyinfo->dobj.nDeps,
*************** dumpCompositeTypeColComments(Archive *fo
*** 7551,7557 ****
  			ArchiveEntry(fout, nilCatalogId, createDumpId(),
  						 target->data,
  						 tyinfo->dobj.namespace->dobj.name,
! 						 NULL, tyinfo->rolname,
  						 false, "COMMENT", SECTION_NONE,
  						 query->data, "", NULL,
  						 &(tyinfo->dobj.dumpId), 1,
--- 7510,7516 ----
  			ArchiveEntry(fout, nilCatalogId, createDumpId(),
  						 target->data,
  						 tyinfo->dobj.namespace->dobj.name,
! 						 NULL, tyinfo->rolname, 0,
  						 false, "COMMENT", SECTION_NONE,
  						 query->data, "", NULL,
  						 &(tyinfo->dobj.dumpId), 1,
*************** dumpShellType(Archive *fout, ShellTypeIn
*** 7604,7610 ****
  				 stinfo->dobj.name,
  				 stinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 stinfo->baseType->rolname, false,
  				 "SHELL TYPE", SECTION_PRE_DATA,
  				 q->data, "", NULL,
  				 stinfo->dobj.dependencies, stinfo->dobj.nDeps,
--- 7563,7569 ----
  				 stinfo->dobj.name,
  				 stinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 stinfo->baseType->rolname, 0, false,
  				 "SHELL TYPE", SECTION_PRE_DATA,
  				 q->data, "", NULL,
  				 stinfo->dobj.dependencies, stinfo->dobj.nDeps,
*************** dumpProcLang(Archive *fout, ProcLangInfo
*** 7758,7764 ****
  
  	ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
  				 plang->dobj.name,
! 				 lanschema, NULL, plang->lanowner,
  				 false, "PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
  				 defqry->data, delqry->data, NULL,
  				 plang->dobj.dependencies, plang->dobj.nDeps,
--- 7717,7723 ----
  
  	ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
  				 plang->dobj.name,
! 				 lanschema, NULL, plang->lanowner, 0,
  				 false, "PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
  				 defqry->data, delqry->data, NULL,
  				 plang->dobj.dependencies, plang->dobj.nDeps,
*************** dumpFunc(Archive *fout, FuncInfo *finfo)
*** 8322,8328 ****
  				 funcsig_tag,
  				 finfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 finfo->rolname, false,
  				 "FUNCTION", SECTION_PRE_DATA,
  				 q->data, delqry->data, NULL,
  				 finfo->dobj.dependencies, finfo->dobj.nDeps,
--- 8281,8287 ----
  				 funcsig_tag,
  				 finfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 finfo->rolname, 0, false,
  				 "FUNCTION", SECTION_PRE_DATA,
  				 q->data, delqry->data, NULL,
  				 finfo->dobj.dependencies, finfo->dobj.nDeps,
*************** dumpCast(Archive *fout, CastInfo *cast)
*** 8478,8484 ****
  
  	ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
  				 castsig->data,
! 				 "pg_catalog", NULL, "",
  				 false, "CAST", SECTION_PRE_DATA,
  				 defqry->data, delqry->data, NULL,
  				 cast->dobj.dependencies, cast->dobj.nDeps,
--- 8437,8443 ----
  
  	ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
  				 castsig->data,
! 				 "pg_catalog", NULL, "", 0,
  				 false, "CAST", SECTION_PRE_DATA,
  				 defqry->data, delqry->data, NULL,
  				 cast->dobj.dependencies, cast->dobj.nDeps,
*************** dumpOpr(Archive *fout, OprInfo *oprinfo)
*** 8722,8728 ****
  				 oprinfo->dobj.name,
  				 oprinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 oprinfo->rolname,
  				 false, "OPERATOR", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 oprinfo->dobj.dependencies, oprinfo->dobj.nDeps,
--- 8681,8687 ----
  				 oprinfo->dobj.name,
  				 oprinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 oprinfo->rolname, 0,
  				 false, "OPERATOR", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 oprinfo->dobj.dependencies, oprinfo->dobj.nDeps,
*************** dumpOpclass(Archive *fout, OpclassInfo *
*** 9181,9187 ****
  				 opcinfo->dobj.name,
  				 opcinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 opcinfo->rolname,
  				 false, "OPERATOR CLASS", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 opcinfo->dobj.dependencies, opcinfo->dobj.nDeps,
--- 9140,9146 ----
  				 opcinfo->dobj.name,
  				 opcinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 opcinfo->rolname, 0,
  				 false, "OPERATOR CLASS", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 opcinfo->dobj.dependencies, opcinfo->dobj.nDeps,
*************** dumpOpfamily(Archive *fout, OpfamilyInfo
*** 9462,9468 ****
  				 opfinfo->dobj.name,
  				 opfinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 opfinfo->rolname,
  				 false, "OPERATOR FAMILY", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 opfinfo->dobj.dependencies, opfinfo->dobj.nDeps,
--- 9421,9427 ----
  				 opfinfo->dobj.name,
  				 opfinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 opfinfo->rolname, 0,
  				 false, "OPERATOR FAMILY", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 opfinfo->dobj.dependencies, opfinfo->dobj.nDeps,
*************** dumpConversion(Archive *fout, ConvInfo *
*** 9578,9584 ****
  				 convinfo->dobj.name,
  				 convinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 convinfo->rolname,
  				 false, "CONVERSION", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 convinfo->dobj.dependencies, convinfo->dobj.nDeps,
--- 9537,9543 ----
  				 convinfo->dobj.name,
  				 convinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 convinfo->rolname, 0,
  				 false, "CONVERSION", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 convinfo->dobj.dependencies, convinfo->dobj.nDeps,
*************** dumpAgg(Archive *fout, AggInfo *agginfo)
*** 9822,9828 ****
  				 aggsig_tag,
  				 agginfo->aggfn.dobj.namespace->dobj.name,
  				 NULL,
! 				 agginfo->aggfn.rolname,
  				 false, "AGGREGATE", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 agginfo->aggfn.dobj.dependencies, agginfo->aggfn.dobj.nDeps,
--- 9781,9787 ----
  				 aggsig_tag,
  				 agginfo->aggfn.dobj.namespace->dobj.name,
  				 NULL,
! 				 agginfo->aggfn.rolname, 0,
  				 false, "AGGREGATE", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 agginfo->aggfn.dobj.dependencies, agginfo->aggfn.dobj.nDeps,
*************** dumpTSParser(Archive *fout, TSParserInfo
*** 9914,9919 ****
--- 9873,9879 ----
  				 prsinfo->dobj.namespace->dobj.name,
  				 NULL,
  				 "",
+ 				 0,
  				 false, "TEXT SEARCH PARSER", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 prsinfo->dobj.dependencies, prsinfo->dobj.nDeps,
*************** dumpTSDictionary(Archive *fout, TSDictIn
*** 10006,10011 ****
--- 9966,9972 ----
  				 dictinfo->dobj.namespace->dobj.name,
  				 NULL,
  				 dictinfo->rolname,
+ 				 0,
  				 false, "TEXT SEARCH DICTIONARY", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 dictinfo->dobj.dependencies, dictinfo->dobj.nDeps,
*************** dumpTSTemplate(Archive *fout, TSTemplate
*** 10066,10071 ****
--- 10027,10033 ----
  				 tmplinfo->dobj.namespace->dobj.name,
  				 NULL,
  				 "",
+ 				 0,
  				 false, "TEXT SEARCH TEMPLATE", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 tmplinfo->dobj.dependencies, tmplinfo->dobj.nDeps,
*************** dumpTSConfig(Archive *fout, TSConfigInfo
*** 10199,10204 ****
--- 10161,10167 ----
  				 cfginfo->dobj.namespace->dobj.name,
  				 NULL,
  				 cfginfo->rolname,
+ 				 0,
  				 false, "TEXT SEARCH CONFIGURATION", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 cfginfo->dobj.dependencies, cfginfo->dobj.nDeps,
*************** dumpForeignDataWrapper(Archive *fout, Fd
*** 10255,10260 ****
--- 10218,10224 ----
  				 NULL,
  				 NULL,
  				 fdwinfo->rolname,
+ 				 0,
  				 false, "FOREIGN DATA WRAPPER", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 fdwinfo->dobj.dependencies, fdwinfo->dobj.nDeps,
*************** dumpForeignServer(Archive *fout, Foreign
*** 10343,10348 ****
--- 10307,10313 ----
  				 NULL,
  				 NULL,
  				 srvinfo->rolname,
+ 				 0,
  				 false, "SERVER", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 srvinfo->dobj.dependencies, srvinfo->dobj.nDeps,
*************** dumpUserMappings(Archive *fout,
*** 10448,10454 ****
  					 tag->data,
  					 namespace,
  					 NULL,
! 					 owner, false,
  					 "USER MAPPING", SECTION_PRE_DATA,
  					 q->data, delq->data, NULL,
  					 &dumpId, 1,
--- 10413,10419 ----
  					 tag->data,
  					 namespace,
  					 NULL,
! 					 owner, 0, false,
  					 "USER MAPPING", SECTION_PRE_DATA,
  					 q->data, delq->data, NULL,
  					 &dumpId, 1,
*************** dumpDefaultACL(Archive *fout, DefaultACL
*** 10519,10524 ****
--- 10484,10490 ----
  	   daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
  				 NULL,
  				 daclinfo->defaclrole,
+ 				 0,
  				 false, "DEFAULT ACL", SECTION_NONE,
  				 q->data, "", NULL,
  				 daclinfo->dobj.dependencies, daclinfo->dobj.nDeps,
*************** dumpACL(Archive *fout, CatalogId objCatI
*** 10576,10581 ****
--- 10542,10548 ----
  					 tag, nspname,
  					 NULL,
  					 owner ? owner : "",
+ 					 0,
  					 false, "ACL", SECTION_NONE,
  					 sql->data, "", NULL,
  					 &(objDumpId), 1,
*************** dumpSecLabel(Archive *fout, const char *
*** 10652,10658 ****
  	{
  		ArchiveEntry(fout, nilCatalogId, createDumpId(),
  					 target, namespace, NULL, owner,
! 					 false, "SECURITY LABEL", SECTION_NONE,
  					 query->data, "", NULL,
  					 &(dumpId), 1,
  					 NULL, NULL);
--- 10619,10625 ----
  	{
  		ArchiveEntry(fout, nilCatalogId, createDumpId(),
  					 target, namespace, NULL, owner,
! 					 0, false, "SECURITY LABEL", SECTION_NONE,
  					 query->data, "", NULL,
  					 &(dumpId), 1,
  					 NULL, NULL);
*************** dumpTableSecLabel(Archive *fout, TableIn
*** 10730,10736 ****
  					 target->data,
  					 tbinfo->dobj.namespace->dobj.name,
  					 NULL, tbinfo->rolname,
! 					 false, "SECURITY LABEL", SECTION_NONE,
  					 query->data, "", NULL,
  					 &(tbinfo->dobj.dumpId), 1,
  					 NULL, NULL);
--- 10697,10703 ----
  					 target->data,
  					 tbinfo->dobj.namespace->dobj.name,
  					 NULL, tbinfo->rolname,
! 					 0, false, "SECURITY LABEL", SECTION_NONE,
  					 query->data, "", NULL,
  					 &(tbinfo->dobj.dumpId), 1,
  					 NULL, NULL);
*************** dumpTableSchema(Archive *fout, TableInfo
*** 11384,11389 ****
--- 11351,11357 ----
  				 tbinfo->dobj.namespace->dobj.name,
  			(tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
  				 tbinfo->rolname,
+ 				 0,
  			   (strcmp(reltypename, "TABLE") == 0) ? tbinfo->hasoids : false,
  				 reltypename, SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
*************** dumpAttrDef(Archive *fout, AttrDefInfo *
*** 11456,11461 ****
--- 11424,11430 ----
  				 tbinfo->dobj.namespace->dobj.name,
  				 NULL,
  				 tbinfo->rolname,
+ 				 0,
  				 false, "DEFAULT", SECTION_PRE_DATA,
  				 q->data, delq->data, NULL,
  				 adinfo->dobj.dependencies, adinfo->dobj.nDeps,
*************** dumpIndex(Archive *fout, IndxInfo *indxi
*** 11552,11558 ****
  					 indxinfo->dobj.name,
  					 tbinfo->dobj.namespace->dobj.name,
  					 indxinfo->tablespace,
! 					 tbinfo->rolname, false,
  					 "INDEX", SECTION_POST_DATA,
  					 q->data, delq->data, NULL,
  					 indxinfo->dobj.dependencies, indxinfo->dobj.nDeps,
--- 11521,11527 ----
  					 indxinfo->dobj.name,
  					 tbinfo->dobj.namespace->dobj.name,
  					 indxinfo->tablespace,
! 					 tbinfo->rolname, indxinfo->relpages, false,
  					 "INDEX", SECTION_POST_DATA,
  					 q->data, delq->data, NULL,
  					 indxinfo->dobj.dependencies, indxinfo->dobj.nDeps,
*************** dumpConstraint(Archive *fout, Constraint
*** 11677,11683 ****
  					 coninfo->dobj.name,
  					 tbinfo->dobj.namespace->dobj.name,
  					 indxinfo->tablespace,
! 					 tbinfo->rolname, false,
  					 "CONSTRAINT", SECTION_POST_DATA,
  					 q->data, delq->data, NULL,
  					 coninfo->dobj.dependencies, coninfo->dobj.nDeps,
--- 11646,11652 ----
  					 coninfo->dobj.name,
  					 tbinfo->dobj.namespace->dobj.name,
  					 indxinfo->tablespace,
! 					 tbinfo->rolname, 0, false,
  					 "CONSTRAINT", SECTION_POST_DATA,
  					 q->data, delq->data, NULL,
  					 coninfo->dobj.dependencies, coninfo->dobj.nDeps,
*************** dumpConstraint(Archive *fout, Constraint
*** 11710,11716 ****
  					 coninfo->dobj.name,
  					 tbinfo->dobj.namespace->dobj.name,
  					 NULL,
! 					 tbinfo->rolname, false,
  					 "FK CONSTRAINT", SECTION_POST_DATA,
  					 q->data, delq->data, NULL,
  					 coninfo->dobj.dependencies, coninfo->dobj.nDeps,
--- 11679,11685 ----
  					 coninfo->dobj.name,
  					 tbinfo->dobj.namespace->dobj.name,
  					 NULL,
! 					 tbinfo->rolname, 0, false,
  					 "FK CONSTRAINT", SECTION_POST_DATA,
  					 q->data, delq->data, NULL,
  					 coninfo->dobj.dependencies, coninfo->dobj.nDeps,
*************** dumpConstraint(Archive *fout, Constraint
*** 11745,11751 ****
  						 coninfo->dobj.name,
  						 tbinfo->dobj.namespace->dobj.name,
  						 NULL,
! 						 tbinfo->rolname, false,
  						 "CHECK CONSTRAINT", SECTION_POST_DATA,
  						 q->data, delq->data, NULL,
  						 coninfo->dobj.dependencies, coninfo->dobj.nDeps,
--- 11714,11720 ----
  						 coninfo->dobj.name,
  						 tbinfo->dobj.namespace->dobj.name,
  						 NULL,
! 						 tbinfo->rolname, 0, false,
  						 "CHECK CONSTRAINT", SECTION_POST_DATA,
  						 q->data, delq->data, NULL,
  						 coninfo->dobj.dependencies, coninfo->dobj.nDeps,
*************** dumpConstraint(Archive *fout, Constraint
*** 11781,11787 ****
  						 coninfo->dobj.name,
  						 tyinfo->dobj.namespace->dobj.name,
  						 NULL,
! 						 tyinfo->rolname, false,
  						 "CHECK CONSTRAINT", SECTION_POST_DATA,
  						 q->data, delq->data, NULL,
  						 coninfo->dobj.dependencies, coninfo->dobj.nDeps,
--- 11750,11756 ----
  						 coninfo->dobj.name,
  						 tyinfo->dobj.namespace->dobj.name,
  						 NULL,
! 						 tyinfo->rolname, 0, false,
  						 "CHECK CONSTRAINT", SECTION_POST_DATA,
  						 q->data, delq->data, NULL,
  						 coninfo->dobj.dependencies, coninfo->dobj.nDeps,
*************** dumpSequence(Archive *fout, TableInfo *t
*** 12066,12072 ****
  					 tbinfo->dobj.name,
  					 tbinfo->dobj.namespace->dobj.name,
  					 NULL,
! 					 tbinfo->rolname,
  					 false, "SEQUENCE", SECTION_PRE_DATA,
  					 query->data, delqry->data, NULL,
  					 tbinfo->dobj.dependencies, tbinfo->dobj.nDeps,
--- 12035,12041 ----
  					 tbinfo->dobj.name,
  					 tbinfo->dobj.namespace->dobj.name,
  					 NULL,
! 					 tbinfo->rolname, 0,
  					 false, "SEQUENCE", SECTION_PRE_DATA,
  					 query->data, delqry->data, NULL,
  					 tbinfo->dobj.dependencies, tbinfo->dobj.nDeps,
*************** dumpSequence(Archive *fout, TableInfo *t
*** 12102,12108 ****
  							 tbinfo->dobj.name,
  							 tbinfo->dobj.namespace->dobj.name,
  							 NULL,
! 							 tbinfo->rolname,
  							 false, "SEQUENCE OWNED BY", SECTION_PRE_DATA,
  							 query->data, "", NULL,
  							 &(tbinfo->dobj.dumpId), 1,
--- 12071,12077 ----
  							 tbinfo->dobj.name,
  							 tbinfo->dobj.namespace->dobj.name,
  							 NULL,
! 							 tbinfo->rolname, 0,
  							 false, "SEQUENCE OWNED BY", SECTION_PRE_DATA,
  							 query->data, "", NULL,
  							 &(tbinfo->dobj.dumpId), 1,
*************** dumpSequence(Archive *fout, TableInfo *t
*** 12134,12139 ****
--- 12103,12109 ----
  					 tbinfo->dobj.namespace->dobj.name,
  					 NULL,
  					 tbinfo->rolname,
+ 					 0,
  					 false, "SEQUENCE SET", SECTION_PRE_DATA,
  					 query->data, "", NULL,
  					 &(tbinfo->dobj.dumpId), 1,
*************** dumpTrigger(Archive *fout, TriggerInfo *
*** 12326,12332 ****
  				 tginfo->dobj.name,
  				 tbinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tbinfo->rolname, false,
  				 "TRIGGER", SECTION_POST_DATA,
  				 query->data, delqry->data, NULL,
  				 tginfo->dobj.dependencies, tginfo->dobj.nDeps,
--- 12296,12302 ----
  				 tginfo->dobj.name,
  				 tbinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tbinfo->rolname, 0, false,
  				 "TRIGGER", SECTION_POST_DATA,
  				 query->data, delqry->data, NULL,
  				 tginfo->dobj.dependencies, tginfo->dobj.nDeps,
*************** dumpRule(Archive *fout, RuleInfo *rinfo)
*** 12446,12452 ****
  				 rinfo->dobj.name,
  				 tbinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tbinfo->rolname, false,
  				 "RULE", SECTION_POST_DATA,
  				 cmd->data, delcmd->data, NULL,
  				 rinfo->dobj.dependencies, rinfo->dobj.nDeps,
--- 12416,12422 ----
  				 rinfo->dobj.name,
  				 tbinfo->dobj.namespace->dobj.name,
  				 NULL,
! 				 tbinfo->rolname, 0, false,
  				 "RULE", SECTION_POST_DATA,
  				 cmd->data, delcmd->data, NULL,
  				 rinfo->dobj.dependencies, rinfo->dobj.nDeps,
*************** check_sql_result(PGresult *res, PGconn *
*** 12880,12882 ****
--- 12850,12963 ----
  	write_msg(NULL, "The command was: %s\n", query);
  	exit_nicely();
  }
+ 
+ 
+ /*
+  * Prepare a connection for dumping: set the client encoding and role if
+  * requested, pin down settings that affect output portability, and open
+  * the serializable transaction that the dump runs in.
+  */
+ void
+ SetupConnection(PGconn *conn, const char *syncId, const char *dumpencoding, const char *use_role)
+ {
+ 	const char *std_strings;
+ 
+ 	/* Set the client encoding if requested */
+ 	if (dumpencoding)
+ 	{
+ 		if (PQsetClientEncoding(conn, dumpencoding) < 0)
+ 		{
+ 			write_msg(NULL, "invalid client encoding \"%s\" specified\n",
+ 					  dumpencoding);
+ 			exit(1);
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * Get the active encoding and the standard_conforming_strings setting, so
+ 	 * we know how to escape strings.
+ 	 */
+ 	g_fout->encoding = PQclientEncoding(conn);
+ 
+ 	std_strings = PQparameterStatus(conn, "standard_conforming_strings");
+ 	g_fout->std_strings = (std_strings && strcmp(std_strings, "on") == 0);
+ 
+ 	/* Set the role if requested */
+ 	if (use_role && g_fout->remoteVersion >= 80100)
+ 	{
+ 		PQExpBuffer query = createPQExpBuffer();
+ 
+ 		appendPQExpBuffer(query, "SET ROLE %s", fmtId(use_role));
+ 		do_sql_command(conn, query->data);
+ 		destroyPQExpBuffer(query);
+ 	}
+ 
+ 	/* Set the datestyle to ISO to ensure the dump's portability */
+ 	do_sql_command(conn, "SET DATESTYLE = ISO");
+ 
+ 	/* Likewise, avoid using sql_standard intervalstyle */
+ 	if (g_fout->remoteVersion >= 80400)
+ 		do_sql_command(conn, "SET INTERVALSTYLE = POSTGRES");
+ 
+ 	/*
+ 	 * If supported, set extra_float_digits so that we can dump float data
+ 	 * exactly (given correctly implemented float I/O code, anyway)
+ 	 */
+ 	if (g_fout->remoteVersion >= 80500)
+ 		do_sql_command(conn, "SET extra_float_digits TO 3");
+ 	else if (g_fout->remoteVersion >= 70400)
+ 		do_sql_command(conn, "SET extra_float_digits TO 2");
+ 
+ 	/*
+ 	 * If synchronized scanning is supported, disable it, to prevent
+ 	 * unpredictable changes in row ordering across a dump and reload.
+ 	 */
+ 	if (g_fout->remoteVersion >= 80300)
+ 		do_sql_command(conn, "SET synchronize_seqscans TO off");
+ 
+ 	/*
+ 	 * Quote all identifiers, if requested.
+ 	 */
+ 	if (quote_all_identifiers && g_fout->remoteVersion >= 90100)
+ 		do_sql_command(conn, "SET quote_all_identifiers = true");
+ 
+ 	/*
+ 	 * Disable security label support if the server version is older than 9.1.
+ 	 */
+ 	if (!no_security_label && g_fout->remoteVersion < 90100)
+ 		no_security_label = 1;
+ 
+ 	/*
+ 	 * Disable timeouts if supported.
+ 	 */
+ 	if (g_fout->remoteVersion >= 70300)
+ 		do_sql_command(conn, "SET statement_timeout = 0");
+ 
+ 	/*
+ 	 * Start serializable transaction to dump consistent data.
+ 	 */
+ 	do_sql_command(conn, "BEGIN");
+ 
+ 	do_sql_command(conn, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
+ 
+ #ifdef HAVE_SNAPSHOT_HACK
+ 	{
+ 		PQExpBuffer buf = createPQExpBuffer();
+ 		PGresult   *res;
+ 
+ 		appendPQExpBuffer(buf, "SELECT pg_synchronize_snapshot_taken('%s')", syncId);
+ 		res = PQexec(conn, buf->data);
+ 		check_sql_result(res, conn, buf->data, PGRES_TUPLES_OK);
+ 	}
+ #endif
+ 
+ 	/* Select the appropriate subquery to convert user IDs to names */
+ 	if (g_fout->remoteVersion >= 80100)
+ 		username_subquery = "SELECT rolname FROM pg_catalog.pg_roles WHERE oid =";
+ 	else if (g_fout->remoteVersion >= 70300)
+ 		username_subquery = "SELECT usename FROM pg_catalog.pg_user WHERE usesysid =";
+ 	else
+ 		username_subquery = "SELECT usename FROM pg_user WHERE usesysid =";
+ 
+ 	/* Find the last built-in OID, if needed */
+ 	if (g_fout->remoteVersion < 70300)
+ 	{
+ 		if (g_fout->remoteVersion >= 70100)
+ 			g_last_builtin_oid = findLastBuiltinOid_V71(PQdb(conn));
+ 		else
+ 			g_last_builtin_oid = findLastBuiltinOid_V70();
+ 		if (g_verbose)
+ 			write_msg(NULL, "last built-in OID is %u\n", g_last_builtin_oid);
+ 	}
+ }
+ 
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 0f643b9..42463e2 100644
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
*************** typedef struct _tableInfo
*** 234,239 ****
--- 234,240 ----
  	/* these two are set only if table is a sequence owned by a column: */
  	Oid			owning_tab;		/* OID of table owning sequence */
  	int			owning_col;		/* attr # of column owning sequence */
+ 	int			relpages;		/* table size in pages, per pg_class */
  
  	bool		interesting;	/* true if need to collect more data */
  
*************** typedef struct _indxInfo
*** 302,307 ****
--- 303,309 ----
  	bool		indisclustered;
  	/* if there is an associated constraint object, its dumpId: */
  	DumpId		indexconstraint;
+ 	int			relpages;		/* relpages of the underlying table */
  } IndxInfo;
  
  typedef struct _ruleInfo
*************** extern void parseOidArray(const char *st
*** 508,513 ****
--- 510,516 ----
  extern void sortDumpableObjects(DumpableObject **objs, int numObjs);
  extern void sortDumpableObjectsByTypeName(DumpableObject **objs, int numObjs);
  extern void sortDumpableObjectsByTypeOid(DumpableObject **objs, int numObjs);
+ extern void	sortDataAndIndexObjectsBySize(DumpableObject **objs, int numObjs);
  
  /*
   * version specific routines
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index a52d03c..5853c13 100644
*** a/src/bin/pg_dump/pg_dump_sort.c
--- b/src/bin/pg_dump/pg_dump_sort.c
*************** static void repairDependencyLoop(Dumpabl
*** 116,121 ****
--- 116,198 ----
  static void describeDumpableObject(DumpableObject *obj,
  					   char *buf, int bufsize);
  
+ static int DOSizeCompare(const void *p1, const void *p2);
+ 
+ /*
+  * Return the index of the first object of the given type, or -1 if
+  * there is none.
+  */
+ static int
+ findFirstEqualType(DumpableObjectType type, DumpableObject **objs, int numObjs)
+ {
+ 	int			i;
+ 
+ 	for (i = 0; i < numObjs; i++)
+ 		if (objs[i]->objType == type)
+ 			return i;
+ 	return -1;
+ }
+ 
+ /*
+  * Starting at objs[start], return the index of the first object whose
+  * type differs from the given one.  The result is used as an exclusive
+  * upper bound by the caller, so return numObjs (not numObjs - 1, which
+  * would leave the last object unsorted) if all remaining objects match.
+  */
+ static int
+ findFirstDifferentType(DumpableObjectType type, DumpableObject **objs, int numObjs, int start)
+ {
+ 	int			i;
+ 
+ 	for (i = start; i < numObjs; i++)
+ 		if (objs[i]->objType != type)
+ 			return i;
+ 	return numObjs;
+ }
+ 
+ 
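+ /*
+  * Sort the contiguous runs of DO_TABLE_DATA and DO_INDEX objects by
+  * relpages, biggest first, so that a parallel dump or restore starts
+  * working on the largest objects.  Relies on the objects already being
+  * grouped by type from the earlier type/name sort.
+  */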
+ void
+ sortDataAndIndexObjectsBySize(DumpableObject **objs, int numObjs)
+ {
+ 	int		startIdx, endIdx;
+ 	void   *startPtr;
+ 
+ 	if (numObjs <= 1)
+ 		return;
+ 
+ 	startIdx = findFirstEqualType(DO_TABLE_DATA, objs, numObjs);
+ 	if (startIdx >= 0)
+ 	{
+ 		endIdx = findFirstDifferentType(DO_TABLE_DATA, objs, numObjs, startIdx);
+ 		startPtr = objs + startIdx;
+ 		qsort(startPtr, endIdx - startIdx, sizeof(DumpableObject *),
+ 			  DOSizeCompare);
+ 	}
+ 
+ 	startIdx = findFirstEqualType(DO_INDEX, objs, numObjs);
+ 	if (startIdx >= 0)
+ 	{
+ 		endIdx = findFirstDifferentType(DO_INDEX, objs, numObjs, startIdx);
+ 		startPtr = objs + startIdx;
+ 		qsort(startPtr, endIdx - startIdx, sizeof(DumpableObject *),
+ 			  DOSizeCompare);
+ 	}
+ }
+ 
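+ /*
+  * qsort comparator: order objects by their relpages estimate, biggest
+  * first; objects of any other type count as size 0.
+  */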
+ static int
+ DOSizeCompare(const void *p1, const void *p2)
+ {
+ 	DumpableObject *obj1 = *(DumpableObject **) p1;
+ 	DumpableObject *obj2 = *(DumpableObject **) p2;
+ 	int			obj1_size = 0;
+ 	int			obj2_size = 0;
+ 
+ 	if (obj1->objType == DO_TABLE_DATA)
+ 		obj1_size = ((TableDataInfo *) obj1)->tdtable->relpages;
+ 	if (obj1->objType == DO_INDEX)
+ 		obj1_size = ((IndxInfo *) obj1)->relpages;
+ 
+ 	if (obj2->objType == DO_TABLE_DATA)
+ 		obj2_size = ((TableDataInfo *) obj2)->tdtable->relpages;
+ 	if (obj2->objType == DO_INDEX)
+ 		obj2_size = ((IndxInfo *) obj2)->relpages;
+ 
+ 	/* we want to see the biggest item go first */
+ 	if (obj1_size > obj2_size)
+ 		return -1;
+ 	if (obj2_size > obj1_size)
+ 		return 1;
+ 
+ 	return 0;
+ }
  
  /*
   * Sort the given objects into a type/name-based ordering
diff --git a/src/bin/pg_dump/test.sh b/src/bin/pg_dump/test.sh
index 23547fa..cb984ca 100755
*** a/src/bin/pg_dump/test.sh
--- b/src/bin/pg_dump/test.sh
***************
*** 1,5 ****
--- 1,45 ----
  #!/bin/sh -x
  
+ # parallel lzf directory (multiple directories)
+ rm -rf dir1 dir2 dir3
+ dropdb foodb
+ createdb --template=template0 foodb --lc-ctype=C
+ psql foodb -c "alter database foodb set lc_monetary to 'C'"
+ ./pg_dump -j 4 --compress-lzf -Fd -f dir1:dir2:dir3 regression || exit 1
+ ./pg_restore -k -Fd dir3:dir1:dir2 -d foodb || exit 1
+ ./pg_restore -j 4 -Fd dir1:dir2:dir3 -d foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
+ 
+ # parallel lzf directory
+ rm -rf out.dir
+ dropdb foodb
+ createdb --template=template0 foodb --lc-ctype=C
+ psql foodb -c "alter database foodb set lc_monetary to 'C'"
+ ./pg_dump -j 4 --compress-lzf -Fd -f out.dir regression || exit 1
+ ./pg_restore -j 4 out.dir -d foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
+ 
+ # parallel gzip directory
+ rm -rf out.dir
+ dropdb foodb
+ createdb --template=template0 foodb --lc-ctype=C
+ psql foodb -c "alter database foodb set lc_monetary to 'C'"
+ ./pg_dump -j 7 --compress=4 -Fd -f out.dir regression || exit 1
+ ./pg_restore -j 5 out.dir -d foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
+ 
+ # parallel lzf custom
+ rm out.custom
+ dropdb foodb
+ createdb --template=template0 foodb --lc-ctype=C
+ psql foodb -c "alter database foodb set lc_monetary to 'C'"
+ ./pg_dump --compress-lzf -Fc -f out.custom regression || exit 1
+ ./pg_restore -j 4 out.custom -d foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
  
  # lzf compression
  rm -rf out.dir
*************** psql foodb -c "alter database foodb set 
*** 9,14 ****
--- 49,56 ----
  #./pg_dump --column-inserts --compress-lzf -Fd -f out.dir regression || exit 1
  ./pg_dump --compress-lzf -Fd -f out.dir regression || exit 1
  ./pg_restore out.dir -d foodb && ./pg_restore -k out.dir || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
  
  # zlib compression
  rm -rf out.dir
*************** psql foodb -c "alter database foodb set 
*** 18,24 ****
--- 60,69 ----
  ./pg_dump --compress=4 -Fd -f out.dir regression || exit 1
  ./pg_restore out.dir -d foodb || exit 1
  ./pg_restore -k out.dir || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
  
+ # zlib custom
  rm out.custom
  dropdb foodb
  createdb --template=template0 foodb --lc-ctype=C
*************** psql foodb -c "alter database foodb set 
*** 26,31 ****
--- 71,88 ----
  #./pg_dump --inserts --compress=8 -Fc -f out.custom regression || exit 1
  ./pg_dump --compress=8 -Fc -f out.custom regression || exit 1
  ./pg_restore out.custom -d foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
+ 
+ # lzf custom
+ rm out.custom
+ dropdb foodb
+ createdb --template=template0 foodb --lc-ctype=C
+ psql foodb -c "alter database foodb set lc_monetary to 'C'"
+ ./pg_dump --compress-lzf -Fc -f out.custom regression || exit 1
+ ./pg_restore out.custom -d foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
  
  # no compression
  rm -rf out.dir
*************** psql foodb -c "alter database foodb set 
*** 35,40 ****
--- 92,99 ----
  ./pg_dump --disable-dollar-quoting --compress=0 -Fd -f out.dir regression || exit 1
  ./pg_restore out.dir -d foodb || exit 1
  ./pg_restore -k out.dir || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
  
  rm out.custom
  dropdb foodb
*************** createdb --template=template0 foodb --lc
*** 42,68 ****
--- 101,137 ----
  psql foodb -c "alter database foodb set lc_monetary to 'C'"
  ./pg_dump --quote-all-identifiers --compress=0 -Fc -f out.custom regression || exit 1
  ./pg_restore out.custom -d foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
  
  dropdb foodb
  createdb --template=template0 foodb --lc-ctype=C
  psql foodb -c "alter database foodb set lc_monetary to 'C'"
  pg_dump -Ft regression  | pg_restore -d foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
  
  dropdb foodb
  createdb --template=template0 foodb --lc-ctype=C
  psql foodb -c "alter database foodb set lc_monetary to 'C'"
  pg_dump regression  | psql foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
  
  # restore 9.0 archives
  dropdb foodb
  createdb --template=template0 foodb --lc-ctype=C
  psql foodb -c "alter database foodb set lc_monetary to 'C'"
  ./pg_restore out.cust.none.90 -d foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
  
  dropdb foodb
  createdb --template=template0 foodb --lc-ctype=C
  psql foodb -c "alter database foodb set lc_monetary to 'C'"
  ./pg_restore out.cust.z.90 -d foodb || exit 1
+ psql foodb -c 'select distinct loid from pg_largeobject'
+ psql foodb -c '\lo_list'
  
  
  echo Success
#2Joachim Wieland
joe@mcknight.de
In reply to: Joachim Wieland (#1)
1 attachment(s)
Re: WIP patch for parallel pg_dump

On Sun, Nov 14, 2010 at 6:52 PM, Joachim Wieland <joe@mcknight.de> wrote:

You would add a regular parallel dump with

$ pg_dump -j 4 -Fd -f out.dir dbname

So this is an updated series of patches for my parallel pg_dump WIP
patch. Most importantly, it now runs on Windows once you get it to
compile there (I have added the new files to the respective project in
Mkvcbuild.pm, though I wonder why the other archive formats do not need
to be defined in that file...).

So far nobody has volunteered to review this patch. It would be great
if people could at least check it out, run it, and let me know whether
it works and whether they have any comments.

I have put all four patches in a tar archive; the patches must be
applied sequentially, as shown below the list:

1. pg_dump_compression-refactor.diff
2. pg_dump_directory.diff
3. pg_dump_directory_parallel.diff
4. pg_dump_directory_parallel_lzf.diff
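
Applied in order from the top of the source tree, that would be
something like this (the diffs use git-style a/ and b/ paths, hence -p1):

$ tar xzf pg_dump_parallel.tar.gz
$ patch -p1 < pg_dump_compression-refactor.diff
$ patch -p1 < pg_dump_directory.diff
$ patch -p1 < pg_dump_directory_parallel.diff
$ patch -p1 < pg_dump_directory_parallel_lzf.diff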

The compression-refactor patch does not include Heikki's latest changes yet.

And the last of the four patches adds LZF compression for whoever
wants to try that out. You need to link against an already installed
liblzf and call pg_dump with --compress-lzf.
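
For example, a parallel LZF-compressed dump and restore of a database
(the same commands that test.sh runs against the regression database):

$ pg_dump -j 4 --compress-lzf -Fd -f out.dir dbname
$ pg_restore -j 4 out.dir -d dbname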

Joachim

Attachments:

pg_dump_parallel.tar.gzapplication/x-gzip; name=pg_dump_parallel.tar.gzDownload
���?��iZ�b�e��g���<n3���,�SE���T�M������!����k�^=;)	��j����H_�k��5G�L��o�f�QlnU����l�>��to�+5Xg���+����q0����/�p�:�<�p�vd�G|��9�����C<w��4������_���	V2f�p7�Un[��9k_s��O���B�9.���J���U�E0��dgx{���99�?>������2�8��pN�*���C*����3_G���egg����(�nZ=9�����1��/L��5<��O93��V��R�@�P�����2���v�U�2�B�����$R2�S����LI	
YFFP������������QR�	<�����z�1���&k�[���p��\��y��#m����������7���������� f'���j:�^���������C���)��e�Lp\��}�z�z���|ld4��`�8@F�g��?�����ir��A�����{�����/���b9s���v�Q)��I�O��g���uj'2�z3yt�f2w�����]#��.��_���<�V
b*l�N��S������.@��s�e�1�)��K���>���p�"\�T����8wK�$��C��rF�����Q���8D'�].��ZH�����������?�D�0W#�����Z�����p^����3�<w����`���V���l���1sr�n��3d��dk�ts�	�����'q��/��k���J>��a�q3?a��x��#7<�{��Ae�z�L.�P7����!^4��sT��a�#�p8�O���#}FV"�3�c��
���������Nq��p\ty�R����n�d�]C��55���@����D��{
2�"SlxfA�y�WR~��e��6�ZI��
��+Vk��m�����XL�*� ��t}J�0�Dj�\���*��:����tQ������������A����s�e��N����'���(j�x��&S���cCS��gRG�
'���8�Xe�Vg�-L�*e�M�v�J&������%d|��x�_����!�)M�)yd�e�!�,��^��8�,�@���
�zS�~0��f4�:QM��Xj?4�@�M���F0���d���`�'+�rMHt���;e/�
W�~��$�����	�����#`"��N>s�J=��8u�
cP�
����l��
���B�����i���{��nu�u�� ����d�s��y�n���)g"i5�b!��![MX�q��G�f�[�������gh�(Wg���J5�C��&E.�E���z���KD�Z��{+^x�b�Qk�L+�F���#G���al5+�~�����f�8�iJ)��d-g]U*(�O5�nR���l�o8c� ��+y�z^��2g���B�9�x�l=�j��f��?+��;\J6�O� h���'�����i?�e�/S��V�F��i��P���T��.�~�>�E��d
���k����h�-�A�z�z�$�3Pz�D�oc�Y
��X�Z
X����2��u7��i�F�#���']���:�)�������7;�/������v��Dcb�u0a��.��$ �a��8 (�LxBT�3�����3���\e<gz�����E�so���,	�~����7���p�8�
����,�}g&J%�Qb:�=Z��7:��&9g�!'v+r=faF[e��n�l�i��Mec]�e�c1_���	"d���$�gu_���������l�d��l��GkV����}�8��J
p!%x9�Uq\4��Re�xRE�e4������+��d�E:'*��Y��y�����U�:&W�

����J!�%~�O��{R�(�(����]���m.a�@��L�bxU��J~���d��[Z�R�em���RP�;&_[�X=E}8���?�5�)��|����2�O\�IQy���h1&C���R�	I��Lh3�I�P�^���i|!�Itw���9l>3}������G	���R��	*���O'�%~���������A�p��dd�p{v<M7�`�.����N�n�;�x�EV|�B��s66'�Y!b�BD�	k@&�*��=�)c���}���`d�����+�/����;�o.z`��D�K��PL�l*�x~3u�gU/�X��*���|$��F��I�g�9u�G��}�C��L�U�D�IZ+	�#��������1H#����<7�r����^�: ��6r&����;����C�����t�r�[$�����JY.�F�+��}�GmR?����|��&����:Vd�7���-�~W�dqTZ^==]��f��pU���2��%������iz��a�!�;
��[���RP��I�2�����9�D��O�<�X;_�1oWs���,f#��������}A���9�ms�NTFE�[�u�,E�l=��ee���q��q��:1��{���b��*�{�*2���J�=.����p��SZ��P���#�7��>r�$io�����k���������5�f�Z4+���h��Y!'8�	�b���eC������PzT�ob�0��w��0�D�o��:��wg3,���L���y��u��}�	E���My,x�!
�N����>r�=Q:�
	���~f1�3���,^'�T���6/4'�L���D������#c2�s�!�9�������c��q�c��u��y'\��C(������r
:��x�����B�{j�GS���wBVJ0���������"o��'	������YW���2������29*y�@'/&�.�(O_�Uj�1�/(y���}p�\���n������B����u<
���!G������(�h���X.���fv�����r�.`D/�/v�g�r���"8r0��D����$���T��+�IWH4�>�������c����A7��P\z\Tyz�9��2\�\u"g����3i�6GR���q�B����d���h�+!��2s�I��	�{4@�Q�����z���l���|��x��P���^v�@y?w_����\9�%j�%�!5Z�d�m���RW�	{A����l���%���:.��/Xc��3L��(�zyV�2������[3��M$���f�by��` 7	���5rQ�N����1����9���M
g�T�79��?�>�GCv-*�5����[%F�oT�=�uK�D���&����o������G�![[��$�_��#�bu��`�����8�f*?pe[�?���D��	I���l{8QV���(���R,5c��
'P*�Gp3[�5��:'56v�]�����p(x�X#����t��K7��jj�����:�%24\����=H�	��h>�[2�.��[���O����)���)o.�=�����+��e�e���I�dN{F��u�z���)������u��6���i%l�����
�����=�9���O=��R�}7�;�-�������I'A�61u9�}3_igA�4��\�@M`wh��x���aI�l��Qy���e���Q�3]
�d���4���r�s��m���W8o
�T��*������L��D��C
���x�3�>z��"���9��
z*�qv�p���j(8�?q�}���H�����i��b&��r�����K&�y�h�s($�VfRQ��TPE��VNR�j`F�Y|����$���v��&�0���������wJ����dw��=ZQyB?Rm��SL[�j��)7�v�X��]).X��{p�{���8S:*���B
X����{a��i�G�E�[?����(���y5��2!N�dK%�y122�S����N�ic���L�T�-k�
	���Dg<9�9F��8"���i�/G�/���?��z�{�e�	�o:�u�?�| �9�~tX�����p�|����Q� 6o���C�s7	-���p���<�H���\��FU��i^r3�@�e2�
����h�	��a77��$�J�YI�(3���OJ�Y��o��f��4	�&)5kU��������=���������DW�
�6g�����r����c"��������gL����/���,E�v7�D=���BUku�E����>�6��%n���&e�\�Um�9�z��qU����hMh�9�VBS��Z��F����\e7��`f��u�!�:g�#_���5�w���*�������y��(w7�ht�����lm�PV�R�;���^H��]mt��Z��n��v������z�<���-���/b�`	:UeDY��.��L���3	TjN����u3j�m����}mf�L�
$�����2*�e�wJ��y3��9��(���OD|�-�5��L��g�n�R�R�Z��m��3c�q�98v��s1�Q+Vm��-<h6�c�W�0�on�;�U�V���J[�V��(��*���O�>�I�j��b�Y���o�~4
��'/���Fq����VL~sQ<$�����5T�HE�H�;�h:��(�h�������yo^�ip�[��vk4���-iV��J/U�X2�z�qMc*���(�.�.�	
v�qHLY�����r�����8��49=y>����$�g��,V���c����#�c^�Yy6
�p�:�lU�����R^��Rs:U����$���'�w�F��P)�f�<����I����z;�y'	��c&�k���s�O'A��z`�uH�����F�8����{cW0�����U���e�@�q����c��Z�A/��zh�r�� ��:� �F����\�0~���U��e����Y�
$o�����_0p4.z�p�c��ut�����Tz#��,��$��Iu?�����Z����z��V��	�b�
$m��������zg
t 	�l�����z��k@�I���N�7�-�{��8g}tN9���#1����d��W��l�m�I��^HT1G'�b'��)R��a?B��l��z+�k8��<���������|��������'������L)k��h�A��h����X�o���S�oW��^��Re�;o:<zwh�
`;NQ"%���m8�q?�|M oXo��@�kA���E3�+Zm���2������P���p�:��~	\���=x��SQZ_D����mx>�p����J���!�'�z�@��~��kmj�����Mk���V�3ft5�9-�����5����*��T��������py���A=�:��P����������/�������������������?\�����_[%Vp�N��m�������l���hH|N��z�����9������?������F�^���z�Z�W��TFS`Xz��.%1������EP*��t��|�#�$`yO�"'A9�I���0Ie�4���-{�������,;��q�L*/G�@n�� �'
�������T	�`j�mB5 ����O�4��Z�����pQ7�V���c8Y��HzE�LD?*[=���c�s��iU�4 �K���7�R�0N����H%�i��o�3�c@c��t"�s1���G:���GRmD�s2s��:��PW~$�/4���c�#~NFrF����bB�\�u9m�J)�B���7���%5��b�������$������A��*-�z���K�����R��w��3�)���7�	�����b�sNd9�!�RtL�)��K�X�4Na���/���OW�������Y.�J�)��{A���1�nX b����u�'��a�6c�'��)i�����JnV[�f�Mu|��Yl:K��V)�j&|�HUF����H�5���p��FW1���NJ�>���������E���)���?fi��+�������%��9r\�_�x��/]L��:%��$`����v4o
�,\�!2�gl���-���N�.
�/-�y������*�D.
ONJ|�t�e'E���1��N�r6���/�!������?��R�x�U*���		�q$���n{�,�w�Q Q)���U�`�����4���,�Y��Z���ZM�������)K����;���'{b>l�)z�U~�m
J�F(�����qd�}4wM���9+��J�,�����=����G�`"?)�����
�7�;��xk'������}N ������
�!	��(
��ZTCX�8�N}��b�B��E��rc�|����0�n�<��k\��h5{;��'��Kz�C�����<�47L�p(���A�i�}��7����Y�:8<�?���+5�7G'�/@���'���o��"�&t�������s�#�����AS"�2R��/�$;��������������{,���R������ry��g,�������V����s�g�9P�v/��0����j.4����1i�1���I��D����^�e=\�Ct��`G tp�v�w�QL.|�IHA�L��,	>�j���ZN'�����~?c�{��Ix�gc�^��Sk-854�!�u�s�BfL�S��	����5�g��n������.�������[LM��9{`��Z���x�u�c�U�8p%� x�������J�����V�0)��c���G{1�~�ss�IvE�J����c��NG7JDS�w�'�L	��z����884��������Wzt�D����j����������W��<{���re�{.�q4�%�������	J|d�".�;�#�P��������;z�'8�W���x��2����k`0;]� :/��xk�������h����-L�j�|��}�V�_�_������C���w�y(����' ����&P���=:�� �9�N���y����_���E�C��rL�o��;S��0+*KT^/l[��s!�aO���N�����a��(�+X'%�H��Jq�av@�
��F��29�>�T�
��,�mx+I#[�U�\3#�Nl�� �s�����"���������5�����1������N����&@ ���2�%�o����;�D���������5@2�����Qv>�1=�/0����^2����)Ll2M�������W��C��d,� �
|t�G��E@Z5�l��)/���p<�8���h/�>L���-��Y�X����P��W�z_��PP�VQu=I�l�vX�	#�����19�I����-P��M$���GA���
i�����"���c��5�Hi��	fq���l8H�}A�Q����dUjFGR���Y*�L����P��O�]�q�
��f���L�o*-fv����f2�0�n(}��
/EG��gQj�MO��f�[S�~��c�O%w�Ra���}]a$�Y`��`1o�@|�4F���R/��PL`�B�V]�L-��t&�d���,���wp]-�pl���v�2��Z�b�m����N�/@
�t?�F�����MkA���|6��gi��k��;�y��/R��������b���)P]����h~�	R�j��jYp�Z�]��������k��2�z?F���5�<o�_�����������ODe��1'�3�s�`�����&�e� ����x]��$p�x�qJ>������!4l=9|��s~�R��KL��X����3�8��|�����2���{+�������m����gz��Iw�z��6��|O��������ON^��=�R��T��6�
>�sf�z/�'�������I�@_��J��T+�4�=\�d��aZ�b�|c�R|(����S'����(��kQ�yL���;N�{�b�c��lX��@.�XH������e�$W��D�1'=����2�����.5��;o>��h)��.����p���HJ���Ka��2���vH������a�R�P���(������p}a��G�`��I& ��pF���H��h;N\����h�Gw�����t	��4��-�E�d.
9�U*����v�N�v�
�&�M����c�K�h���!dw&@���9;e,���2A����r�Xx2�{S��G~7���/BO)oU.��X���q�/������������8�� &��--ep��2&��*`�%z���h�f��|[���r�=z�?����������oo0Fx��x5���!|��v��`�Wo_�������
	�s��
yF�'9���r�����X�T��/^�h������������j�qY�n������l��`�x4=)}#��%N6�dl�m����Im�h�<����rV1���x��m�Ly���z�):���tpm`��h��1�������mK?������< `���L�#(x��G����G]��]>�����E�+�1��e��g�����
��K}!��t�k�%?s'p����p)���D�QJ�6�f~�L6h�G����^����J�������kK\Gx��	��y�+��g
H�K�yn)��@*@�wIq�l��X���y��|�r�Ng|�GyFv�a7\��\OW
�7�X_)��|A���5�C2�������c�I��d�������^���|T�r���2�J�&BF7�5E�F���D-��]h�����'�}�}������C�u�N����������� S���������lc6Q:)��������F\�-���U����z�&~�!dD�"sm��3�SX{��������#1���tGce��=�y�X�:O����W�E�W�R=�'�����o2
6���?b[�U��s42>7��mn��$fl���n�i��o�y���7�vG��.�'��ib���k��F�<���WG'�Y�]�a������J��,1������f:	a��f@7� �7�m���K�p?u�����]f?��L�di���%��������B���E?�����d��;��n�/��(�L.'fv����f0��x��W��`%�LS������6eLpE�e!��������������V"�vN���{����C �i��p�����/��0��z���e��^=~M�o��!�������^w��
�o���+���s�,	�-,��U��L����$}Q��� ��}Q.z!�6)��`�1�=�+�����c�
���Es����=��-�	��&|hm���0lD���(oS����2�!4�eM'�P1$��4�n��)����� �U����_
\��/�eQ�	��V�n�b:h�w�
���#5L�w��T3'����Y`��[t�5T�S�?<��qxt�o�M&+�����d����z�[
�V[���S_g��/��,���u)�����b�\�j���'K�Fg�[�?�=�;8|�Kk4���yG��#PI����e����HC!�~�Y�~2��pc���j���-��B�VO"o������cX����v��~�������K-.c��p}Y����-.z��u�l��=��,��aJR�?K��g��q�9vgk�B/�3h4s�%g��k���u�ru���e��J�P�Y�N�T��`n�e�-�6hy���c��RFky����\8�y���������	�z���S��-U�t��u;���`��P/uJ:����NJ�_�
y�����/�\f\�aM����;�~�'���.���>�u8`�/&L^� I��	"��E�K;W��f���!YU��N��1����`����mALD'�X��a}H�C���"m�)
w�RSl��M.E�G�lY������$����JF������������}!��=>9:����%��8�+�\�0���m��^��{'���� gk�3��F�W	���=F�=����LE�|o]�4��vE��&'��D�S����S���
��wp�V�&��*�
��r����M�ks���YTo�h+�� ���6�N�W��r%M+g��t��7E/����F7���� ��w5��EE�v��iN�/FU��|��x�j�
t�j|�3�<H����x���{���.d
�;3mejo��9��F��$z_�BlQ��d|���8��D[�t�Y[���gu�\K�)^���f���F�xD!��$��&�V�+�J����f'�)�\D�q�A���Ku�0����?;��;{j $?�Gg�\+p������pt��~����Q����tz�r�Z��������K�K�����
s�a�7������3hH���g.��@�'��u�.��;K��S�����z�T�lh��j������8z�N�o	�eP;����yN�9�Tlwp9�\���G�����?�p5�,,��&z�T�2�M;N�%��Q�������*������)e�oT�����0e��o�Pe�T.~��Bf�K�$\����/}�R{t�C����8����G�m�\�[gVW9�k�"�cEi��-xT5��EO�fL	���O"+�f������(����S2�/+�x����@�����
�*��U���%rz�����V)�d-�F���@Z �n5&�a}c���P�s@�O�a�����-���3��9%0K�������O�S�gB���5;1����>J#�2a�z�G@N�4���c+w�88�����?Ro�p���'>��.}�%p��i��4��gN�n4����
<M�}���f���2�N�/�K��fS��[I�o|��Q	�_������KY�%����_t��d
�Y���8��-�>P��3[������\p����m����Z`�Kah��L�-����R��@�_v)\�jk�r�;��W�x����q���j�
�n����U|��q4��
 V���Vhj��������hD�V�����D��������0��F�Z@UOE|����`b���r/&��Qh[2���$Sd����9c��d��N4���dt5���/�;��eT�����qa�H��;?��J
�YE
L��t��<��C��u��BH�-�*�?
C�zbB��`���t�e:�5�����e�>g i�1Jr�����q������d��3%jTp��&��$��
s������9���'�ChGG�`�
���J'�a�js������M6������W\,6	��h:�VDwk:,iN��8������scP��Y��l���/����;]i&���A�o�1�V+c5��u7/
	M�2q�����+s�x�<"'ioywjUL����ca����-��Q�6���-�muL����b["%t��-c�"�cc�)@��{��:��B�+�s����|*16����11��8ti�,��7��+'���'%_v�d��&p��J�����*��D����,<�1�I�/$�E��B����hr��Q:��}��r_�����2W����m�^H��p�`�w���u�����PMO���������5 ��y�(�������F�f�kW��f������u`Ro�������x�?o�o6gb��9h�.��+���=������3��./�3��V��l���V��koo}���"�_3cB��Z(��P���8�w�z����cM����4�:�|d�
�j�('��MxI������+��J��c�7Ak���"��6�0�4�@����*��vR�M������2~�QC���>��k�|6?6Y�vrY�I�����j[��c���I�)M�Y���ng���������^y�TFg��Qu�nQ�F`�p}.�(�4��!�0���o��bR��Gbj5�?h5�s�Y����p�G	��~��_����t%��u#���9>���{{���{�s������@�}+e�	9�W��; v�n!Z����7����R���|������a7����9��IT��U��v-��C�8D���Y���q6����!�ja�#`������b�Z���|�u�i�WYz&���)�6>c\��o�*�V���#\��=����i�7p�X��bV����i�;'�wr���9�^��So" ���gD%���B���8�K��Q)�,����K\�� <�$�x�_-vu�<U����������	=��\�%�naR�P+A��2P�+�!�(�H��1ca�V��,nU��u��|�(��a���j�dv��`���n6.mU��J5�K�3p��@#���e8H�%S�5{�����[
��#�!��_-
����|������-����
,�jT�Z�>?,f��WG&�
?�Cc�}�76�j6jE��5�t�Q 3�-I�1�v���|����(�SL0��[8��#,��Q��"�q�z�������1.�����jo���*�t
�$����~�KK�0�;�v^�8��szK�q�\]Hf����v���r)���Jz}������7�8R��zn�C�&��V�x[om��Q�	��F@	�:�J�NW�H��t��S�����K������-�Sw�	����Y��)��?N��NU�)���"��������*>l����y���t.�����-����~S�U���ip9����P��V��J��j�V�����:~^�<�����xb�	�Yb���G?L�!��>��r+���[6��=t�g�~#�~�!&\��^�n����3�����j�25��5�c���|.%�����������j��R~:���I�jQ�=?:�?xqHu��w�;o���B��g)�w�!vB�bS����z�
��Wk6��X:��Zb�8���B���F�sY������[�e_6��)D�F���&�S�����	��l~eE�8*h�qo�k����7�	e�%Yd1���M�r��'���q	Y[����`���*�������3��r�kK����5�����������}e��=�q,��0��N������|���SV�cz�aBM�5�$2C�I�V�'`�?,@$ n6�h6.�Yo��z
����q�H�>�@�����V�������A��9��7�%!��������:�C��wv�f$AN�xSo���t�	H�����j��~>P�
3s������������B��JX}�%g2y��� %��.#���J�
�p�Y$b��c*��saFn��4ic1L�w+I	>��?�,��|����9s"T�
��A���}|����X�(���b��E��w��������?����R��c�,�	�.&?����m�O���%-��+��2\Lt��.���@�q>����7a3���d|BJ���-uq�������W�Y��7��k0�)����������%Y0�Dx��#C��c(�k��Xi�j.��&�J��(������@�./�����l����ty^5����Z5�-�8t�+�47I=��e�!V+���0{l���"��v�X�ow�ON�w7��A�]P�����<"~�dH�_B���vA�U�,����s1��O�7�l"���@$��2Pn�_�*�"m7�k68��g�sq�����A�Lo�i6nR�/��}#��-�5��f2��l�.&���ab�
P�4l���;��_	�N����D
ZVy�*U24�����$���^)}���?�����3&t	�^��y&��95�mp�*�/�,
�W��m1��BP�]u&0��&�g6�2 ��;������H7�77�
�r}I8�!�_���jy`��M3`����m����Pe�F4�N�g��n����)g#MA��
�j��������&I7���lI?��|��vz|���k�N���0w�	�����XR/W	����������,\:�RG�1Yq,�j�����������P��������^�,�Y9����	a��Wi�.}���Q�k� �;��d�������4��(�����b�+���X� �D���P���J�m���8:��.\f��9���V���F��U�yW�������r��?Y0C&�l� �`�p�������
^J�5x�y�w	��@w���?����`�����M.���=G�c����h�9�Gcd4<�L��1�m��GVMr�G����(���~�����k]�<�If@�����Et|�U�|�.qYm� 9����y�jy�<����������u��1�2���}��6[�
4�8�L��m��=�y� ������N.��!-x�JV)L��Z�;7�H�� �u����dE�0�\D�qx�A/ud%����W����}�"����y��%V�*��(����#�NE�a�p��V��\,G6Z
�m���Y*{Hx_�����H����X���z�e8����\;;^�Q0������#�����'X)�E��#}�E�G��_rm^�y�X	k ���V���E�Pa�I�2:0D}9�f�E'p���>Q%U��zpo��.�lh/���[}/X���%>�����6�

���5n_vIw�����
��y�����B9������F�f��p@je+]�.��
��+?���@�dM<�T5[���r9��-t�j��a�]��4:�b�/*�'��������e�%���p��q�N��O6]-p��D��3���?�`�K��K�[�W�N�5
�p��������p]3a��`�#�����^���+����_�u/d�rE�����d����;`\.��@h�dB{G2�U��FW�� ���?�1�}�,#����i��+��=j���e0	�'@	��@zm�t�����;����aP��x;�O!��a�����(��J�l�T�x�4���&z*3�����w|�j��R{���AOOI�rX�(Q�|��`���.��v����#B�!�B����bo�'�Y8'�����J@��?9�X�p���W���*DW����[8�B��i������������I���s����~/��O���9�{�t�+zX�N�������?������8��<P(� �<j@��e��#�Z��������=/G�����B���fx}�_��8g�����X���W������Hkg����a�b
�?���C"��
�^�����"b�h��p"cL��7��Q�A�������E�^0&*�G1
��AH3�H��+-����[�����Z{��g��S��R�3?���#��)9�?�K�t�.3���&Vy�<{�'�d<
2���;�7p6H�a�6
�SD�'a�p�~�*UJZ��d���������=�*y��&\f(��)�Zx�[�=\ot�;�1�P9tp
�:!�����'>+@PVFE�(2!���=�g�/���'zz�sx�C�/$��(v�{����W��������_�	��<I/C�h4��,�\���x0�.�8�)�m?�#���v�/����u��Ot�sQ�y���^[,{	~�;2������(O��i�E>
��>M+�/�� ���_z���d7����
|�����`�	��`��*]�&�<���x��o�_����a���I�z���3~z�:�lU����1�(��������3O�,��Bc�^(?LI�I��D���~D(��<�Q<�>:��`WHQ(�-]xg���Q=;��o�kg[�R�Zo��*�s7�u��f�L������#�8zAY�)�����������]X��M�)���t������q:��X�������b]z��5<�F�����������0���I����H������3aL��O&c����?��yCuV>�J$�b�++�C���|�
��:!L�D�.Q��t6��Y��8��k�j�V�d�Z����|Lw0�qJ�m=�u�����q�E��cy���0�
f	gn06m*�s�
|4=�3C0���"�.pO� 
���A4�'�-/�L�<5�
x?�s�*@{N�������b{t��.6*V8o�R���,�+�-����0'���x��%<�I�1g�g��4�2w: ,|��6��Vh<@��a����'�T.:AK	��/QFt�
��q���\�S���#i �oT{�Z�Tj��.��"O
1���F)��Q�n��p������h� ���
�D+v�<��e{0�����,HuJn��8!x��g�>�	�����h#OH����(f/�+����pTQ�2��JX���K�y8�'�L}NH�N
y\�t���"x��Gco=Dv��U�����FA��8���o�O����'�Q���$#���>�����Oz��m����q�Z"5Xj��s�)y����{Z�z�sn�ge�Vp��X����4��-�|3�F35�AjrlwZ��
?�D����������M4�2��'��e���>c���F~P-�����n���@@��V
>�-dxD�~
���z�Quhs�DI�DYK��:�K�<��$��y�>$oD�-�K5�K�b�p�������<W��� /�,{��u������A��Ja���G��n���
����M�����������]�7��7�yWs�]���#2[\�@���b�]�oB���Gpb��X���SO�&yr#�)[�+��Z�$�3�'�������w�I��}�8����zZGy�o���?�K��������_���/!�}�o���L�+�F��/�4��J�o�J��hV��V�o�J�V���+�����L������3��{��qd�.�O���R8���S���_��k�R�
�R��t�$�KE�������ng��E�������prQ���gE�:`�C��h`1#��-i�'o���R�v.0I�;P��{���_���������_�q����h�${B��?��
	��%hY`z�w���?��G3T0}? <�`���Q�����F� �Z/e<U�/���\+n��[���&Nu��^R�����3A���o�~�2�k��Y�/��5����8rU�5RDS����u
��Q �V�v����������������C�� �������wM������5\.p���H7##Li���#H�_���0K�������D�{2�Qgt�D�MM��4���e��t�����gA3b�V2D���U�������p���?�;'o_��t�N���{��d����Fz�m#��gj>�6u�(S�������?��k�R��WZ���l��;B���m3K��|0�/v�y4��F;����b�:��g����x>��8�Ot���]�/�������a5��'��L'�U;=���aF���?�uE�w������g�,`��9�=����c=��z}�j�{��������y�K�p��)~>�.��|���x������Lx���h���m��"���Kq�-vn���^��Jx��*�B�E�Y�,����v��������_��0�E�9�N}����>	&����<8]\��B�Y/6�������y��h��~s��y���/���,����D�����tS��a��kP�1� M����\��>��W���2�M*��r��>��1������5�
22[-�v�����KVn�ys�����^�7���):LS��� FO�n��T<�G�xUb�aK�d���l���xl��������-�!p��	U���	Fu���\�DB]��d��Q9���8+UK>�)m��%p*�W��Xl���M]��V��i��:t<8�����C�
����-T�m+
;
�'Y�gpS����G����yE{���� I��n}����[W_%�$d_~�u�������*y��O���*�����X�w����+D������<�$�xA>�v�rH��C�K,6:��l/]gj�~3t�f��B��IJ������
%�x���K�Z��yU0k���;z��2�T�|M'Xo'��4='��`NP��mx���������/�<��h�Y�g��H���������A�mW����/�%���(�(��h:�y����n_����&%C��|o�i&-2I����3\J��8+���z/�e�� �LJ��'L���y~�@�4�~��F��H�>��y1�#��K3�C�3��B>>r��N3,���Z�	kug�y��[k���i[���=�����w��j+��|i���������s�e�(-�mq� ����0�|:���>a�3p�����F<��^��W:��l��&���I�H�
`�m��|����<��1nyO2!E�����
]E^�eB�������)�`�,��������G��:@xV�������E�����6
c��?���Qa�����K/|���C���	�,�#tdRC0���Y9�U��}�'S���qm�
K��]����F����
����i�<�t+�^����o���/��z���6�Nl��L^ZV�%m �Y���_����}�e�7�������Su�8���Xdp ����k�<�clMul�wvO^3?�[4z���������<b��2��X����8�f:��gp�b�i7]����K?���Jw���<#M$�G�>i���1��>���N8����L9��V�}�l��Z�G����Rd��@_`��d����@�t��Yt�6'��$}	�,z���H�����(��K_�E[�D[����-zlm���6l����Z�Gb���cIa3�&�����)��Q��n�
-U7���K"!�k����3��q/��4 �/�����L�f.��O.�5I��L�p�ZTE��9�c�
]����X�3�G�^Hz��%�|l�K��^��u'�xW7����nsr\���M�e�.��__�E� ����8RN��Y��BQl�L��\b,���iHn&���	��
�!�x8�����;��p�4t��g!�x���1�n:B]���l����Z�3L$}�\��t���n.�U�������>(����x�������q����?������5���j��
X�7`C����xfR��B_������@t���k����(��9�x����,8)0]�9A��?�^��J6j�8���
?�����6iBo
_R�I#��gH�&
�"9��/��c��;�st�k�G	\VV��4���VMV�$O���`��(f!�J��2_�L�p��"����b��l�,*����l�V���:mf1z�lvv�Xj�#{��$]�J_��w����e���T~X��`�$�Xe��F���3�Y��a�MM�^��@d;	;"�����������������U��$���y��_�Zz/�Z�D<�3�V�j"[JB=�a-�0���Kk����a��9��~a�V{��\-������Y���Ce+�������^O,C��z#�����:g*��������C����=��b�a�����*Lr�M�
��J�U��r��dE@:��7��h��}�Tj����^{q���HH�MT*T-N9R�^��4��c�@o:<zwXt�r��U�e(0.�����c���P�B~�6�'0$�X�Ba�0q��s��'���K.%o�8����U+����Tn�d�u���d����l;�2�m����=v�����{�6m�.����"fB��Ar�t��J&��5�x���G�g+�,���^)WaM^�N�d��a�VV��&u�����S�3�"��YJG�FmuK{a�M�� ��1e��������������z���?y�V��c^x��#igO��&��F�4�z�$O�6�a]�S_W++�,��<#�}�1i������)����+M��?�>�/������#l�8/�2�L9#��A=@��l������
�L7M��c��)T����y�sQ�<M��V�V+V��h-��({��,��{#>"�B%��z/(��#�x�� �q�G@��>5� ��"�09
����8NN�.2�3��������'(�b�I�eVg$��'��ew*q2���hF�'�EX�Q��8M��+��N��{�)��N��������:��nu�����wJiR-�C	���v�ISX$O���_;��?������$�?�t����yv�O����%�Xhb�34�(��b�����2Z��>�6+
��?z[
�n�N��f���I����is�-�f�
���E:���,/�/3�����d(�M)n`�l��H�I��{	�O�v��T����jU3�|�S�m&������c��(L�Er�I��� �a��(�0b��rN���i�Lk=L��:��?>�T:u�J
`�lj�[�Z�I�:@
3=4���U�D������9\b�����j�'���#y��H@�3b�(X�,��Q�Vn���U�
?7
G����'E��~����,��[(v��q�JJ	;�]����k�W�A��h��#t	�$8��8mS4~�.���'�1|	��c��R�f�h�A���H@�j�X��Qt$RV�(%\���w09�&I��o�^��������[���9�M�I
�)iThV�F7���4��^���4����wN)���LF�������o%�S����I����O�����3��p[�l��& ��������J5��(���-`F{��bh5{���R�l����i�h���Q����)��8\��BuV-{��>��k�0�`��U8T��9T�Q�2�J1O�����tZ�w t�u*b�~������C�Ak��Dc8m�����Bbd�KW�y�pX^y
��T@�������h�+��s��?���+8V��xr�������O0�� 8��_������.C� 7�DT2"�~����Vp�� �H���6@�~O���h�a��izY�9'S[��H��r#��Z�R���[�������m���MU��?���O�7��f����X���*4�2���x:DC)&����gL4�����`���}�� B#���9[
��q��D�bR_+���m�Az��U�����B�++�����?{�9G_�\��QV��H�p��������o�5���5U\�c�ip��]���f0|�1�p����_�`X/��k�a�v!�+��5 �R�y	���
 �_f�>�
U��Sz��h-�U�m��m�P���L��W0g����u&H��T��9�*�0'����Ao&��'HO���\L5^i���--zI>`MN�s���D��q�Q����!1tQ���MJ�����(9������M�b����t��>��'R���R�X`X��S3��p�/1�%�9����=����5�f^�R�nmK��nV��_0�n�����u�v^y%y������A��p:���#[\q�
������SE
��:���x/�f��s����Mo����$��v��3��1�!��m $*�����#K�ed����������,��Zs�Q*��[�F���Um�4����&��<+�W��4k@��5���:XG��x%\ob����FK��C�S�vb�%xM�����N��L�i�����?6�"~�����qn��"����1��G)��4�"J����B���b�Kbe��8�rW��>��%s��[�SK�<�)�`�!k/��Q2��)�=��jwe��l5������4�it@7�u������*��!������*Vm3'?0vN�*mERgt&\�2f�	�
�
F���*�e6��q0��=��E.�NEX�[A��F���x�?�=��7��$�~
��&����2z>�t��d����4��4���z�Io�
����	��6���"NN�>�����W�T����>����|D-��_w�����\����a�YR2����?&����L�#0S��
k�YG�e
��*w�����V�|��t�]n�Z�F������9l�t��#,N������b+,y�0�����kx&���=�
�w����\���^�k#��>t�$�x�l$*���`p��
�`7��������6�9_�&�>���9��^x)]���OX2��"��|�B��X�zC�)�I���@$1	�� �����]b�`���"����M�W1z�%���T�PDnE���_,|�����c�#����������-�V��$���'= e�J+��?9�`��d�A)r�@U{���#�b����	�.f���}��M_M��p�t"8
A������2�#xG:P�DSc��j!V�TjML��b�C� W�-� I�]-#	��v���~'.���I%��=������7�������h�MS1U��Y_^��"<�=:���}�f�b�q��F.L���D��������_�)r�9�����=)on����#�1o��I(g�"E��/l$��i_@�E'�R�4�}}�y���M�v��������S�,���	��1��(�����y��&'���OL��p_���6���
���	�U
�!���L�aS����!7(��.8�_�K�F��Js��Y*���z�f#���t7�hN��Qa����V�zP�^�@���!���|�������j�2p�������0��%dw��������`��e�B������[�����?z�^�P�y+x����������$����|�����������?���|�����������?���|�����������?���|�����������?��������tc0h�
#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Joachim Wieland (#2)
Re: WIP patch for parallel pg_dump

On 02.12.2010 07:39, Joachim Wieland wrote:

On Sun, Nov 14, 2010 at 6:52 PM, Joachim Wieland <joe@mcknight.de> wrote:

You would add a regular parallel dump with

$ pg_dump -j 4 -Fd -f out.dir dbname

So this is an updated series of patches for my parallel pg_dump WIP
patch. Most importantly, it now runs on Windows once you get it to
compile there (I have added the new files to the respective project of
Mkvcbuild.pm but I wondered why the other archive formats do not need
to be defined in that file...).

So far nobody has volunteered to review this patch. It would be great
if people could at least check it out, run it and let me know if it
works and if they have any comments.

That's a big patch...

I don't see the point of the sort-by-relpages code. The order the
objects are dumped should be irrelevant, as long as you obey the
restrictions dictated by dependencies. Or is it only needed for the
multiple-target-dirs feature? Frankly I don't see the point of that, so
it would be good to cull it out at least in this first stage.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Heikki Linnakangas (#3)
Re: WIP patch for parallel pg_dump

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

I don't see the point of the sort-by-relpages code. The order the objects
are dumped should be irrelevant, as long as you obey the restrictions
dictated by dependencies. Or is it only needed for the multiple-target-dirs
feature? Frankly I don't see the point of that, so it would be good to cull
it out at least in this first stage.

From the talk at CHAR(10), and provided memory serves, it's an
optimisation so that you're dumping the largest file in one process and
all the little files in other processes. In lots of cases the total
pg_dump duration is then reduced to about the time it takes to dump the
biggest files.
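
A minimal sketch of that scheduling idea (names here are illustrative,
not code from the patch): sort the per-table work queue by relpages,
descending, and have the workers pull from the front:

#include <stdlib.h>

/* Hypothetical per-table work item; the real patch has its own TOC
 * entries, this only illustrates the ordering. */
typedef struct DumpTask
{
    const char *tablename;
    unsigned long relpages;     /* size estimate from pg_class.relpages */
} DumpTask;

/* qsort comparator: largest tables first */
static int
cmp_relpages_desc(const void *a, const void *b)
{
    unsigned long pa = ((const DumpTask *) a)->relpages;
    unsigned long pb = ((const DumpTask *) b)->relpages;

    return (pa < pb) - (pa > pb);
}

/* Workers take tasks off the front of the sorted array, so the biggest
 * table starts immediately and the small ones fill the remaining slots. */
static void
schedule_largest_first(DumpTask *tasks, size_t ntasks)
{
    qsort(tasks, ntasks, sizeof(DumpTask), cmp_relpages_desc);
}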

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#5Joachim Wieland
joe@mcknight.de
In reply to: Heikki Linnakangas (#3)
Re: WIP patch for parallel pg_dump

On Thu, Dec 2, 2010 at 6:19 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I don't see the point of the sort-by-relpages code. The order the objects
are dumped should be irrelevant, as long as you obey the restrictions
dictated by dependencies. Or is it only needed for the multiple-target-dirs
feature? Frankly I don't see the point of that, so it would be good to cull
it out at least in this first stage.

A guy called Dimitri Fontaine actually proposed the
several-directories feature here and other people liked the idea.

http://archives.postgresql.org/pgsql-hackers/2008-02/msg01061.php :-)

The code doesn't change much with or without it, and if people are no
longer in favour of it, I have no problem with taking it out.

As Dimitri has already pointed out, the relpage sorting thing is there
to start with the largest table(s) first.

Joachim

#6Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Joachim Wieland (#5)
Re: WIP patch for parallel pg_dump

Joachim Wieland <joe@mcknight.de> writes:

A guy called Dimitri Fontaine actually proposed the
several-directories feature here and other people liked the idea.

Hehe :)

Reading that now, it could be that I didn't know at the time that given
a powerful enough disk subsystem there's no way to saturate it with one
CPU. So the use case of parallel dump in a bunch of user-given locations
would be to use different mount points (disk subsystems) at the same
time. Not sure how relevant it is.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#7Josh Berkus
josh@agliodbs.com
In reply to: Dimitri Fontaine (#6)
Re: WIP patch for parallel pg_dump

On 12/02/2010 05:50 AM, Dimitri Fontaine wrote:

So the use case of parallel dump in a bunch of user-given locations
would be to use different mount points (disk subsystems) at the same
time. Not sure how relevant it is.

I think it will complicate this feature unnecessarily for 9.1.
Personally, I need this patch so much I'm thinking of backporting it.
However, having all the data go to one directory/mount wouldn't trouble
me at all.

Now, if only I could think of some way to write a parallel dump to a set
of pipes, I'd be in heaven.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#7)
Re: WIP patch for parallel pg_dump

On 12/02/2010 12:56 PM, Josh Berkus wrote:

On 12/02/2010 05:50 AM, Dimitri Fontaine wrote:

So the use case of parallel dump in a bunch of user-given locations
would be to use different mount points (disk subsystems) at the same
time. Not sure how relevant it is.

I think it will complicate this feature unnecessarily for 9.1.
Personally, I need this patch so much I'm thinking of backporting it.
However, having all the data go to one directory/mount wouldn't
trouble me at all.

Now, if only I could think of some way to write a parallel dump to a
set of pipes, I'd be in heaven.

The only way I can see that working sanely would be to have a program
gathering stuff at the other end of the pipes, and ensuring it was all
coherent. That would be a huge growth in scope for this, and I seriously
doubt it's worth it.

cheers

andrew

#9Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#8)
Re: WIP patch for parallel pg_dump

Now, if only I could think of some way to write a parallel dump to a
set of pipes, I'd be in heaven.

The only way I can see that working sanely would be to have a program
gathering stuff at the other end of the pipes, and ensuring it was all
coherent. That would be a huge growth in scope for this, and I seriously
doubt it's worth it.

Oh, no question. And there are workarounds ... sshfs, for example. I'm
just thinking of the ad-hoc parallel backup I'm running today, which
relies heavily on pipes.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#10Joachim Wieland
joe@mcknight.de
In reply to: Josh Berkus (#7)
Re: WIP patch for parallel pg_dump

On Thu, Dec 2, 2010 at 12:56 PM, Josh Berkus <josh@agliodbs.com> wrote:

Now, if only I could think of some way to write a parallel dump to a set of
pipes, I'd be in heaven.

What exactly are you trying to accomplish with the pipes?

Joachim

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#3)
Re: WIP patch for parallel pg_dump

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

That's a big patch...

Not nearly big enough :-(

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables. I fail to understand
why that consideration can be allowed to go by the wayside now.

regards, tom lane

#12Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#11)
Re: WIP patch for parallel pg_dump

On 12/02/2010 05:01 PM, Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

That's a big patch...

Not nearly big enough :-(

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables. I fail to understand
why that consideration can be allowed to go by the wayside now.

Well, snapshot cloning should allow that objection to be overcome, no?

cheers

andrew

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#12)
Re: WIP patch for parallel pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

On 12/02/2010 05:01 PM, Tom Lane wrote:

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables. I fail to understand
why that consideration can be allowed to go by the wayside now.

Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first, not second.

(I'm not actually convinced that snapshot cloning is the only problem
here; locking could be an issue too, if there are concurrent processes
trying to take locks that will conflict with pg_dump's. But the
snapshot issue is definitely a showstopper.)

regards, tom lane

#14Bruce Momjian
bruce@momjian.us
In reply to: Dimitri Fontaine (#4)
Re: WIP patch for parallel pg_dump

Dimitri Fontaine wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

I don't see the point of the sort-by-relpages code. The order the objects
are dumped should be irrelevant, as long as you obey the restrictions
dictated by dependencies. Or is it only needed for the multiple-target-dirs
feature? Frankly I don't see the point of that, so it would be good to cull
it out at least in this first stage.

From the talk at CHAR(10), and provided memory serves, it's an
optimisation so that you're dumping the largest file in one process and
all the little files in other processes. In lots of cases the total
pg_dump duration is then reduced to about the time it takes to dump the
biggest files.

Seems there should be a comment in the code explaining why this is being
done.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#15Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#13)
Re: WIP patch for parallel pg_dump

On 12/02/2010 05:32 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 12/02/2010 05:01 PM, Tom Lane wrote:

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables. I fail to understand
why that consideration can be allowed to go by the wayside now.

Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first, not second.

Yes, I agree with that.

(I'm not actually convinced that snapshot cloning is the only problem
here; locking could be an issue too, if there are concurrent processes
trying to take locks that will conflict with pg_dump's. But the
snapshot issue is definitely a showstopper.)

Why is that more an issue with parallel pg_dump?

cheers

andrew

#16Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: WIP patch for parallel pg_dump

On Thu, Dec 2, 2010 at 5:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 12/02/2010 05:01 PM, Tom Lane wrote:

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables.  I fail to understand
why that consideration can be allowed to go by the wayside now.

Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first, not second.

Yes, by all means let's allow the perfect to be the enemy of the good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#16)
Re: WIP patch for parallel pg_dump

On 12/02/2010 07:13 PM, Robert Haas wrote:

On Thu, Dec 2, 2010 at 5:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 12/02/2010 05:01 PM, Tom Lane wrote:

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables. I fail to understand
why that consideration can be allowed to go by the wayside now.

Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first, not second.

Yes, by all means let's allow the perfect to be the enemy of the good.

That seems like a bit of an easy shot. Requiring that parallel pg_dump
produce a dump that is as consistent as non-parallel pg_dump currently
produces isn't unreasonable. It's not stopping us moving forward; it's
just not wanting to go backwards.

And it shouldn't be terribly hard. IIRC Joachim has already done some
work on it.

cheers

andrew

#18Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#17)
Re: WIP patch for parallel pg_dump

On Thu, Dec 2, 2010 at 7:21 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

In the past, proposals for this have always been rejected on the
grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables.  I fail to understand
why that consideration can be allowed to go by the wayside now.

Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first, not second.

Yes, by all means let's allow the perfect to be the enemy of the good.

That seems like a bit of an easy shot. Requiring that parallel pg_dump
produce a dump that is as consistent as non-parallel pg_dump currently
produces isn't unreasonable. It's not stopping us moving forward; it's just
not wanting to go backwards.

I certainly agree that would be nice. But if Joachim thought the
patch were useless without that, perhaps he wouldn't have bothered
writing it at this point. In fact, he doesn't think that, and he
mentioned the use cases he sees in his original post. But even
supposing you wouldn't personally find this useful in those
situations, how can you possibly say that HE wouldn't find it useful
in those situations? I understand that people sometimes show up here
and ask for ridiculous things, but I don't think we should be too
quick to attribute ridiculousness to regular contributors.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#18)
Re: WIP patch for parallel pg_dump

On 12/02/2010 07:48 PM, Robert Haas wrote:

On Thu, Dec 2, 2010 at 7:21 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

In the past, proposals for this have always been rejected on the
grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables. I fail to understand
why that consideration can be allowed to go by the wayside now.

Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first, not second.

Yes, by all means let's allow the perfect to be the enemy of the good.

That seems like a bit of an easy shot. Requiring that parallel pg_dump
produce a dump that is as consistent as non-parallel pg_dump currently
produces isn't unreasonable. It's not stopping us moving forward; it's just
not wanting to go backwards.

I certainly agree that would be nice. But if Joachim thought the
patch were useless without that, perhaps he wouldn't have bothered
writing it at this point. In fact, he doesn't think that, and he
mentioned the use cases he sees in his original post. But even
supposing you wouldn't personally find this useful in those
situations, how can you possibly say that HE wouldn't find it useful
in those situations? I understand that people sometimes show up here
and ask for ridiculous things, but I don't think we should be too
quick to attribute ridiculousness to regular contributors.

Umm, nobody has attributed ridiculousness to anyone. Please don't put
words in my mouth. But I think this is a perfectly reasonable discussion
to have. Nobody gets to come along and get the features they want
without some sort of consensus, not me, not you, not Joachim, not Tom.

cheers

andrew

#20Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#19)
Re: WIP patch for parallel pg_dump

On Dec 2, 2010, at 8:11 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Umm, nobody has attributed ridiculousness to anyone. Please don't put words in my mouth. But I think this is a perfectly reasonable discussion to have. Nobody gets to come along and get the features they want without some sort of consensus, not me, not you, not Joachim, not Tom.

I'm not disputing that we COULD reject the patch. I AM disputing that we've made a cogent argument for doing so.

...Robert

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#15)
Re: WIP patch for parallel pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

On 12/02/2010 05:32 PM, Tom Lane wrote:

(I'm not actually convinced that snapshot cloning is the only problem
here; locking could be an issue too, if there are concurrent processes
trying to take locks that will conflict with pg_dump's. But the
snapshot issue is definitely a showstopper.)

Why is that more an issue with parallel pg_dump?

The scenario that bothers me is

1. pg_dump parent process AccessShareLocks everything to be dumped.

2. somebody else tries to acquire AccessExclusiveLock on table foo.

3. pg_dump child process is told to dump foo, tries to acquire
AccessShareLock.

Now, process 3 is blocked behind process 2, which is blocked behind
process 1, which is waiting for 3 to complete. Can you say "undetectable
deadlock"?

regards, tom lane

#22Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#21)
Re: WIP patch for parallel pg_dump

On 12/02/2010 09:09 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 12/02/2010 05:32 PM, Tom Lane wrote:

(I'm not actually convinced that snapshot cloning is the only problem
here; locking could be an issue too, if there are concurrent processes
trying to take locks that will conflict with pg_dump's. But the
snapshot issue is definitely a showstopper.)

Why is that more an issue with parallel pg_dump?

The scenario that bothers me is

1. pg_dump parent process AccessShareLocks everything to be dumped.

2. somebody else tries to acquire AccessExclusiveLock on table foo.

3. pg_dump child process is told to dump foo, tries to acquire
AccessShareLock.

Now, process 3 is blocked behind process 2, which is blocked behind
process 1, which is waiting for 3 to complete. Can you say "undetectable
deadlock"?

Hmm. Yeah. Maybe we could get around it if we prefork the workers and
they all acquire locks on everything to be dumped up front in nowait
mode, right after the parent, and if they can't the whole dump fails. Or
something along those lines.

cheers

andrew

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#19)
Re: WIP patch for parallel pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

Umm, nobody has attributed ridiculousness to anyone. Please don't put
words in my mouth. But I think this is a perfectly reasonable discussion
to have. Nobody gets to come along and get the features they want
without some sort of consensus, not me, not you, not Joachim, not Tom.

In particular, this issue *has* been discussed before, and there was a
consensus that preserving dump consistency was a requirement. I don't
think that Joachim gets to bypass that decision just by submitting a
patch that ignores it.

regards, tom lane

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#22)
Re: WIP patch for parallel pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

On 12/02/2010 09:09 PM, Tom Lane wrote:

Now, process 3 is blocked behind process 2, which is blocked behind
process 1, which is waiting for 3 to complete. Can you say "undetectable
deadlock"?

Hmm. Yeah. Maybe we could get around it if we prefork the workers and
they all acquire locks on everything to be dumped up front in nowait
mode, right after the parent, and if they can't the whole dump fails. Or
something along those lines.

[ thinks for a bit... ] Actually it might be good enough if a child
simply takes the lock it needs in nowait mode, and reports failure on
error. We know the parent already has that lock, so the only way that
the child's request can fail is if something conflicting with
AccessShareLock is queued up behind the parent's lock. So failure to
get the child lock immediately proves that the deadlock case applies.
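
In libpq terms the child-side check could be as simple as the sketch
below (illustrative only, not code from the patch; it assumes the
worker has already opened its transaction):

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Take the AccessShareLock the parent already holds, without waiting.
 * Since the parent holds the lock, the only way this can fail is if a
 * conflicting request is queued behind the parent -- exactly the
 * undetectable-deadlock case.  Must run inside the worker's
 * transaction block.
 */
static int
lock_table_no_wait(PGconn *conn, const char *qualified_name)
{
    char        sql[1024];
    PGresult   *res;

    snprintf(sql, sizeof(sql),
             "LOCK TABLE %s IN ACCESS SHARE MODE NOWAIT", qualified_name);
    res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "could not lock %s: %s",
                qualified_name, PQerrorMessage(conn));
        PQclear(res);
        return 0;               /* caller gives up on the whole dump */
    }
    PQclear(res);
    return 1;
}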

regards, tom lane

#25Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#24)
Re: WIP patch for parallel pg_dump

On 12/02/2010 09:41 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 12/02/2010 09:09 PM, Tom Lane wrote:

Now, process 3 is blocked behind process 2, which is blocked behind
process 1, which is waiting for 3 to complete. Can you say "undetectable
deadlock"?

Hmm. Yeah. Maybe we could get around it if we prefork the workers and
they all acquire locks on everything to be dumped up front in nowait
mode, right after the parent, and if they can't the whole dump fails. Or
something along those lines.

[ thinks for a bit... ] Actually it might be good enough if a child
simply takes the lock it needs in nowait mode, and reports failure on
error. We know the parent already has that lock, so the only way that
the child's request can fail is if something conflicting with
AccessShareLock is queued up behind the parent's lock. So failure to
get the child lock immediately proves that the deadlock case applies.

Yeah, that would be a whole lot simpler. It would avoid the deadlock,
but it would have lots more chances for failure. But it would at least
be a good place to start.

cheers

andrew

#26Joachim Wieland
joe@mcknight.de
In reply to: Tom Lane (#23)
Re: WIP patch for parallel pg_dump

On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

In particular, this issue *has* been discussed before, and there was a
consensus that preserving dump consistency was a requirement.  I don't
think that Joachim gets to bypass that decision just by submitting a
patch that ignores it.

I am not trying to bypass anything here :) Regarding the locking
issue I probably haven't done sufficient research, at least I managed
to miss the emails that mentioned it. Anyway, that seems to be solved
now, fortunately; I'm going to implement your idea over the weekend.

Regarding snapshot cloning and dump consistency, I brought this up
already several months ago and asked if the feature is considered
useful even without snapshot cloning. And actually it was you who
motivated me to work on it even without having snapshot consistency...

http://archives.postgresql.org/pgsql-hackers/2010-03/msg01181.php

In my patch pg_dump emits a warning when called with -j; if you feel
better with an extra option
--i-know-that-i-have-no-synchronized-snapshots, fine with me :-)

In the end we provide a tool with limitations; it might not serve all
use cases, but there are use cases that would benefit a lot. I
personally think this is better than providing no tool at all...

Joachim

#27Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#23)
Re: WIP patch for parallel pg_dump

On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Umm, nobody has attributed ridiculousness to anyone. Please don't put
words in my mouth. But I think this is a perfectly reasonable discussion
to have. Nobody gets to come along and get the features they want
without some sort of consensus, not me, not you, not Joachim, not Tom.

In particular, this issue *has* been discussed before, and there was a
consensus that preserving dump consistency was a requirement.  I don't
think that Joachim gets to bypass that decision just by submitting a
patch that ignores it.

Well, the discussion that Joachim linked to certainly doesn't have
any sort of clear consensus that that's the only way to go. In fact,
it seems to be much closer to the opposite consensus. Perhaps there
is some OTHER time that this has been discussed where "synchronization
is a hard requirement" was the consensus. There's an old saw that the
nice thing about standards is there are so many to choose from, and
the same thing can certainly be said about -hackers discussions on any
particular topic.

I actually think that the phrase "this has been discussed before and
rejected" should be permanently removed from our list of excuses for
rejecting a patch. Or if we must use that excuse, then I think a link
to the relevant discussion is a must, and the relevant discussion had
better reflect the fact that $TOPIC was in fact rejected. It seems to
me that in at least 50% of cases, someone comes back and says one of
the following things:

1. I searched the archives and could find no discussion along those lines.
2. I read that discussion and it doesn't appear to me that it reflects
a rejection of this idea. Instead what people seemed to be saying was
X.
3. At the time that might have been true, but what has changed in the
meanwhile is X.

In short, the problem with referring to previous discussions is that
our memories grow fuzzy over time. We remember that an idea was not
adopted, but not exactly why it wasn't adopted. We reject a new patch
with a good implementation of $FEATURE because an old patch was badly
done, or fell down on some peripheral issue, or just never got done.
Veteran backend hackers understand the inevitable necessity of arguing
about what consensus is actually reflected in the archives and whether
it's still relevant, but new people can be (and frequently are) put
off by it; and even for experienced contributors, it does little to
advance the dialogue. Hmm, according to so-and-so's memory, sometime
in the fourteen-year history of the project someone didn't like this
idea, or maybe a similar one. Whee, time to start Googling.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#28Andrew Dunstan
andrew@dunslane.net
In reply to: Joachim Wieland (#26)
Re: WIP patch for parallel pg_dump

On 12/02/2010 11:44 PM, Joachim Wieland wrote:

On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

In particular, this issue *has* been discussed before, and there was a
consensus that preserving dump consistency was a requirement. I don't
think that Joachim gets to bypass that decision just by submitting a
patch that ignores it.

I am not trying to bypass anything here :) Regarding the locking
issue I probably haven't done sufficient research, at least I managed
to miss the emails that mentioned it. Anyway, that seems to be solved
now, fortunately; I'm going to implement your idea over the weekend.

Regarding snapshot cloning and dump consistency, I brought this up
already several months ago and asked if the feature is considered
useful even without snapshot cloning. And actually it was you who
motivated me to work on it even without having snapshot consistency...

http://archives.postgresql.org/pgsql-hackers/2010-03/msg01181.php

In my patch pg_dump emits a warning when called with -j; if you feel
better with an extra option
--i-know-that-i-have-no-synchronized-snapshots, fine with me :-)

In the end we provide a tool with limitations; it might not serve all
use cases, but there are use cases that would benefit a lot. I
personally think this is better than providing no tool at all...

I think Tom's statement there:

I think migration to a new server version (that's too incompatible for
PITR or pg_migrate migration) is really the only likely use case.

is just wrong. Say you have a site that's open 24/7. But there is a
window of, say, 6 hours, each day, when it's almost but not quite quiet.
You want to be able to make your disaster recovery dump within that
window, and the low level of traffic means you can afford the degraded
performance that might result from a parallel dump. Or say you have a
hot standby machine from which you want to make the dump but want to set
the max_standby_*_delay as low as possible. These are both cases where
you might want parallel dump and yet you want dump consistency. I have a
client currently considering the latter setup, and the timing tolerances
are a little tricky. The times in which the system is in a state that we
want dumped are fixed, and we want to be sure that the dump is finished
by the next time such a time rolls around. (This is a system that in
effect makes one giant state change at a time.) If we can't complete the
dump in that time then there will be a delay introduced to the system's
critical path. Parallel dump will be very useful in helping us avoid
such a situation, but only if it's properly consistent.

I think Josh Berkus' comments in the thread you mentioned are correct:

Actually, I'd say that there's a broad set of cases of people who want
to do a parallel pg_dump while their system is active. Parallel pg_dump
on a stopped system will help some people (for migration, particularly)
but parallel pg_dump with snapshot cloning will help a lot more people.

cheers

andrew

#29Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#28)
Re: WIP patch for parallel pg_dump

On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

I think Josh Berkus' comments in the thread you mentioned are correct:

Actually, I'd say that there's a broad set of cases of people who want
to do a parallel pg_dump while their system is active.  Parallel pg_dump
on a stopped system will help some people (for migration, particularly)
but parallel pg_dump with snapshot cloning will help a lot more people.

But you failed to quote the rest of what he said:

So: if parallel dump in single-user mode is what you can get done, then
do it. We can always improve it later, and we have to start somewhere.
But we will eventually need parallel pg_dump on active systems, and
that should remain on the TODO list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#30Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#29)
Re: WIP patch for parallel pg_dump

On 12/03/2010 11:23 AM, Robert Haas wrote:

On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

I think Josh Berkus' comments in the thread you mentioned are correct:

Actually, I'd say that there's a broad set of cases of people who want
to do a parallel pg_dump while their system is active. Parallel pg_dump
on a stopped system will help some people (for migration, particularly)
but parallel pg_dump with snapshot cloning will help a lot more people.

But you failed to quote the rest of what he said:

So: if parallel dump in single-user mode is what you can get done, then
do it. We can always improve it later, and we have to start somewhere.
But we will eventually need parallel pg_dump on active systems, and
that should remain on the TODO list.

Right, and the reason I don't think that's right is that it seems to me
like a serious potential footgun.

But in any case, the reason I quoted Josh was in answer to a different
point, namely Tom's statement about the limited potential uses.

cheers

andrew

#31Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#30)
Re: WIP patch for parallel pg_dump

On Fri, Dec 3, 2010 at 11:40 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 12/03/2010 11:23 AM, Robert Haas wrote:

On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

I think Josh Berkus' comments in the thread you mentioned are correct:

Actually, I'd say that there's a broad set of cases of people who want
to do a parallel pg_dump while their system is active.  Parallel pg_dump
on a stopped system will help some people (for migration, particularly)
but parallel pg_dump with snapshot cloning will help a lot more people.

But you failed to quote the rest of what he said:

So: if parallel dump in single-user mode is what you can get done, then
do it.  We can always improve it later, and we have to start somewhere.
But we will eventually need parallel pg_dump on active systems, and
that should remain on the TODO list.

Right, and the reason I don't think that's right is that it seems to me like
a serious potential footgun.

But in any case, the reason I quoted Josh was in answer to a different
point, namely Tom's statement about the limited potential uses.

I know the use cases are limited, but I think it's still useful on its own.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#32Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#31)
Re: WIP patch for parallel pg_dump

Excerpts from Robert Haas's message of Fri Dec 03 13:56:32 -0300 2010:

I know the use cases are limited, but I think it's still useful on its own.

I don't understand what's so difficult about starting with the snapshot
cloning patch. AFAIR it's already been written anyway, no?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#33Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#32)
Re: WIP patch for parallel pg_dump

On 12/03/2010 12:17 PM, Alvaro Herrera wrote:

Excerpts from Robert Haas's message of Fri Dec 03 13:56:32 -0300 2010:

I know the use cases are limited, but I think it's still useful on its own.

I don't understand what's so difficult about starting with the snapshot
cloning patch. AFAIR it's already been written anyway, no?

Yeah. If we can do it then this whole argument becomes moot. Like you I
don't see why we can't.

cheers

andrew

#34Greg Smith
greg@2ndquadrant.com
In reply to: Joachim Wieland (#26)
Re: WIP patch for parallel pg_dump

Joachim Wieland wrote:

Regarding snapshot cloning and dump consistency, I brought this up
already several months ago and asked if the feature is considered
useful even without snapshot cloning.

In addition, Joachim submitted a synchronized snapshot patch that looks
to me like it slipped through the cracks without being fully explored.
Since it's split in the official archives the easiest way to read the
thread is at
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg143866.html

Or you can use these two:
http://archives.postgresql.org/pgsql-hackers/2010-01/msg00916.php
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00363.php

That never made it into a CommitFest proper that I can see, it just
picked up review mainly from Markus. The way I read that thread, there
were two objections:

1) This mechanism isn't general enough for all use-cases outside of
pg_dump, which doesn't make it wrong when the question is how to get
parallel pg_dump running

2) Running as superuser is excessive. Running as the database owner was
suggested as likely to be good enough for pg_dump purposes.

Ultimately I think that stalled because without a client that needed it
the code wasn't so interesting yet. But now there is one; should that
get revived again? It seems like all of the pieces needed to build
what's really desired here are available, it's just the always
non-trivial task of integrating them together the right way that's needed.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us

#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#34)
Re: WIP patch for parallel pg_dump

Greg Smith <greg@2ndquadrant.com> writes:

In addition, Joachim submitted a synchronized snapshot patch that looks
to me like it slipped through the cracks without being fully explored.
...
The way I read that thread, there were two objections:

1) This mechanism isn't general enough for all use-cases outside of
pg_dump, which doesn't make it wrong when the question is how to get
parallel pg_dump running

2) Running as superuser is excessive. Running as the database owner was
suggested as likely to be good enough for pg_dump purposes.

IIRC, in old discussions of this problem we first considered allowing
clients to pull down an explicit representation of their snapshot (which
actually is an existing feature now, txid_current_snapshot()) and then
upload that again to become the active snapshot in another connection.
That was rejected on the grounds that you could cause all kinds of
mischief by uploading a bad snapshot; so we decided to think about
providing a server-side-only means to clone another backend's current
snapshot. Which is essentially what Joachim's above-mentioned patch
provides. However, as was discussed in that thread, that approach is
far from being ideal either.

I'm wondering if we should reconsider the pass-it-through-the-client
approach, because if we could make that work it would be more general and
it wouldn't need any special privileges. The trick seems to be to apply
sufficient sanity testing to the snapshot proposed to be installed in
the subsidiary transaction. I think the requirements would basically be
(1) xmin <= any listed XIDs < xmax
(2) xmin not so old as to cause GlobalXmin to decrease
(3) xmax not beyond current XID counter
(4) XID list includes all still-running XIDs in the given range

One tricky part would be ensuring GlobalXmin doesn't decrease when the
snap is installed, but I think that could be made to work if we take
ProcArrayLock exclusively and insist on observing some other running
transaction with xmin <= proposed xmin. For the pg_dump case this would
certainly hold since xmin would be the parent pg_dump's xmin.
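
To make this concrete, here is a minimal sketch of checks (1) and (3) as
the installation function might perform them. ValidateImportedSnapshot is
a made-up name; the transam helpers it calls are real, and checks (2) and
(4) are only stubbed out because they need the proc array:

static void
ValidateImportedSnapshot(Snapshot snap)
{
	int			i;

	/* (3) xmax must not be beyond the current XID counter */
	if (TransactionIdFollows(snap->xmax, ReadNewTransactionId()))
		elog(ERROR, "snapshot xmax is in the future");

	/* (1) every listed XID must satisfy xmin <= xid < xmax */
	for (i = 0; i < snap->xcnt; i++)
	{
		TransactionId xid = snap->xip[i];

		if (TransactionIdPrecedes(xid, snap->xmin) ||
			!TransactionIdPrecedes(xid, snap->xmax))
			elog(ERROR, "snapshot contains out-of-range XID %u", xid);
	}

	/*
	 * (2) and (4): take ProcArrayLock, verify that some running
	 * transaction still has xmin <= snap->xmin (so GlobalXmin cannot
	 * go backwards) and that every still-running XID in [xmin, xmax)
	 * appears in the xip list.
	 */
}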

Given the checks stated above, it would be possible for someone to
install a snapshot that corresponds to no actual state of the database,
eg it shows some T1 as running and T2 as committed when actually T1
committed before T2. I don't see any simple way for the installation
function to detect that, but I'm not sure whether it matters. The user
might see inconsistent data, but do we care? Perhaps as a safety
measure we should only allow snapshot installation in read-only
transactions, so that even if the xact does observe inconsistent data it
can't possibly corrupt the database state thereby. This'd be no skin
off pg_dump's nose, obviously. Or compromise on "only superusers can
do it in non-read-only transactions".

Thoughts?

regards, tom lane

#36Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#35)
Re: WIP patch for parallel pg_dump

On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm wondering if we should reconsider the pass-it-through-the-client
approach, because if we could make that work it would be more general and
it wouldn't need any special privileges.  The trick seems to be to apply
sufficient sanity testing to the snapshot proposed to be installed in
the subsidiary transaction.  I think the requirements would basically be
(1) xmin <= any listed XIDs < xmax
(2) xmin not so old as to cause GlobalXmin to decrease
(3) xmax not beyond current XID counter
(4) XID list includes all still-running XIDs in the given range

Thoughts?

I think this is too ugly to live. I really think it's a very bad idea
for database clients to need to explicitly know anywhere near this
many details about how the server represents snapshots. It's not
impossible we might want to change this in the future, and even if we
don't, it seems to me to be exposing a whole lot of unnecessary
internal grottiness.

How about just pg_publish_snapshot(), returning a token that is only
valid until the end of the transaction in which it was called, and
pg_subscribe_snapshot(token)? The implementation can be that the
publisher writes its snapshot to a temp file and returns the name of
the temp file, setting an at-commit hook to remove the temp file. The
subscriber reads the temp file and sets the contents as its
transaction snapshot. If security is a concern, one could also save
the publisher's role OID to the file and require the subscriber's to
match.
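
A rough sketch of the publisher side under those assumptions;
PublishSnapshotToFile, the file format, and the "pg_snapshots" directory
are illustrative only, and real backend code would go through fd.c rather
than raw stdio:

static char *
PublishSnapshotToFile(Snapshot snap)
{
	char		path[MAXPGPATH];
	FILE	   *f;
	int			i;

	/* one file per publishing backend; its name is the token */
	snprintf(path, sizeof(path), "pg_snapshots/%d", MyProcPid);

	if ((f = fopen(path, "w")) == NULL)
		elog(ERROR, "could not create snapshot file \"%s\": %m", path);

	fprintf(f, "xmin %u\nxmax %u\nxcnt %u\n",
			snap->xmin, snap->xmax, snap->xcnt);
	for (i = 0; i < snap->xcnt; i++)
		fprintf(f, "xip %u\n", snap->xip[i]);

	fclose(f);
	return pstrdup(path);		/* handed back to the client */
}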

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#37Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#36)
Re: WIP patch for parallel pg_dump

On 12/05/2010 08:55 PM, Robert Haas wrote:

On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

I'm wondering if we should reconsider the pass-it-through-the-client
approach, because if we could make that work it would be more general and
it wouldn't need any special privileges. The trick seems to be to apply
sufficient sanity testing to the snapshot proposed to be installed in
the subsidiary transaction. I think the requirements would basically be
(1) xmin<= any listed XIDs< xmax
(2) xmin not so old as to cause GlobalXmin to decrease
(3) xmax not beyond current XID counter
(4) XID list includes all still-running XIDs in the given range

Thoughts?

I think this is too ugly to live. I really think it's a very bad idea
for database clients to need to explicitly know anywhere near this
many details about how the server represents snapshots. It's not
impossible we might want to change this in the future, and even if we
don't, it seems to me to be exposing a whole lot of unnecessary
internal grottiness.

How about just pg_publish_snapshot(), returning a token that is only
valid until the end of the transaction in which it was called, and
pg_subscribe_snapshot(token)? The implementation can be that the
publisher writes its snapshot to a temp file and returns the name of
the temp file, setting an at-commit hook to remove the temp file. The
subscriber reads the temp file and sets the contents as its
transaction snapshot. If security is a concern, one could also save
the publisher's role OID to the file and require the subscriber's to
match.

Why not just say give me the snapshot currently held by process nnnn?

And please, not temp files if possible.

cheers

andrew

#38Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#37)
Re: WIP patch for parallel pg_dump

On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Why not just say give me the snapshot currently held by process nnnn?

And please, not temp files if possible.

As far as I'm aware, the full snapshot doesn't normally exist in
shared memory, hence the need for publication of some sort. We could
dedicate a shared memory region for publication but then you have to
decide how many slots to allocate, and any number you pick will be too
many for some people and not enough for others, not to mention that
shared memory is a fairly precious resource.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#39Joachim Wieland
joe@mcknight.de
In reply to: Robert Haas (#38)
1 attachment(s)
Re: WIP patch for parallel pg_dump

On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Why not just say give me the snapshot currently held by process nnnn?

And please, not temp files if possible.

As far as I'm aware, the full snapshot doesn't normally exist in
shared memory, hence the need for publication of some sort.  We could
dedicate a shared memory region for publication but then you have to
decide how many slots to allocate, and any number you pick will be too
many for some people and not enough for others, not to mention that
shared memory is a fairly precious resource.

So here is a patch that I have been playing with in the past; I wrote
it a while back, and thanks go to Koichi Suzuki for his helpful
comments. I have not published it earlier because I haven't worked on
it recently, and from the discussion that I brought up in March I got
the feeling that people are fine with having a first version of
parallel dump without synchronized snapshots.

I am not really sure that what the patch does is sufficient, nor that
it does it in the right way, but I hope that it can serve as a basis
to collect ideas (and doubts).

My idea is pretty much similar to Robert's about publishing snapshots
and subscribing to them; the patch even uses these words.

Basically the idea is that a transaction in isolation level
serializable can publish a snapshot and as long as this transaction is
alive, its snapshot can be adopted by other transactions. Requiring
the publishing transaction to be serializable guarantees that the copy
of the snapshot in shared memory is always current. When the
transaction ends, the copy of the snapshot is also invalidated and
cannot be adopted anymore. So instead of doing explicit checks, the
patch aims at always having a reference transaction around that
guarantees validity of the snapshot information in shared memory.

The patch currently creates a new area in shared memory to store
snapshot information but we can certainly discuss this... I had a GUC
in mind that can control the number of available "slots", similar to
max_prepared_transactions. Snapshot information can become quite
large, especially with a high number of max_connections.

Known limitations: the patch is lacking awareness of prepared
transactions completely and doesn't check if both backends belong to
the same user.
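
To illustrate the intended use, here is a minimal libpq sketch of a dump
master and one worker driving these functions (error handling and PQclear
calls omitted; the snapshot name "pgdump" is arbitrary):

#include <libpq-fe.h>

int
main(void)
{
	/* master: publish from a serializable, read-only transaction */
	PGconn	   *master = PQconnectdb("dbname=test");

	PQexec(master, "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY");
	PQexec(master, "SELECT pg_publish_snapshot('pgdump')");

	/* worker: adopt the same snapshot before starting to dump */
	PGconn	   *worker = PQconnectdb("dbname=test");

	PQexec(worker, "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE");
	PQexec(worker, "SELECT pg_subscribe_snapshot('pgdump')");

	/* ... worker dumps its share of the tables here ... */

	/*
	 * The publication disappears automatically when the master's
	 * transaction ends: ProcArrayEndTransaction() deletes every
	 * snapshot this backend has published.
	 */
	PQfinish(worker);
	PQfinish(master);
	return 0;
}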

Joachim

Attachments:

syncSnapshots.diff (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 95beba8..c24150f 100644
*** a/src/backend/storage/ipc/ipci.c
--- b/src/backend/storage/ipc/ipci.c
*************** CreateSharedMemoryAndSemaphores(bool mak
*** 124,129 ****
--- 124,130 ----
  		size = add_size(size, BTreeShmemSize());
  		size = add_size(size, SyncScanShmemSize());
  		size = add_size(size, AsyncShmemSize());
+ 		size = add_size(size, SyncSnapshotShmemSize());
  #ifdef EXEC_BACKEND
  		size = add_size(size, ShmemBackendArraySize());
  #endif
*************** CreateSharedMemoryAndSemaphores(bool mak
*** 228,233 ****
--- 229,235 ----
  	BTreeShmemInit();
  	SyncScanShmemInit();
  	AsyncShmemInit();
+ 	SyncSnapshotInit();
  
  #ifdef EXEC_BACKEND
  
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 6e7a6db..00522fb 100644
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
*************** typedef struct ProcArrayStruct
*** 91,96 ****
--- 91,111 ----
  
  static ProcArrayStruct *procArray;
  
+ 
+ /* this should be a GUC later... */
+ #define MAX_SYNC_SNAPSHOT_SETS	4
+ typedef struct
+ {
+ 	SnapshotData	ssd;
+ 	char			name[NAMEDATALEN];
+ 	BackendId		backendId;
+ 	Oid				databaseId;
+ } NamedSnapshotData;
+ 
+ typedef NamedSnapshotData* NamedSnapshot;
+ 
+ static NamedSnapshot syncSnapshots;
+ 
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
*************** static int KnownAssignedXidsGetAndSetXmi
*** 159,164 ****
--- 174,182 ----
  static TransactionId KnownAssignedXidsGetOldestXmin(void);
  static void KnownAssignedXidsDisplay(int trace_level);
  
+ static bool DeleteSyncSnapshot(const char *name);
+ static bool snapshotPublished = false;  /* true if we have published at least one snapshot */
+ 
  /*
   * Report shared-memory space needed by CreateSharedProcArray.
   */
*************** ProcArrayRemove(PGPROC *proc, Transactio
*** 350,355 ****
--- 368,379 ----
  void
  ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
  {
+ 	if (snapshotPublished)
+ 	{
+ 		DeleteSyncSnapshot(NULL);
+ 		snapshotPublished = false;
+ 	}
+ 
  	if (TransactionIdIsValid(latestXid))
  	{
  		/*
*************** KnownAssignedXidsDisplay(int trace_level
*** 3104,3106 ****
--- 3132,3374 ----
  
  	pfree(buf.data);
  }
+ 
+ 
+ /*
+  *  Report space needed for our shared memory area.
+  *
+  *  Memory is structured as follows:
+  *
+  *  NamedSnapshotData[0]
+  *  NamedSnapshotData[1]
+  *  NamedSnapshotData[2]
+  *  Xids for NamedSnapshotData[0]
+  *  Sub-Xids for NamedSnapshotData[0]
+  *  Xids for NamedSnapshotData[1]
+  *  Sub-Xids for NamedSnapshotData[1]
+  *  Xids for NamedSnapshotData[2]
+  *  Sub-Xids for NamedSnapshotData[2]
+  */
+ Size
+ SyncSnapshotShmemSize(void)
+ {
+ 	Size size;
+ 
+ 	size = sizeof(NamedSnapshotData);
+ 	size = add_size(size, PROCARRAY_MAXPROCS * sizeof(TransactionId));
+ 	size = add_size(size, TOTAL_MAX_CACHED_SUBXIDS * sizeof(TransactionId));
+ 	size = mul_size(size, MAX_SYNC_SNAPSHOT_SETS);
+ 
+ 	return size;
+ }
+ 
+ void
+ SyncSnapshotInit(void)
+ {
+ 	Size	size;
+ 	bool	found;
+ 
+ 	size = SyncSnapshotShmemSize();
+ 
+ 	syncSnapshots = (NamedSnapshot) ShmemInitStruct("SyncSnapshotSets",
+ 													 size, &found);
+ 	if (!found)
+ 	{
+ 		int		i;
+ 		/* XXX is this always properly aligned? */
+ 		void   *ptr = (void *) &syncSnapshots[MAX_SYNC_SNAPSHOT_SETS];
+ 		/* ptr now points past the last syncSnapshot entry */
+ 		for (i = 0; i < MAX_SYNC_SNAPSHOT_SETS; i++)
+ 		{
+ 			NamedSnapshot	ns = &syncSnapshots[i];
+ 
+ 			/* ptr is aligned at the beginning, and since we only add
+ 			 * multiples of sizeof(TransactionId), it stays aligned on
+ 			 * every iteration */
+ 			ns->ssd.xip = (TransactionId *) ptr;
+ 
+ 			ptr = (TransactionId *) ptr + PROCARRAY_MAXPROCS;
+ 
+ 			/* ptr now points past what we reserve for xip */
+ 			ns->ssd.subxip = ptr;
+ 			ptr = (TransactionId *) ptr + TOTAL_MAX_CACHED_SUBXIDS;
+ 
+ 			ns->name[0] = '\0';
+ 			ns->backendId = InvalidBackendId;
+ 			ns->databaseId = InvalidOid;
+ 		}
+ 	}
+ }
+ 
+ static bool
+ DeleteSyncSnapshot(const char* name)
+ {
+ 	TransactionId  *xip,
+ 				   *subxip;
+ 	NamedSnapshot	ns;
+ 	int				i;
+ 	bool			found = false;
+ 
+ 	LWLockAcquire(SyncSnapshotLock, LW_EXCLUSIVE);
+ 	for (i = 0; i < MAX_SYNC_SNAPSHOT_SETS; i++)
+ 	{
+ 		ns = &syncSnapshots[i];
+ 
+ 		/* don't look at other backends' snapshots */
+ 		if (ns->backendId != MyBackendId)
+ 			continue;
+ 
+ 		Assert(ns->databaseId == MyDatabaseId);
+ 
+ 		/* name == NULL means that we want to delete all of our snapshots */
+ 		if (!name || strcmp(name, syncSnapshots[i].name) == 0)
+ 		{
+ 			found = true;
+ 
+ 			/* save pointers */
+ 			xip = ns->ssd.xip;
+ 			subxip = ns->ssd.subxip;
+ 
+ 			memset(ns, 0, sizeof(NamedSnapshotData));
+ 
+ 			/* Actually it would be sufficient to set the backendId to
+ 			 * InvalidBackendId to invalidate this snapshot */
+ 			ns->backendId = InvalidBackendId;
+ 
+ 			/* restore pointers */
+ 			ns->ssd.xip = xip;
+ 			ns->ssd.subxip = subxip;
+ 
+ 			memset(xip, 0, sizeof(TransactionId) * PROCARRAY_MAXPROCS);
+ 			memset(subxip, 0, sizeof(TransactionId) * TOTAL_MAX_CACHED_SUBXIDS);
+ 		}
+ 	}
+ 	LWLockRelease(SyncSnapshotLock);
+ 
+ 	return found;
+ }
+ 
+ bool
+ UnpublishSnapshot(const char *name)
+ {
+ 	return DeleteSyncSnapshot(name);
+ }
+ 
+ bool
+ PublishSnapshot(Snapshot snapshot, const char *name)
+ {
+ 	int				i;
+ 	bool			found = false;
+ 	NamedSnapshot	ns;
+ 	TransactionId  *xip, *subxip;
+ 
+ 	if (!IsolationUsesXactSnapshot())
+ 		elog(ERROR, "Transaction must use TRANSACTION ISOLATION LEVEL "
+ 					"SERIALIZABLE to publish snapshots");
+ 
+ 	if (!XactReadOnly)
+ 		elog(WARNING, "Transaction is not read only");
+ 
+ 	LWLockAcquire(SyncSnapshotLock, LW_EXCLUSIVE);
+ 
+ 	/* First check for an existing publication with the same name in the same
+ 	 * database. */
+ 	for (i = 0; i < MAX_SYNC_SNAPSHOT_SETS; i++)
+ 	{
+ 		if (syncSnapshots[i].databaseId == MyDatabaseId &&
+ 			strcmp(syncSnapshots[i].name, name) == 0 &&
+ 			syncSnapshots[i].backendId != InvalidBackendId)
+ 
+ 			elog(ERROR, "A snapshot with this name has already been published");
+ 	}
+ 
+ 	/* find some free space in shared memory to copy the snapshot to.
+ 	 * Make sure the name is unique. */
+ 	for (i = 0; i < MAX_SYNC_SNAPSHOT_SETS; i++)
+ 	{
+ 		if (syncSnapshots[i].backendId == InvalidBackendId)
+ 		{
+ 			found = true;
+ 			/* only valid for now with redundant cleanup upon init and deletion */
+ 			Assert(syncSnapshots[i].name[0] == '\0');
+ 			ns = &syncSnapshots[i];
+ 			break;
+ 		}
+ 	}
+ 
+ 	if (found)
+ 	{
+ 		/* save pointers */
+ 		xip = ns->ssd.xip;
+ 		subxip = ns->ssd.subxip;
+ 
+ 		memcpy(&ns->ssd, snapshot, sizeof(SnapshotData));
+ 
+ 		/* restore pointers */
+ 		ns->ssd.xip = xip;
+ 		ns->ssd.subxip = subxip;
+ 
+ 		memcpy(ns->ssd.xip, snapshot->xip,
+ 			   sizeof(TransactionId) * snapshot->xcnt);
+ 		memcpy(ns->ssd.subxip, snapshot->subxip,
+ 			   sizeof(TransactionId) * snapshot->subxcnt);
+ 
+ 		/* set the name and backend id */
+ 		strcpy(ns->name, name);
+ 		ns->backendId = MyBackendId;
+ 		ns->databaseId = MyDatabaseId;
+ 
+ 		snapshotPublished = true;
+ 	}
+ 
+ 	LWLockRelease(SyncSnapshotLock);
+ 
+ 	return found;
+ }
+ 
+ bool
+ SubscribeSnapshot(const char *name, Snapshot snapshot)
+ {
+ 	NamedSnapshot	ns;
+ 	bool			found = false;
+ 	int				i;
+ 
+ 	LWLockAcquire(SyncSnapshotLock, LW_SHARED);
+ 
+ 	for (i = 0; i < MAX_SYNC_SNAPSHOT_SETS; i++)
+ 	{
+ 		if (strcmp(syncSnapshots[i].name, name) == 0 &&
+ 			MyDatabaseId == syncSnapshots[i].databaseId &&
+ 			syncSnapshots[i].backendId != InvalidBackendId)
+ 		{
+ 			found = true;
+ 			ns = &syncSnapshots[i];
+ 			break;
+ 		}
+ 	}
+ 
+ 	if (found)
+ 	{
+ 		/* Do we somehow need to unregister the old snapshot and register the
+ 		 * new one?
+ 		 * Do we need to set any of the other fields?
+ 		 * active_count / regd_count / curcid ?
+ 		 */
+ 		snapshot->xmin = ns->ssd.xmin;
+ 		snapshot->xmax = ns->ssd.xmax;
+ 
+ 		if ((snapshot->xcnt = ns->ssd.xcnt))
+ 			memcpy(snapshot->xip, ns->ssd.xip,
+ 				   sizeof(TransactionId) * ns->ssd.xcnt);
+ 		if ((snapshot->subxcnt = ns->ssd.subxcnt))
+ 			memcpy(snapshot->subxip, ns->ssd.subxip,
+ 				   sizeof(TransactionId) * ns->ssd.subxcnt);
+ 
+ 		if (!IsolationUsesXactSnapshot())
+ 			elog(WARNING, "Transaction is not ISOLATION LEVEL SERIALIZABLE");
+ 	}
+ 
+ 	LWLockRelease(SyncSnapshotLock);
+ 
+ 	return found;
+ }
+ 
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 273d8bd..cf83acb 100644
*** a/src/backend/utils/time/snapmgr.c
--- b/src/backend/utils/time/snapmgr.c
***************
*** 29,34 ****
--- 29,35 ----
  #include "access/xact.h"
  #include "storage/proc.h"
  #include "storage/procarray.h"
+ #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/memutils.h"
  #include "utils/resowner.h"
*************** AtEOXact_Snapshot(bool isCommit)
*** 559,561 ****
--- 560,628 ----
  	FirstSnapshotSet = false;
  	registered_xact_snapshot = false;
  }
+ 
+ 
+ 
+ Datum
+ pg_publish_snapshot(PG_FUNCTION_ARGS)
+ {
+ 	text	   *nameText = PG_GETARG_TEXT_P(0);
+ 	char	   *name;
+ 
+ 	name = text_to_cstring(nameText);
+ 
+ 	if (strlen(name) > NAMEDATALEN - 1)
+ 		ereport(ERROR, (errcode(ERRCODE_NAME_TOO_LONG),
+ 						errmsg("identifier too long"),
+ 						errdetail("Identifier must be less than %d characters.",
+ 								  NAMEDATALEN)));
+ 
+ 	if (name[0] == '\0')
+ 		ereport(ERROR, (errcode(ERRCODE_INVALID_NAME),
+ 						errmsg("invalid identifier")));
+ 
+ 	if (PublishSnapshot(GetTransactionSnapshot(), name))
+ 		PG_RETURN_BOOL(true);
+ 	else
+ 		PG_RETURN_BOOL(false);
+ }
+ 
+ Datum
+ pg_unpublish_snapshot(PG_FUNCTION_ARGS)
+ {
+ 	text	   *nameText = PG_GETARG_TEXT_P(0);
+ 	char	   *name;
+ 
+ 	name = text_to_cstring(nameText);
+ 
+ 	if (strlen(name) > NAMEDATALEN - 1)
+ 		ereport(ERROR, (errcode(ERRCODE_NAME_TOO_LONG),
+ 						errmsg("identifier too long"),
+ 						errdetail("Identifier must be less than %d characters.",
+ 								  NAMEDATALEN)));
+ 
+ 	if (name[0] == '\0')
+ 		ereport(ERROR, (errcode(ERRCODE_INVALID_NAME),
+ 						errmsg("invalid identifier")));
+ 
+ 	if (UnpublishSnapshot(name))
+ 		PG_RETURN_BOOL(true);
+ 	else
+ 		PG_RETURN_BOOL(false);
+ }
+ 
+ Datum
+ pg_subscribe_snapshot(PG_FUNCTION_ARGS)
+ {
+ 	text	   *nameText = PG_GETARG_TEXT_P(0);
+ 	char	   *name;
+ 
+ 	name = text_to_cstring(nameText);
+ 
+ 	if (name[0] == '\0')
+ 		ereport(ERROR, (errcode(ERRCODE_INVALID_NAME),
+ 						errmsg("invalid identifier")));
+ 
+ 	PG_RETURN_BOOL(SubscribeSnapshot(name, GetTransactionSnapshot()));
+ }
+ 
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 25a3912..bb040d4 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 2171 ( pg_cancel_backe
*** 3369,3374 ****
--- 3369,3380 ----
  DESCR("cancel a server process' current query");
  DATA(insert OID = 2096 ( pg_terminate_backend		PGNSP PGUID 12 1 0 0 f f f t f v 1 0 16 "23" _null_ _null_ _null_ _null_ pg_terminate_backend _null_ _null_ _null_ ));
  DESCR("terminate a server process");
+ DATA(insert OID = 3115 ( pg_publish_snapshot		PGNSP PGUID 12 1 0 0 f f f t f v 1 0 16 "25" _null_ _null_ _null_ _null_ pg_publish_snapshot _null_ _null_ _null_ ));
+ DESCR("publish a snapshot");
+ DATA(insert OID = 3116 ( pg_unpublish_snapshot		PGNSP PGUID 12 1 0 0 f f f t f v 1 0 16 "25" _null_ _null_ _null_ _null_ pg_unpublish_snapshot _null_ _null_ _null_ ));
+ DESCR("unpublish a snapshot");
+ DATA(insert OID = 3117 ( pg_subscribe_snapshot		PGNSP PGUID 12 1 0 0 f f f t f v 1 0 16 "25" _null_ _null_ _null_ _null_ pg_subscribe_snapshot _null_ _null_ _null_ ));
+ DESCR("subscribe to a published snapshot");
  DATA(insert OID = 2172 ( pg_start_backup		PGNSP PGUID 12 1 0 0 f f f t f v 2 0 25 "25 16" _null_ _null_ _null_ _null_ pg_start_backup _null_ _null_ _null_ ));
  DESCR("prepare for taking an online backup");
  DATA(insert OID = 2173 ( pg_stop_backup			PGNSP PGUID 12 1 0 0 f f f t f v 0 0 25 "" _null_ _null_ _null_ _null_ pg_stop_backup _null_ _null_ _null_ ));
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 548e7e0..44f0ac8 100644
*** a/src/include/storage/lwlock.h
--- b/src/include/storage/lwlock.h
*************** typedef enum LWLockId
*** 70,75 ****
--- 70,76 ----
  	RelationMappingLock,
  	AsyncCtlLock,
  	AsyncQueueLock,
+ 	SyncSnapshotLock,
  	/* Individual lock IDs end here */
  	FirstBufMappingLock,
  	FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 959033e..f3387fc 100644
*** a/src/include/storage/procarray.h
--- b/src/include/storage/procarray.h
*************** extern void XidCacheRemoveRunningXids(Tr
*** 72,75 ****
--- 72,81 ----
  						  int nxids, const TransactionId *xids,
  						  TransactionId latestXid);
  
+ extern Size SyncSnapshotShmemSize(void);
+ extern void SyncSnapshotInit(void);
+ extern bool PublishSnapshot(Snapshot snapshot, const char *name);
+ extern bool UnpublishSnapshot(const char *name);
+ extern bool SubscribeSnapshot(const char *name, Snapshot snapshot);
+ 
  #endif   /* PROCARRAY_H */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f03647b..a8292be 100644
*** a/src/include/utils/snapmgr.h
--- b/src/include/utils/snapmgr.h
*************** extern void AtSubAbort_Snapshot(int leve
*** 43,46 ****
--- 43,50 ----
  extern void AtEarlyCommit_Snapshot(void);
  extern void AtEOXact_Snapshot(bool isCommit);
  
+ extern Datum pg_publish_snapshot(PG_FUNCTION_ARGS);
+ extern Datum pg_unpublish_snapshot(PG_FUNCTION_ARGS);
+ extern Datum pg_subscribe_snapshot(PG_FUNCTION_ARGS);
+ 
  #endif   /* SNAPMGR_H */
#40Koichi Suzuki
koichi.szk@gmail.com
In reply to: Joachim Wieland (#39)
Re: WIP patch for parallel pg_dump

Thank you Joachim;

Yes, and the current patch requires that the original (publisher)
transaction stay alive, to keep RecentXmin from being updated.

I hope this restriction is acceptable if publishing/subscribing is
provided via functions, not statements.

Cheers;
----------
Koichi Suzuki

2010/12/6 Joachim Wieland <joe@mcknight.de>:

On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Why not just say give me the snapshot currently held by process nnnn?

And please, not temp files if possible.

As far as I'm aware, the full snapshot doesn't normally exist in
shared memory, hence the need for publication of some sort.  We could
dedicate a shared memory region for publication but then you have to
decide how many slots to allocate, and any number you pick will be too
many for some people and not enough for others, not to mention that
shared memory is a fairly precious resource.

So here is a patch that I have been playing with in the past; I wrote
it a while back, and thanks go to Koichi Suzuki for his helpful
comments. I have not published it earlier because I haven't worked on
it recently, and from the discussion that I brought up in March I got
the feeling that people are fine with having a first version of
parallel dump without synchronized snapshots.

I am not really sure that what the patch does is sufficient, nor that
it does it in the right way, but I hope that it can serve as a basis
to collect ideas (and doubts).

My idea is pretty much similar to Robert's about publishing snapshots
and subscribing to them; the patch even uses these words.

Basically the idea is that a transaction in isolation level
serializable can publish a snapshot and as long as this transaction is
alive, its snapshot can be adopted by other transactions. Requiring
the publishing transaction to be serializable guarantees that the copy
of the snapshot in shared memory is always current. When the
transaction ends, the copy of the snapshot is also invalidated and
cannot be adopted anymore. So instead of doing explicit checks, the
patch aims at always having a reference transaction around that
guarantees validity of the snapshot information in shared memory.

The patch currently creates a new area in shared memory to store
snapshot information but we can certainly discuss this... I had a GUC
in mind that can control the number of available "slots", similar to
max_prepared_transactions. Snapshot information can become quite
large, especially with a high number of max_connections.

Known limitations: the patch is lacking awareness of prepared
transactions completely and doesn't check if both backends belong to
the same user.

Joachim

#41Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#36)
Re: WIP patch for parallel pg_dump

On 06.12.2010 02:55, Robert Haas wrote:

On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

I'm wondering if we should reconsider the pass-it-through-the-client
approach, because if we could make that work it would be more general and
it wouldn't need any special privileges. The trick seems to be to apply
sufficient sanity testing to the snapshot proposed to be installed in
the subsidiary transaction. I think the requirements would basically be
(1) xmin<= any listed XIDs< xmax
(2) xmin not so old as to cause GlobalXmin to decrease
(3) xmax not beyond current XID counter
(4) XID list includes all still-running XIDs in the given range

Thoughts?

I think this is too ugly to live. I really think it's a very bad idea
for database clients to need to explicitly know anywhere near this
many details about how the server represents snapshots. It's not
impossible we might want to change this in the future, and even if we
don't, it seems to me to be exposing a whole lot of unnecessary
internal grottiness.

The client doesn't need to know anything about the snapshot blob that
the server gives it. It just needs to pass it back to the server through
the other connection. To the client, it's just an opaque chunk of bytes.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#42Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#41)
Re: WIP patch for parallel pg_dump

On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 06.12.2010 02:55, Robert Haas wrote:

On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:

I'm wondering if we should reconsider the pass-it-through-the-client
approach, because if we could make that work it would be more general and
it wouldn't need any special privileges.  The trick seems to be to apply
sufficient sanity testing to the snapshot proposed to be installed in
the subsidiary transaction.  I think the requirements would basically be
(1) xmin<= any listed XIDs<  xmax
(2) xmin not so old as to cause GlobalXmin to decrease
(3) xmax not beyond current XID counter
(4) XID list includes all still-running XIDs in the given range

Thoughts?

I think this is too ugly to live.  I really think it's a very bad idea
for database clients to need to explicitly know anywhere near this
many details about how the server represents snapshots.  It's not
impossible we might want to change this in the future, and even if we
don't, it seems to me to be exposing a whole lot of unnecessary
internal grottiness.

The client doesn't need to know anything about the snapshot blob that the
server gives it. It just needs to pass it back to the server through the
other connection. To the client, it's just an opaque chunk of bytes.

I suppose that would work, but I still think it's a bad idea. We made
this mistake with expression trees. Any oversight in the code that
validates the chunk of bytes when it (or a modified version) is sent
back to the server turns into a security hole. I think it's a whole
lot simpler and cleaner to keep the representation details private to
the server.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#43Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#42)
Re: WIP patch for parallel pg_dump

On 06.12.2010 14:57, Robert Haas wrote:

On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

The client doesn't need to know anything about the snapshot blob that the
server gives it. It just needs to pass it back to the server through the
other connection. To the client, it's just an opaque chunk of bytes.

I suppose that would work, but I still think it's a bad idea. We made
this mistake with expression trees. Any oversight in the code that
validates the chunk of bytes when it (or a modified version) is sent
back to the server turns into a security hole.

True, but a snapshot is a lot simpler than an expression tree. It's
pretty much impossible to plug all the holes in the expression-tree
reading functions, and keep them hole-free in the future. The expression
tree format is constantly in flux. A snapshot, however, is a fairly
isolated small data structure that rarely changes.

I think it's a whole
lot simpler and cleaner to keep the representation details private to
the server.

Well, then you need some sort of cross-backend communication, which is
always a bit clumsy.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#44Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#43)
Re: WIP patch for parallel pg_dump

On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 06.12.2010 14:57, Robert Haas wrote:

On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com>  wrote:

The client doesn't need to know anything about the snapshot blob that the
server gives it. It just needs to pass it back to the server through the
other connection. To the client, it's just an opaque chunk of bytes.

I suppose that would work, but I still think it's a bad idea.  We made
this mistake with expression trees.  Any oversight in the code that
validates the chunk of bytes when it (or a modified version) is sent
back to the server turns into a security hole.

True, but a snapshot is a lot simpler than an expression tree. It's pretty
much impossible to plug all the holes in the expression-tree reading
functions, and keep them hole-free in the future. The expression tree format
is constantly in flux. A snapshot, however, is a fairly isolated small data
structure that rarely changes.

I guess. It still seems far too much like exposing the server's guts
for my taste. It might not be as bad as the expression tree stuff,
but there's nothing particularly good about it either.

 I think it's a whole
lot simpler and cleaner to keep the representation details private to
the server.

Well, then you need some sort of cross-backend communication, which is
always a bit clumsy.

A temp file seems quite sufficient, and not at all difficult.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#45Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#44)
Re: WIP patch for parallel pg_dump

On 06.12.2010 15:53, Robert Haas wrote:

I guess. It still seems far too much like exposing the server's guts
for my taste. It might not be as bad as the expression tree stuff,
but there's nothing particularly good about it either.

Note that we already have txid_current_snapshot() function, which
exposes all that.
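
(For illustration: txid_current_snapshot() returns a single text value of
the form xmin:xmax:xip-list, for example 1000:1005:1000,1002, which is
exactly the information being debated here.)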

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#46Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#45)
Re: WIP patch for parallel pg_dump

On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 06.12.2010 15:53, Robert Haas wrote:

I guess.  It still seems far too much like exposing the server's guts
for my taste.  It might not be as bad as the expression tree stuff,
but there's nothing particularly good about it either.

Note that we already have txid_current_snapshot() function, which exposes
all that.

Fair enough, and I think that's actually useful for Slony &c. But I
don't think we should shy away from providing a cleaner API here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#47Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#46)
Re: WIP patch for parallel pg_dump

On 12/06/2010 10:22 AM, Robert Haas wrote:

On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 06.12.2010 15:53, Robert Haas wrote:

I guess. It still seems far too much like exposing the server's guts
for my taste. It might not be as bad as the expression tree stuff,
but there's nothing particularly good about it either.

Note that we already have txid_current_snapshot() function, which exposes
all that.

Fair enough, and I think that's actually useful for Slony &c. But I
don't think we should shy away from providing a cleaner API here.

Just don't let the perfect get in the way of the good :P

cheers

andrew

#48Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#47)
Re: WIP patch for parallel pg_dump

On Mon, Dec 6, 2010 at 10:35 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 12/06/2010 10:22 AM, Robert Haas wrote:

On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com>  wrote:

On 06.12.2010 15:53, Robert Haas wrote:

I guess.  It still seems far too much like exposing the server's guts
for my taste.  It might not be as bad as the expression tree stuff,
but there's nothing particularly good about it either.

Note that we already have txid_current_snapshot() function, which exposes
all that.

Fair enough, and I think that's actually useful for Slony &c.  But I
don't think we should shy away from providing a cleaner API here.

Just don't let the perfect get in the way of the good :P

I'll keep that in mind. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#44)
Re: WIP patch for parallel pg_dump

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Well, then you need some sort of cross-backend communication, which is
always a bit clumsy.

A temp file seems quite sufficient, and not at all difficult.

"Not at all difficult" is nonsense. To do that, you need to invent some
mechanism for sender and receivers to identify which temp file they want
to use, and you need to think of some way to clean up the files when the
client forgets to tell you to do so. That's going to be at least as
ugly as anything else. And I think it's unproven that this approach
would be security-hole-free either. For instance, what about some other
session overwriting pg_dump's snapshot temp file?

regards, tom lane

#50Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#49)
Re: WIP patch for parallel pg_dump

On Mon, Dec 6, 2010 at 10:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Well, then you need some sort of cross-backend communication, which is
always a bit clumsy.

A temp file seems quite sufficient, and not at all difficult.

"Not at all difficult" is nonsense.  To do that, you need to invent some
mechanism for sender and receivers to identify which temp file they want
to use,

Why is this even remotely hard? That's the whole point of having the
"publish" operation return a token. The token either is, or uniquely
identifies, the file name.

and you need to think of some way to clean up the files when the
client forgets to tell you to do so. That's going to be at least as
ugly as anything else.

Backends don't forget to call their end-of-transaction hooks, do they?
They might crash, but we already have code to remove temp files on
server restart. At most it would need minor adjustment.

 And I think it's unproven that this approach
would be security-hole-free either.  For instance, what about some other
session overwriting pg_dump's snapshot temp file?

Why would this be any different from any other temp file? We surely
must have a mechanism in place to ensure that the temporary files used
by sorts or hash joins don't get overwritten by some other session, or
the system would be totally unstable.
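
A minimal sketch of that cleanup path, assuming the snapshot-file
approach; RegisterXactCallback is the existing hook mechanism, the
file-tracking names here are made up, and a real implementation would take
care not to register the callback more than once:

#include "postgres.h"
#include "access/xact.h"
#include <unistd.h>

static char published_snapshot_path[MAXPGPATH];

/* runs at every transaction end; removes the file we published, if any */
static void
snapshot_file_xact_callback(XactEvent event, void *arg)
{
	if ((event == XACT_EVENT_COMMIT || event == XACT_EVENT_ABORT) &&
		published_snapshot_path[0] != '\0')
	{
		unlink(published_snapshot_path);
		published_snapshot_path[0] = '\0';
	}
}

/* called once, right after the snapshot file has been written */
static void
remember_published_file(const char *path)
{
	strlcpy(published_snapshot_path, path, MAXPGPATH);
	RegisterXactCallback(snapshot_file_xact_callback, NULL);
}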

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#51Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#49)
Re: WIP patch for parallel pg_dump

On 12/06/2010 10:40 AM, Tom Lane wrote:

Robert Haas<robertmhaas@gmail.com> writes:

On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Well, then you need some sort of cross-backend communication, which is
always a bit clumsy.

A temp file seems quite sufficient, and not at all difficult.

"Not at all difficult" is nonsense. To do that, you need to invent some
mechanism for sender and receivers to identify which temp file they want
to use, and you need to think of some way to clean up the files when the
client forgets to tell you to do so. That's going to be at least as
ugly as anything else. And I think it's unproven that this approach
would be security-hole-free either. For instance, what about some other
session overwriting pg_dump's snapshot temp file?

Yeah. I'm still not convinced that using shared memory is a bad way to
pass these around. Surely we're not talking about large numbers of them.
What am I missing here?

cheers

andrew

#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#51)
Re: WIP patch for parallel pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

Yeah. I'm still not convinced that using shared memory is a bad way to
pass these around. Surely we're not talking about large numbers of them.
What am I missing here?

They're not of a very predictable size.

Robert's idea of publish() returning a temp file identifier, which then
gets removed at transaction end, might work all right.

regards, tom lane

#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#37)
Re: WIP patch for parallel pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

Why not just say give me the snapshot currently held by process nnnn?

There's not a unique snapshot held by a particular process. Also, we
don't want to expend the overhead to fully publish every snapshot.
I think it's really necessary that the "sending" process take some
deliberate action to publish a snapshot.

And please, not temp files if possible.

Barring the cleanup issue, I don't see why not. This is a relatively
low-usage feature, I think, so I wouldn't be much in favor of dedicating
shmem to it even if the space requirement were predictable.

regards, tom lane

#54Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#52)
Re: WIP patch for parallel pg_dump

On 12/06/2010 12:28 PM, Tom Lane wrote:

Andrew Dunstan<andrew@dunslane.net> writes:

Yeah. I'm still not convinced that using shared memory is a bad way to
pass these around. Surely we're not talking about large numbers of them.
What am I missing here?

They're not of a very predictable size.

Ah. Ok.

cheers

andrew

#55Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#52)
Re: WIP patch for parallel pg_dump

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm still not convinced that using shared memory is a bad way to
pass these around. Surely we're not talking about large numbers
of them. What am I missing here?

They're not of a very predictable size.

Surely you can predict that any snapshot is no larger than a fairly
small fixed portion plus sizeof(TransactionId) * MaxBackends? So,
for example, if you're configured for 100 connections, you'd be
limited to something under 1kB, maximum?

-Kevin

#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#55)
Re: WIP patch for parallel pg_dump

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm still not convinced that using shared memory is a bad way to
pass these around. Surely we're not talking about large numbers
of them. What am I missing here?

They're not of a very predictable size.

Surely you can predict that any snapshot is no larger than a fairly
small fixed portion plus sizeof(TransactionId) * MaxBackends?

No. See subtransactions.

regards, tom lane

#57Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#56)
Re: WIP patch for parallel pg_dump

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Surely you can predict that any snapshot is no larger than a fairly
small fixed portion plus sizeof(TransactionId) * MaxBackends?

No. See subtransactions.

Subtransactions are included in snapshots?

-Kevin

#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#57)
Re: WIP patch for parallel pg_dump

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

No. See subtransactions.

Subtransactions are included in snapshots?

Sure, see GetSnapshotData(). You could avoid it by setting
suboverflowed, but that comes at a nontrivial performance cost.
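
For scale, a rough worst-case bound using the constants in the current
sources (PGPROC_MAX_CACHED_SUBXIDS is 64), with 100 backends and 4-byte
TransactionIds assumed:

/*
 * Worst case for one snapshot, per GetSnapshotData():
 *   1 top-level XID per backend, plus up to
 *   PGPROC_MAX_CACHED_SUBXIDS (64) cached subxids per backend.
 * With max_connections = 100:
 *   (1 + 64) * 100 * 4 = 26000 bytes,
 * i.e. roughly 25x the sub-1kB figure once subtransactions count.
 */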

regards, tom lane

#59Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#58)
Re: WIP patch for parallel pg_dump

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

No. See subtransactions.

Subtransactions are included in snapshots?

Sure, see GetSnapshotData(). You could avoid it by setting
suboverflowed, but that comes at a nontrivial performance cost.

Yeah, sorry for blurting like that before I checked. I was somewhat
panicked that I'd missed something important for SSI, because my
XidIsConcurrent check just uses xmin, xmax, and xip; I was afraid
that what I have would fall down in the face of subtransactions. But
on review I found that I'd thought that through (discussion is in
the archives): I always wanted to associate the locks and conflicts
with the top-level transaction, so that was already identified
before checking for overlap, and it was therefore more efficient to
just check that.

Sorry for the "senior moment". :-/

Perhaps a line or two of comments about that in the SSI patch would
be a good idea. And maybe some tests involving subtransactions....

-Kevin

#60marcin mank
marcin.mank@gmail.com
In reply to: Tom Lane (#35)
Re: WIP patch for parallel pg_dump

On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

IIRC, in old discussions of this problem we first considered allowing
clients to pull down an explicit representation of their snapshot (which
actually is an existing feature now, txid_current_snapshot()) and then
upload that again to become the active snapshot in another connection.

Could a hot standby use such a snapshot representation? I.e. same
snapshot on the master and the standby?

Greetings
Marcin Mańk

#61Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: marcin mank (#60)
Re: WIP patch for parallel pg_dump

On 06.12.2010 21:48, marcin mank wrote:

On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

IIRC, in old discussions of this problem we first considered allowing
clients to pull down an explicit representation of their snapshot (which
actually is an existing feature now, txid_current_snapshot()) and then
upload that again to become the active snapshot in another connection.

Could a hot standby use such a snapshot representation? I.e. same
snapshot on the master and the standby?

Hmm, I suppose it could. That's an interesting idea: you could run
parallel pg_dump or something else against the master and/or multiple hot
standby servers, all working on the same snapshot.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: marcin mank (#60)
Re: WIP patch for parallel pg_dump

marcin mank <marcin.mank@gmail.com> writes:

On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

IIRC, in old discussions of this problem we first considered allowing
clients to pull down an explicit representation of their snapshot (which
actually is an existing feature now, txid_current_snapshot()) and then
upload that again to become the active snapshot in another connection.

Could a hot standby use such a snapshot representation? I.e. same
snapshot on the master and the standby?

Hm, that's a good question. It seems like it's at least possibly
workable, but I'm not sure if there are any showstoppers. The other
proposal of publish-a-snapshot would presumably NOT support this, since
we'd not want to ship the snapshot temp files down the WAL stream.

However, if you were doing something like parallel pg_dump you could
just run the parent and child instances all against the slave, so the
pg_dump scenario doesn't seem to offer much of a supporting use-case for
worrying about this. When would you really need to be able to do it?

regards, tom lane

#63Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#62)
Re: WIP patch for parallel pg_dump

However, if you were doing something like parallel pg_dump you could
just run the parent and child instances all against the slave, so the
pg_dump scenario doesn't seem to offer much of a supporting use-case for
worrying about this. When would you really need to be able to do it?

If you had several standbys, you could distribute the work of the
pg_dump among them. This would be a huge speedup for a large database,
potentially, thanks to parallelization of I/O and network. Imagine
doing a pg_dump of a 300GB database in 10min.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#64Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#63)
Re: WIP patch for parallel pg_dump

Josh Berkus <josh@agliodbs.com> writes:

However, if you were doing something like parallel pg_dump you could
just run the parent and child instances all against the slave, so the
pg_dump scenario doesn't seem to offer much of a supporting use-case for
worrying about this. When would you really need to be able to do it?

If you had several standbys, you could distribute the work of the
pg_dump among them. This would be a huge speedup for a large database,
potentially, thanks to parallelization of I/O and network. Imagine
doing a pg_dump of a 300GB database in 10min.

That does sound kind of attractive. But to do that I think we'd have to
go with the pass-the-snapshot-through-the-client approach. Shipping
internal snapshot files through the WAL stream doesn't seem attractive
to me.

While I see Robert's point about preferring not to expose the snapshot
contents to clients, I don't think it outweighs all other considerations
here; and every other one is pointing to doing it the other way.

regards, tom lane

#65Koichi Suzuki
koichi.szk@gmail.com
In reply to: Tom Lane (#62)
Re: WIP patch for parallel pg_dump

We may need other means to ensure that the snapshot is available on
the slave. It could be a bit too early to use the snapshot on the
slave depending upon the delay of WAL replay.
----------
Koichi Suzuki

2010/12/7 Tom Lane <tgl@sss.pgh.pa.us>:

marcin mank <marcin.mank@gmail.com> writes:

On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

IIRC, in old discussions of this problem we first considered allowing
clients to pull down an explicit representation of their snapshot (which
actually is an existing feature now, txid_current_snapshot()) and then
upload that again to become the active snapshot in another connection.

Could a hot standby use such a snapshot representation? I.e. same
snapshot on the master and the standby?

Hm, that's a good question.  It seems like it's at least possibly
workable, but I'm not sure if there are any showstoppers.  The other
proposal of publish-a-snapshot would presumably NOT support this, since
we'd not want to ship the snapshot temp files down the WAL stream.

However, if you were doing something like parallel pg_dump you could
just run the parent and child instances all against the slave, so the
pg_dump scenario doesn't seem to offer much of a supporting use-case for
worrying about this.  When would you really need to be able to do it?

                       regards, tom lane

#66Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#64)
Re: WIP patch for parallel pg_dump

On 12/07/2010 01:22 AM, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

However, if you were doing something like parallel pg_dump you could
just run the parent and child instances all against the slave, so the
pg_dump scenario doesn't seem to offer much of a supporting use-case for
worrying about this. When would you really need to be able to do it?

If you had several standbys, you could distribute the work of the
pg_dump among them. This would be a huge speedup for a large database,
potentially, thanks to parallelization of I/O and network. Imagine
doing a pg_dump of a 300GB database in 10min.

That does sound kind of attractive. But to do that I think we'd have to
go with the pass-the-snapshot-through-the-client approach. Shipping
internal snapshot files through the WAL stream doesn't seem attractive
to me.

this kind of functionality would also be very useful/interesting for
connection poolers/loadbalancers that are trying to distribute load
across multiple hosts and could use that to at least give some sort of
consistency guarantee.

Stefan

#67Tatsuo Ishii
ishii@postgresql.org
In reply to: Stefan Kaltenbrunner (#66)
Re: WIP patch for parallel pg_dump

On 12/07/2010 01:22 AM, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

However, if you were doing something like parallel pg_dump you could
just run the parent and child instances all against the slave, so the
pg_dump scenario doesn't seem to offer much of a supporting use-case for
worrying about this. When would you really need to be able to do it?

If you had several standbys, you could distribute the work of the
pg_dump among them. This would be a huge speedup for a large database,
potentially, thanks to parallelization of I/O and network. Imagine
doing a pg_dump of a 300GB database in 10min.

That does sound kind of attractive. But to do that I think we'd have to
go with the pass-the-snapshot-through-the-client approach. Shipping
internal snapshot files through the WAL stream doesn't seem attractive
to me.

this kind of functionality would also be very useful/interesting for
connection poolers/loadbalancers that are trying to distribute load
across multiple hosts and could use that to at least give some sort of
consistency guarantee.

In addition to this, that will greatly help query-based replication
tools such as pgpool-II. Sounds great.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#68Koichi Suzuki
koichi.szk@gmail.com
In reply to: Stefan Kaltenbrunner (#66)
Re: WIP patch for parallel pg_dump

This is what Postgres-XC is doing between a coordinator and a
datanode. Coordinator may correspond to poolers/loadbalancers.
Does anyone think it makes sense to extract XC implementation of
snapshot shipping to PostgreSQL itself?

Cheers;
----------
Koichi Suzuki

2010/12/7 Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>:

On 12/07/2010 01:22 AM, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

However, if you were doing something like parallel pg_dump you could
just run the parent and child instances all against the slave, so the
pg_dump scenario doesn't seem to offer much of a supporting use-case for
worrying about this.  When would you really need to be able to do it?

If you had several standbys, you could distribute the work of the
pg_dump among them.  This would be a huge speedup for a large database,
potentially, thanks to parallelization of I/O and network.  Imagine
doing a pg_dump of a 300GB database in 10min.

That does sound kind of attractive.  But to do that I think we'd have to
go with the pass-the-snapshot-through-the-client approach.  Shipping
internal snapshot files through the WAL stream doesn't seem attractive
to me.

this kind of functionality would also be very useful/interesting for
connection poolers/loadbalancers that are trying to distribute load
across multiple hosts and could use that to at least give some sort of
consistency guarantee.

Stefan

#69Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Koichi Suzuki (#68)
Re: WIP patch for parallel pg_dump

On 12/07/2010 09:23 AM, Koichi Suzuki wrote:

This is what Postgres-XC is doing between a coordinator and a
datanode. Coordinator may correspond to poolers/loadbalancers.
Does anyone think it makes sense to extract XC implementation of
snapshot shipping to PostgreSQL itself?

Well, if there is a pre-existing implementation of that, it would
certainly be of interest to see it - but before you go and extract the
code, maybe you could tell us how exactly it works?

Stefan

#70Robert Haas
robertmhaas@gmail.com
In reply to: Koichi Suzuki (#68)
Re: WIP patch for parallel pg_dump

On Tue, Dec 7, 2010 at 3:23 AM, Koichi Suzuki <koichi.szk@gmail.com> wrote:

This is what Postgres-XC is doing between a coordinator and a
datanode. The coordinator may correspond to poolers/loadbalancers.
Does anyone think it makes sense to extract the XC implementation of
snapshot shipping into PostgreSQL itself?

Perhaps, though of course it would need to be re-licensed. I'd be
happy to see us pursue a snapshot cloning framework, wherever it comes
from. I remain unconvinced that it should be made a hard requirement
for parallel pg_dump, but of course if we can get it implemented then
the point becomes moot.

Let's not let this fall on the floor. Someone should pursue this,
whether it's Joachim or Koichi or someone else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#71Koichi Suzuki
koichi.szk@gmail.com
In reply to: Robert Haas (#70)
Re: WIP patch for parallel pg_dump

Robert;

Thank you very much for your advice. Indeed, I'm considering changing
the license to the PostgreSQL one. It may take a bit more time,
though...
----------
Koichi Suzuki

2010/12/15 Robert Haas <robertmhaas@gmail.com>:

On Tue, Dec 7, 2010 at 3:23 AM, Koichi Suzuki <koichi.szk@gmail.com> wrote:

This is what Postgres-XC is doing between a coordinator and a
datanode. The coordinator may correspond to poolers/loadbalancers.
Does anyone think it makes sense to extract the XC implementation of
snapshot shipping into PostgreSQL itself?

Perhaps, though of course it would need to be re-licensed.  I'd be
happy to see us pursue a snapshot cloning framework, wherever it comes
from.  I remain unconvinced that it should be made a hard requirement
for parallel pg_dump, but of course if we can get it implemented then
the point becomes moot.

Let's not let this fall on the floor.  Someone should pursue this,
whether it's Joachim or Koichi or someone else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#72Robert Haas
robertmhaas@gmail.com
In reply to: Koichi Suzuki (#71)
Re: WIP patch for parallel pg_dump

On Tue, Dec 14, 2010 at 7:06 PM, Koichi Suzuki <koichi.szk@gmail.com> wrote:

Thank you very much for your advice. Indeed, I'm considering changing
the license to the PostgreSQL one. It may take a bit more time,
though...

You wouldn't necessarily need to relicense all of Postgres-XC
(although that would be cool, too, at least IMO), just the portion you
were proposing for commit to PostgreSQL. Alternatively, it doesn't sound
like it would be infeasible for someone to code this up from scratch. But we
should try to make something good happen here!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#73Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#27)
Re: WIP patch for parallel pg_dump

Robert Haas wrote:

I actually think that the phrase "this has been discussed before and
rejected" should be permanently removed from our list of excuses for
rejecting a patch. Or if we must use that excuse, then I think a link
to the relevant discussion is a must, and the relevant discussion had
better reflect the fact that $TOPIC was in fact rejected. It seems to
me that in at least 50% of cases, someone comes back and says one of
the following things:

1. I searched the archives and could find no discussion along those lines.
2. I read that discussion and it doesn't appear to me that it reflects
a rejection of this idea. Instead what people seemed to be saying was
X.
3. At the time that might have been true, but what has changed in the
meanwhile is X.

Agreed. Perhaps we need an anti-TODO that lists, in more detail, the
things we don't want. The TODO has that for a few items, but scaling
things up there will be cumbersome.

I agree that having the person saying it was rejected find the email
discussion is ideal --- if they can't find it, odds are the patch person
will not be able to find it either.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#74Joshua D. Drake
jd@commandprompt.com
In reply to: Bruce Momjian (#73)
Re: WIP patch for parallel pg_dump

3. At the time that might have been true, but what has changed in the
meanwhile is X.

Agreed. Perhaps we need an anti-TODO that lists, in more detail, the
things we don't want. The TODO has that for a few items, but scaling
things up there will be cumbersome.

Well there is a problem with this too. A good example is hints. A lot of
the community wants hints. A lot of the community doesn't. The community
changes as we get more mature and more hackers. It isn't hard to point
to dozens of items we have now that would have been on that list 5 years
ago.

I agree that having the person saying it was rejected find the email
discussion is ideal --- if they can't find it, odds are the patch person
will not be able to find it either.

I would have to agree here. The idea that we have to search email is bad
enough (issue/bug/feature tracker anyone?) but to have someone say,
search the archives? That is just plain rude and anti-community.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

#75Aidan Van Dyk
aidan@highrise.ca
In reply to: Joshua D. Drake (#74)
Re: WIP patch for parallel pg_dump

On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake <jd@commandprompt.com> wrote:

I would have to agree here. The idea that we have to search email is bad
enough (issue/bug/feature tracker anyone?) but to have someone say,
search the archives? That is just plain rude and anti-community.

Saying "search the bugtracker" is no less rude than "search the archives"...

And most of the bugtrackers I've had to search have way *less*
ease-of-use for searching than a good mailing list archive (I tend to
keep going back to gmane's search)

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

#76Andrew Dunstan
andrew@dunslane.net
In reply to: Aidan Van Dyk (#75)
Re: WIP patch for parallel pg_dump

On 12/24/2010 06:26 PM, Aidan Van Dyk wrote:

On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake<jd@commandprompt.com> wrote:

I would have to agree here. The idea that we have to search email is bad
enough (issue/bug/feature tracker anyone?) but to have someone say,
search the archives? That is just plain rude and anti-community.

Saying "search the bugtracker" is no less rude than "search the archives"...

And most of the bugtrackers I've had to search have way *less*
ease-of-use for searching than a good mailing list archive (I tend to
keep going back to gmane's search)

It's deja vu all over again. See mailing list archives for details.

cheers

andrew

#77Joshua D. Drake
jd@commandprompt.com
In reply to: Aidan Van Dyk (#75)
Re: WIP patch for parallel pg_dump

On Fri, 2010-12-24 at 18:26 -0500, Aidan Van Dyk wrote:

On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake <jd@commandprompt.com> wrote:

I would have to agree here. The idea that we have to search email is bad
enough (issue/bug/feature tracker anyone?) but to have someone say,
search the archives? That is just plain rude and anti-community.

Saying "search the bugtracker" is no less rude than "search the archives"...

And most of the bugtrackers I've had to search have way *less*
ease-of-use for searching than a good mailing list archive (I tend to
keep going back to gmane's search)

I think you kind of missed my point.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

#78Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#73)
Re: WIP patch for parallel pg_dump

On Dec 24, 2010, at 10:52 AM, Bruce Momjian <bruce@momjian.us> wrote:

Agreed. Perhaps we need an anti-TODO that lists, in more detail, the
things we don't want. The TODO has that for a few items, but scaling
things up there will be cumbersome.

I don't really think that'd be much better. What might be of some value
is summaries of previous discussions, *with citations*. Foo seems like it
would be useful [1,2,3] but there are concerns about bar [4,5] and baz [6].

...Robert

#79David Fetter
david@fetter.org
In reply to: Andrew Dunstan (#76)
Re: WIP patch for parallel pg_dump

On Fri, Dec 24, 2010 at 06:37:26PM -0500, Andrew Dunstan wrote:

On 12/24/2010 06:26 PM, Aidan Van Dyk wrote:

On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake<jd@commandprompt.com> wrote:

I would have to agree here. The idea that we have to search email
is bad enough (issue/bug/feature tracker anyone?) but to have
someone say, search the archives? That is just plain rude and
anti-community.

Saying "search the bugtracker" is no less rude than "search the
archives"...

And most of the bugtrackers I've had to search have way *less*
ease-of-use for searching than a good mailing list archive (I tend
to keep going back to gmane's search)

It's deja vu all over again. See mailing list archives for details.

LOL!

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#80Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Tom Lane (#64)
Re: WIP patch for parallel pg_dump

On Mon, Dec 6, 2010 at 7:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Josh Berkus <josh@agliodbs.com> writes:

However, if you were doing something like parallel pg_dump you could
just run the parent and child instances all against the slave, so the
pg_dump scenario doesn't seem to offer much of a supporting use-case for
worrying about this. When would you really need to be able to do it?

If you had several standbys, you could distribute the work of the
pg_dump among them. This would be a huge speedup for a large database,
potentially, thanks to parallelization of I/O and network. Imagine
doing a pg_dump of a 300GB database in 10min.

That does sound kind of attractive. But to do that I think we'd have to
go with the pass-the-snapshot-through-the-client approach. Shipping
internal snapshot files through the WAL stream doesn't seem attractive
to me.

While I see Robert's point about preferring not to expose the snapshot
contents to clients, I don't think it outweighs all other considerations
here; and every other one is pointing to doing it the other way.

How about the publishing transaction puts the snapshot in a (new) system
table and passes a UUID to its children, and the joining transactions look
for that UUID in the system table using a dirty snapshot (SnapshotAny) via a
security-definer function owned by a superuser?

No shared memory used, and if WAL-logged, the snapshot would get to the
slaves too.

I realize SnapshotAny wouldn't be sufficient since we want the tuple to
become invisible when the publishing transaction ends (commit/rollback),
hence something akin to (new) HeapTupleSatisfiesStillRunning() would be
needed.
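
A rough SQL sketch of the shape of this proposal; every name below is
hypothetical (pg_publish_snapshot(), pg_adopt_snapshot(), and the system
table behind them do not exist anywhere yet):

    -- Publishing transaction: store the snapshot in the (new) system
    -- table keyed by a UUID, and hand that UUID to the children.
    BEGIN;
    SELECT pg_publish_snapshot();
    -- returns e.g. 'a81bc81b-dead-4e5d-abff-90865d1e13b1'

    -- Joining transaction: a security-definer function scans the system
    -- table with the proposed "still running" visibility rule, so the
    -- row is found only while the publishing transaction is in progress.
    BEGIN;
    SELECT pg_adopt_snapshot('a81bc81b-dead-4e5d-abff-90865d1e13b1');
    -- ... run queries against the adopted snapshot ...
    COMMIT;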

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device