pg_dump additional options for performance

simon@2ndquadrant.com

over 17 years ago

In reply to: Stephen Frost (#1)

Re: pg_dump additional options for performance

On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:

Simon,

I agree with adding these options in general, since I find myself
frustrated by having to vi huge dumps to change simple schema things.
A couple of comments on the patch though:

- Conflicting option handling
I think we are doing our users a disservice by putting it on them to
figure out exactly what:
multiple object groups cannot be used together
means to them. You and I may understand what an "object group" is,
and why there can be only one, but it's a great deal less clear than
the prior message of
options -s/--schema-only and -a/--data-only cannot be used together
My suggestion would be to either list out the specific options which
can't be used together, as was done previously, or add a bit of (I
realize, boring) code and actually tell the user which of the
conflicting options were used.

- Documentation
When writing the documentation I would stress that "pre-schema" and
"post-schema" be defined in terms of PostgreSQL objects and why they
are pre vs. post.

- Technically, the patch needs to be updated slightly since another
pg_dump-related patch was committed recently which also added
options and thus causes a conflict.

Beyond those minor points, the patch looks good to me.

Thanks for the review. I'll make the changes you suggest.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

simon@2ndquadrant.com

over 17 years ago

In reply to: Simon Riggs (#2)

1 attachment(s)

Re: pg_dump additional options for performance

On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote:

On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:

Simon,

I agree with adding these options in general, since I find myself
frustrated by having to vi huge dumps to change simple schema things.
A couple of comments on the patch though:

- Conflicting option handling
I think we are doing our users a disservice by putting it on them to
figure out exactly what:
multiple object groups cannot be used together
means to them. You and I may understand what an "object group" is,
and why there can be only one, but it's a great deal less clear than
the prior message of
options -s/--schema-only and -a/--data-only cannot be used together
My suggestion would be to either list out the specific options which
can't be used together, as was done previously, or add a bit of (I
realize, boring) code and actually tell the user which of the
conflicting options were used.

- Documentation
When writing the documentation I would stress that "pre-schema" and
"post-schema" be defined in terms of PostgreSQL objects and why they
are pre vs. post.

- Technically, the patch needs to be updated slightly since another
pg_dump-related patch was committed recently which also added
options and thus causes a conflict.

Beyond those minor points, the patch looks good to me.

Thanks for the review. I'll make the changes you suggest.

Patch updated to head, plus changes/docs requested.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

pg_dump_prepost.v3.patchtext/x-patch; charset=utf-8; name=pg_dump_prepost.v3.patchDownload

Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.102
diff -c -r1.102 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml	13 Apr 2008 03:49:21 -0000	1.102
--- doc/src/sgml/ref/pg_dump.sgml	20 Jul 2008 06:33:30 -0000
***************
*** 133,139 ****
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
--- 133,140 ----
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> or <option>--schema-pre-load</> or
! 		<option>--schema-post-load</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
***************
*** 443,448 ****
--- 444,471 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--schema-pre-load</option></term>
+       <listitem>
+        <para>
+ 		Dump only the object definitions (schema) required to load data. Dumps
+ 		exactly what <option>--schema-only</> would dump, but only those
+ 		statements before the data load.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
+       <term><option>--schema-post-load</option></term>
+       <listitem>
+        <para>
+ 		Dump only the object definitions (schema) required after data has been
+ 		loaded. Dumps exactly what <option>--schema-only</> would dump, but 
+ 		only those statements after the data load.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 774,779 ****
--- 797,830 ----
    </para>
  
    <para>
+    The output of pg_dump can be notionally divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+ 	  Pre-Schema - objects required before data loading, such as 
+ 	  <command>CREATE TABLE</command>.
+ 	  This part can be requested using <option>--schema-pre-load</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Post-Schema - objects required after data loading, such as
+ 	  <command>ALTER TABLE</command> and <command>CREATE INDEX</command>.
+ 	  This part can be requested using <option>--schema-post-load</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+ 
+   <para>
     Because <application>pg_dump</application> is used to transfer data
     to newer versions of <productname>PostgreSQL</>, the output of
     <application>pg_dump</application> can be loaded into
Index: doc/src/sgml/ref/pg_restore.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_restore.sgml,v
retrieving revision 1.75
diff -c -r1.75 pg_restore.sgml
*** doc/src/sgml/ref/pg_restore.sgml	13 Apr 2008 03:49:21 -0000	1.75
--- doc/src/sgml/ref/pg_restore.sgml	20 Jul 2008 06:33:18 -0000
***************
*** 321,326 ****
--- 321,350 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--schema-post-load</option></term>
+       <listitem>
+        <para>
+ 		Dump only the object definitions (schema) required after data has been
+ 		loaded. Dumps exactly what <option>--schema-only</> would dump, but 
+ 		only those statements after the data load.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
+       <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
+       <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
+       <listitem>
+        <para>
+         Specify the superuser user name to use when disabling triggers.
+         This is only relevant if <option>--disable-triggers</> is used.
+         (Usually, it's better to leave this out, and instead start the
+         resulting script as superuser.)
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 572,577 ****
--- 596,629 ----
    </para>
  
    <para>
+    The actions of pg_restore can be notionally divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+ 	  Pre-Schema - objects required before data loading, such as 
+ 	  <command>CREATE TABLE</command>.
+ 	  This part can be requested using <option>--schema-pre-load</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Post-Schema - objects required after data loading, such as
+ 	  <command>ALTER TABLE</command> and <command>CREATE INDEX</command>.
+ 	  This part can be requested using <option>--schema-post-load</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+ 
+   <para>
     The limitations of <application>pg_restore</application> are detailed below.
  
     <itemizedlist>
Index: src/bin/pg_dump/pg_backup.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup.h,v
retrieving revision 1.47
diff -c -r1.47 pg_backup.h
*** src/bin/pg_dump/pg_backup.h	13 Apr 2008 03:49:21 -0000	1.47
--- src/bin/pg_dump/pg_backup.h	20 Jul 2008 05:19:34 -0000
***************
*** 89,95 ****
  	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  								 * instead of OWNER TO */
  	char	   *superuser;		/* Username to use as superuser */
! 	int			dataOnly;
  	int			dropSchema;
  	char	   *filename;
  	int			schemaOnly;
--- 89,95 ----
  	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  								 * instead of OWNER TO */
  	char	   *superuser;		/* Username to use as superuser */
! 	int			dumpObjFlags;	/* which objects types to dump */
  	int			dropSchema;
  	char	   *filename;
  	int			schemaOnly;
Index: src/bin/pg_dump/pg_backup_archiver.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.157
diff -c -r1.157 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c	4 May 2008 08:32:21 -0000	1.157
--- src/bin/pg_dump/pg_backup_archiver.c	20 Jul 2008 05:19:34 -0000
***************
*** 56,62 ****
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static teReqs _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
--- 56,62 ----
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static int _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
***************
*** 129,135 ****
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
! 	teReqs		reqs;
  	OutputContext sav;
  	bool		defnDumped;
  
--- 129,135 ----
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
! 	int		reqs;
  	OutputContext sav;
  	bool		defnDumped;
  
***************
*** 175,193 ****
  	 * Work out if we have an implied data-only restore. This can happen if
  	 * the dump was data only or if the user has used a toc list to exclude
  	 * all of the schema data. All we do is look for schema entries - if none
! 	 * are found then we set the dataOnly flag.
  	 *
! 	 * We could scan for wanted TABLE entries, but that is not the same as
! 	 * dataOnly. At this stage, it seems unnecessary (6-Mar-2001).
  	 */
! 	if (!ropt->dataOnly)
  	{
  		int			impliedDataOnly = 1;
  
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
  		{
  			reqs = _tocEntryRequired(te, ropt, true);
! 			if ((reqs & REQ_SCHEMA) != 0)
  			{					/* It's schema, and it's wanted */
  				impliedDataOnly = 0;
  				break;
--- 175,193 ----
  	 * Work out if we have an implied data-only restore. This can happen if
  	 * the dump was data only or if the user has used a toc list to exclude
  	 * all of the schema data. All we do is look for schema entries - if none
! 	 * are found then say we only want DATA type objects.
  	 *
! 	 * We could scan for wanted TABLE entries, but that is not the same.
! 	 * At this stage, it seems unnecessary (6-Mar-2001).
  	 */
! 	if (!WANT_DATA(ropt->dumpObjFlags))
  	{
  		int			impliedDataOnly = 1;
  
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
  		{
  			reqs = _tocEntryRequired(te, ropt, true);
! 			if (WANT_PRE_SCHEMA(reqs) || WANT_POST_SCHEMA(reqs))
  			{					/* It's schema, and it's wanted */
  				impliedDataOnly = 0;
  				break;
***************
*** 195,201 ****
  		}
  		if (impliedDataOnly)
  		{
! 			ropt->dataOnly = impliedDataOnly;
  			ahlog(AH, 1, "implied data-only restore\n");
  		}
  	}
--- 195,201 ----
  		}
  		if (impliedDataOnly)
  		{
! 			ropt->dumpObjFlags = REQ_DATA;
  			ahlog(AH, 1, "implied data-only restore\n");
  		}
  	}
***************
*** 236,242 ****
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
--- 236,242 ----
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_PRE_SCHEMA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
***************
*** 278,284 ****
  		/* Dump any relevant dump warnings to stderr */
  		if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  		{
! 			if (!ropt->dataOnly && te->defn != NULL && strlen(te->defn) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  			else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
--- 278,284 ----
  		/* Dump any relevant dump warnings to stderr */
  		if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  		{
! 			if (!WANT_DATA(ropt->dumpObjFlags) && te->defn != NULL && strlen(te->defn) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  			else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
***************
*** 286,292 ****
  
  		defnDumped = false;
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
--- 286,293 ----
  
  		defnDumped = false;
  
! 		if ((WANT_PRE_SCHEMA(reqs) && WANT_PRE_SCHEMA(ropt->dumpObjFlags)) ||
! 			(WANT_POST_SCHEMA(reqs) && WANT_POST_SCHEMA(ropt->dumpObjFlags)))	/* We want the schema */
  		{
  			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
***************
*** 331,337 ****
  		/*
  		 * If we have a data component, then process it
  		 */
! 		if ((reqs & REQ_DATA) != 0)
  		{
  			/*
  			 * hadDumper will be set if there is genuine data component for
--- 332,338 ----
  		/*
  		 * If we have a data component, then process it
  		 */
! 		if (WANT_DATA(reqs))
  		{
  			/*
  			 * hadDumper will be set if there is genuine data component for
***************
*** 343,349 ****
  				/*
  				 * If we can output the data, then restore it.
  				 */
! 				if (AH->PrintTocDataPtr !=NULL && (reqs & REQ_DATA) != 0)
  				{
  #ifndef HAVE_LIBZ
  					if (AH->compression != 0)
--- 344,350 ----
  				/*
  				 * If we can output the data, then restore it.
  				 */
! 				if (AH->PrintTocDataPtr !=NULL && WANT_DATA(reqs))
  				{
  #ifndef HAVE_LIBZ
  					if (AH->compression != 0)
***************
*** 415,421 ****
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
--- 416,422 ----
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if (WANT_PRE_SCHEMA(reqs))	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
***************
*** 473,479 ****
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!ropt->dataOnly || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
--- 474,480 ----
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
***************
*** 499,505 ****
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!ropt->dataOnly || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
--- 500,506 ----
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
***************
*** 1321,1327 ****
  	return NULL;
  }
  
! teReqs
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  	TocEntry   *te = getTocEntryByDumpId(AH, id);
--- 1322,1328 ----
  	return NULL;
  }
  
! int
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  	TocEntry   *te = getTocEntryByDumpId(AH, id);
***************
*** 2026,2035 ****
  					 te->defn);
  }
  
! static teReqs
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! 	teReqs		res = REQ_ALL;
  
  	/* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  	if (strcmp(te->desc, "ENCODING") == 0 ||
--- 2027,2036 ----
  					 te->defn);
  }
  
! static int
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! 	int		res = ropt->dumpObjFlags;
  
  	/* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  	if (strcmp(te->desc, "ENCODING") == 0 ||
***************
*** 2109,2125 ****
  	if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  		return 0;
  
- 	/* Mask it if we only want schema */
- 	if (ropt->schemaOnly)
- 		res = res & REQ_SCHEMA;
- 
- 	/* Mask it we only want data */
- 	if (ropt->dataOnly)
- 		res = res & REQ_DATA;
- 
  	/* Mask it if we don't have a schema contribution */
  	if (!te->defn || strlen(te->defn) == 0)
! 		res = res & ~REQ_SCHEMA;
  
  	/* Finally, if there's a per-ID filter, limit based on that as well */
  	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
--- 2110,2118 ----
  	if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  		return 0;
  
  	/* Mask it if we don't have a schema contribution */
  	if (!te->defn || strlen(te->defn) == 0)
! 		res = res & ~(REQ_PRE_SCHEMA | REQ_POST_SCHEMA);
  
  	/* Finally, if there's a per-ID filter, limit based on that as well */
  	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
Index: src/bin/pg_dump/pg_backup_archiver.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.h,v
retrieving revision 1.76
diff -c -r1.76 pg_backup_archiver.h
*** src/bin/pg_dump/pg_backup_archiver.h	7 Nov 2007 12:24:24 -0000	1.76
--- src/bin/pg_dump/pg_backup_archiver.h	20 Jul 2008 05:19:34 -0000
***************
*** 158,169 ****
  	STAGE_FINALIZING
  } ArchiverStage;
  
! typedef enum
! {
! 	REQ_SCHEMA = 1,
! 	REQ_DATA = 2,
! 	REQ_ALL = REQ_SCHEMA + REQ_DATA
! } teReqs;
  
  typedef struct _archiveHandle
  {
--- 158,173 ----
  	STAGE_FINALIZING
  } ArchiverStage;
  
! #define REQ_PRE_SCHEMA		(1 << 0)
! #define REQ_DATA			(1 << 1)
! #define REQ_POST_SCHEMA		(1 << 2)
! #define REQ_ALL				(REQ_PRE_SCHEMA + REQ_DATA + REQ_POST_SCHEMA)
! 
! #define WANT_PRE_SCHEMA(req)	((req & REQ_PRE_SCHEMA) == REQ_PRE_SCHEMA)
! #define WANT_DATA(req)			((req & REQ_DATA) == REQ_DATA)
! #define WANT_POST_SCHEMA(req)	((req & REQ_POST_SCHEMA) == REQ_POST_SCHEMA)
! #define WANT_ALL(req)			((req & REQ_ALL) == REQ_ALL)
! 
  
  typedef struct _archiveHandle
  {
***************
*** 317,323 ****
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
  
! extern teReqs TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
  
  #define appendStringLiteralAHX(buf,str,AH) \
--- 321,327 ----
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
  
! extern int TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
  
  #define appendStringLiteralAHX(buf,str,AH) \
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.496
diff -c -r1.496 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	18 Jul 2008 03:32:52 -0000	1.496
--- src/bin/pg_dump/pg_dump.c	20 Jul 2008 05:57:58 -0000
***************
*** 72,77 ****
--- 72,81 ----
  bool		dataOnly;
  bool		aclsSkip;
  
+ /* groups of objects: default is we dump all groups */
+ 
+ int			dumpObjFlags;
+ 
  /* subquery used to convert user ID (eg, datdba) to user name */
  static const char *username_subquery;
  
***************
*** 224,231 ****
  	RestoreOptions *ropt;
  
  	static int	disable_triggers = 0;
! 	static int  outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
  
  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},
--- 228,237 ----
  	RestoreOptions *ropt;
  
  	static int	disable_triggers = 0;
! 	static int outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+ 	static int	use_schemaPreLoadOnly;
+ 	static int	use_schemaPostLoadOnly;
  
  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},
***************
*** 265,270 ****
--- 271,278 ----
  		{"disable-dollar-quoting", no_argument, &disable_dollar_quoting, 1},
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+  		{"schema-pre-load", no_argument, &use_schemaPreLoadOnly, 1},
+  		{"schema-post-load", no_argument, &use_schemaPostLoadOnly, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
  		{NULL, 0, NULL, 0}
***************
*** 456,467 ****
  	if (optind < argc)
  		dbname = argv[optind];
  
! 	if (dataOnly && schemaOnly)
  	{
! 		write_msg(NULL, "options -s/--schema-only and -a/--data-only cannot be used together\n");
  		exit(1);
  	}
  
  	if (dataOnly && outputClean)
  	{
  		write_msg(NULL, "options -c/--clean and -a/--data-only cannot be used together\n");
--- 464,504 ----
  	if (optind < argc)
  		dbname = argv[optind];
  
! 	/*
! 	 * Look for conflicting options relating to object groupings
! 	 */
! 	if (schemaOnly && dataOnly)
! 	{
! 		write_msg(NULL, "options %s and %s cannot be used together\n",
! 				"-s/--schema-only", "-a/--data-only");
! 		exit(1);
! 	}
! 	else if ((schemaOnly || dataOnly) && 
! 				(use_schemaPreLoadOnly == 1 || use_schemaPostLoadOnly == 1))
  	{
! 		write_msg(NULL, "options %s and %s cannot be used together\n",
! 				schemaOnly ? "-s/--schema-only" : "-a/--data-only",
! 				use_schemaPostLoadOnly == 1 ? "--schema-post-load" : "--schema-pre-load ");
  		exit(1);
  	}
  
+ 	/*
+ 	 * Decide which of the object groups we will dump
+ 	 */
+ 	dumpObjFlags = REQ_ALL;
+ 
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (use_schemaPreLoadOnly == 1)
+ 		dumpObjFlags = REQ_PRE_SCHEMA;
+ 
+ 	if (use_schemaPostLoadOnly == 1)
+ 		dumpObjFlags = REQ_POST_SCHEMA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_PRE_SCHEMA | REQ_POST_SCHEMA);
+ 
  	if (dataOnly && outputClean)
  	{
  		write_msg(NULL, "options -c/--clean and -a/--data-only cannot be used together\n");
***************
*** 638,644 ****
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && !schemaOnly)
  		outputBlobs = true;
  
  	/*
--- 675,681 ----
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
  		outputBlobs = true;
  
  	/*
***************
*** 650,656 ****
  	if (g_fout->remoteVersion < 80400)
  		guessConstraintInheritance(tblinfo, numTables);
  
! 	if (!schemaOnly)
  		getTableData(tblinfo, numTables, oids);
  
  	if (outputBlobs && hasBlobs(g_fout))
--- 687,693 ----
  	if (g_fout->remoteVersion < 80400)
  		guessConstraintInheritance(tblinfo, numTables);
  
! 	if (WANT_DATA(dumpObjFlags))
  		getTableData(tblinfo, numTables, oids);
  
  	if (outputBlobs && hasBlobs(g_fout))
***************
*** 704,710 ****
  	dumpStdStrings(g_fout);
  
  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && !dataOnly)
  		dumpDatabase(g_fout);
  
  	/* Now the rearrangeable objects. */
--- 741,747 ----
  	dumpStdStrings(g_fout);
  
  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && WANT_DATA(dumpObjFlags))
  		dumpDatabase(g_fout);
  
  	/* Now the rearrangeable objects. */
***************
*** 726,732 ****
  		ropt->noTablespace = outputNoTablespaces;
  		ropt->disable_triggers = disable_triggers;
  		ropt->use_setsessauth = use_setsessauth;
! 		ropt->dataOnly = dataOnly;
  
  		if (compressLevel == -1)
  			ropt->compression = 0;
--- 763,769 ----
  		ropt->noTablespace = outputNoTablespaces;
  		ropt->disable_triggers = disable_triggers;
  		ropt->use_setsessauth = use_setsessauth;
! 		ropt->dumpObjFlags = dumpObjFlags;
  
  		if (compressLevel == -1)
  			ropt->compression = 0;
***************
*** 3389,3395 ****
  			continue;
  
  		/* Ignore indexes of tables not to be dumped */
! 		if (!tbinfo->dobj.dump)
  			continue;
  
  		if (g_verbose)
--- 3426,3432 ----
  			continue;
  
  		/* Ignore indexes of tables not to be dumped */
! 		if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  			continue;
  
  		if (g_verbose)
***************
*** 5140,5146 ****
  	int			ncomments;
  
  	/* Comments are SCHEMA not data */
! 	if (dataOnly)
  		return;
  
  	/* Search for comments associated with catalogId, using table */
--- 5177,5183 ----
  	int			ncomments;
  
  	/* Comments are SCHEMA not data */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Search for comments associated with catalogId, using table */
***************
*** 5191,5197 ****
  	PQExpBuffer target;
  
  	/* Comments are SCHEMA not data */
! 	if (dataOnly)
  		return;
  
  	/* Search for comments associated with relation, using table */
--- 5228,5234 ----
  	PQExpBuffer target;
  
  	/* Comments are SCHEMA not data */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Search for comments associated with relation, using table */
***************
*** 5543,5549 ****
  	char	   *qnspname;
  
  	/* Skip if not to be dumped */
! 	if (!nspinfo->dobj.dump || dataOnly)
  		return;
  
  	/* don't dump dummy namespace from pre-7.3 source */
--- 5580,5586 ----
  	char	   *qnspname;
  
  	/* Skip if not to be dumped */
! 	if (!nspinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* don't dump dummy namespace from pre-7.3 source */
***************
*** 5592,5598 ****
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  	/* Skip if not to be dumped */
! 	if (!tinfo->dobj.dump || dataOnly)
  		return;
  
  	/* Dump out in proper style */
--- 5629,5635 ----
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  	/* Skip if not to be dumped */
! 	if (!tinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Dump out in proper style */
***************
*** 6237,6243 ****
  	PQExpBuffer q;
  
  	/* Skip if not to be dumped */
! 	if (!stinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 6274,6280 ----
  	PQExpBuffer q;
  
  	/* Skip if not to be dumped */
! 	if (!stinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 6284,6290 ****
  	if (!include_everything)
  		return false;
  	/* And they're schema not data */
! 	if (dataOnly)
  		return false;
  	return true;
  }
--- 6321,6327 ----
  	if (!include_everything)
  		return false;
  	/* And they're schema not data */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return false;
  	return true;
  }
***************
*** 6305,6311 ****
  	FuncInfo   *funcInfo;
  	FuncInfo   *validatorInfo = NULL;
  
! 	if (dataOnly)
  		return;
  
  	/*
--- 6342,6348 ----
  	FuncInfo   *funcInfo;
  	FuncInfo   *validatorInfo = NULL;
  
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 6565,6571 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!finfo->dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 6602,6608 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!finfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 6960,6966 ****
  	TypeInfo   *sourceInfo;
  	TypeInfo   *targetInfo;
  
! 	if (dataOnly)
  		return;
  
  	if (OidIsValid(cast->castfunc))
--- 6997,7003 ----
  	TypeInfo   *sourceInfo;
  	TypeInfo   *targetInfo;
  
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	if (OidIsValid(cast->castfunc))
***************
*** 7110,7116 ****
  	char	   *oprcanhash;
  
  	/* Skip if not to be dumped */
! 	if (!oprinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7147,7153 ----
  	char	   *oprcanhash;
  
  	/* Skip if not to be dumped */
! 	if (!oprinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 7494,7500 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opcinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7531,7537 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opcinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 7802,7808 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opfinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7839,7845 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opfinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 8071,8077 ****
  	bool		condefault;
  
  	/* Skip if not to be dumped */
! 	if (!convinfo->dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 8108,8114 ----
  	bool		condefault;
  
  	/* Skip if not to be dumped */
! 	if (!convinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 8225,8231 ****
  	bool		convertok;
  
  	/* Skip if not to be dumped */
! 	if (!agginfo->aggfn.dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 8262,8268 ----
  	bool		convertok;
  
  	/* Skip if not to be dumped */
! 	if (!agginfo->aggfn.dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 8428,8434 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!prsinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8465,8471 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!prsinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8497,8503 ****
  	char	   *tmplname;
  
  	/* Skip if not to be dumped */
! 	if (!dictinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8534,8540 ----
  	char	   *tmplname;
  
  	/* Skip if not to be dumped */
! 	if (!dictinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8582,8588 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!tmplinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8619,8625 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!tmplinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8648,8654 ****
  	int			i_dictname;
  
  	/* Skip if not to be dumped */
! 	if (!cfginfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8685,8691 ----
  	int			i_dictname;
  
  	/* Skip if not to be dumped */
! 	if (!cfginfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8784,8790 ****
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (dataOnly || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
--- 8821,8827 ----
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags) || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
***************
*** 8821,8827 ****
  	{
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (!dataOnly)
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
--- 8858,8864 ----
  	{
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (WANT_PRE_SCHEMA(dumpObjFlags))
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
***************
*** 9128,9134 ****
  	PQExpBuffer delq;
  
  	/* Only print it if "separate" mode is selected */
! 	if (!tbinfo->dobj.dump || !adinfo->separate || dataOnly)
  		return;
  
  	/* Don't print inherited defaults, either */
--- 9165,9171 ----
  	PQExpBuffer delq;
  
  	/* Only print it if "separate" mode is selected */
! 	if (!tbinfo->dobj.dump || !adinfo->separate || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Don't print inherited defaults, either */
***************
*** 9213,9219 ****
  	PQExpBuffer q;
  	PQExpBuffer delq;
  
! 	if (dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 9250,9256 ----
  	PQExpBuffer q;
  	PQExpBuffer delq;
  
! 	if (!WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 9282,9288 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!coninfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 9319,9325 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!coninfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 9675,9681 ****
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (!dataOnly)
  	{
  		resetPQExpBuffer(delqry);
  
--- 9712,9718 ----
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (WANT_PRE_SCHEMA(dumpObjFlags))
  	{
  		resetPQExpBuffer(delqry);
  
***************
*** 9778,9784 ****
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (!schemaOnly)
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9815,9821 ----
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (WANT_PRE_SCHEMA(dumpObjFlags))
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
***************
*** 9811,9817 ****
  	const char *p;
  	int			findx;
  
! 	if (dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 9848,9854 ----
  	const char *p;
  	int			findx;
  
! 	if (!WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 10019,10025 ****
  	PGresult   *res;
  
  	/* Skip if not to be dumped */
! 	if (!rinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 10056,10062 ----
  	PGresult   *res;
  
  	/* Skip if not to be dumped */
! 	if (!rinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
Index: src/bin/pg_dump/pg_restore.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
retrieving revision 1.88
diff -c -r1.88 pg_restore.c
*** src/bin/pg_dump/pg_restore.c	13 Apr 2008 03:49:22 -0000	1.88
--- src/bin/pg_dump/pg_restore.c	20 Jul 2008 05:57:51 -0000
***************
*** 78,83 ****
--- 78,90 ----
  	static int	no_data_for_failed_tables = 0;
  	static int  outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+  	bool		dataOnly = false;
+  	bool		schemaOnly = false;
+  
+  	static int	use_schemaPreLoadOnly;
+  	static int	use_schemaPostLoadOnly;
+  
+  	int			dumpObjFlags;
  
  	struct option cmdopts[] = {
  		{"clean", 0, NULL, 'c'},
***************
*** 114,119 ****
--- 121,128 ----
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"no-data-for-failed-tables", no_argument, &no_data_for_failed_tables, 1},
  		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+  		{"schema-pre-load", no_argument, &use_schemaPreLoadOnly, 1},
+  		{"schema-post-load", no_argument, &use_schemaPostLoadOnly, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
  		{NULL, 0, NULL, 0}
***************
*** 145,151 ****
  		switch (c)
  		{
  			case 'a':			/* Dump data only */
! 				opts->dataOnly = 1;
  				break;
  			case 'c':			/* clean (i.e., drop) schema prior to create */
  				opts->dropSchema = 1;
--- 154,160 ----
  		switch (c)
  		{
  			case 'a':			/* Dump data only */
! 				dataOnly = true;
  				break;
  			case 'c':			/* clean (i.e., drop) schema prior to create */
  				opts->dropSchema = 1;
***************
*** 213,219 ****
  				opts->triggerNames = strdup(optarg);
  				break;
  			case 's':			/* dump schema only */
! 				opts->schemaOnly = 1;
  				break;
  			case 'S':			/* Superuser username */
  				if (strlen(optarg) != 0)
--- 222,228 ----
  				opts->triggerNames = strdup(optarg);
  				break;
  			case 's':			/* dump schema only */
! 				schemaOnly = true;
  				break;
  			case 'S':			/* Superuser username */
  				if (strlen(optarg) != 0)
***************
*** 295,300 ****
--- 304,344 ----
  		opts->useDB = 1;
  	}
  
+ 	/*
+ 	 * Look for conflicting options relating to object groupings
+ 	 */
+ 	if (schemaOnly && dataOnly)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"-s/--schema-only", "-a/--data-only");
+ 		exit(1);
+ 	}
+ 	else if ((schemaOnly || dataOnly) && 
+ 				(use_schemaPreLoadOnly == 1 || use_schemaPostLoadOnly == 1))
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				schemaOnly ? "-s/--schema-only" : "-a/--data-only",
+ 				use_schemaPostLoadOnly == 1 ? "--schema-post-load" : "--schema-pre-load ");
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Decide which of the object groups we will dump
+ 	 */
+ 	dumpObjFlags = REQ_ALL;
+ 
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (use_schemaPreLoadOnly == 1)
+ 		dumpObjFlags = REQ_PRE_SCHEMA;
+ 
+ 	if (use_schemaPostLoadOnly == 1)
+ 		dumpObjFlags = REQ_POST_SCHEMA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_PRE_SCHEMA | REQ_POST_SCHEMA);
+ 
  	opts->disable_triggers = disable_triggers;
  	opts->noDataForFailedTables = no_data_for_failed_tables;
  	opts->noTablespace = outputNoTablespaces;

sfrost@snowman.net

over 17 years ago

In reply to: Simon Riggs (#3)

Re: pg_dump additional options for performance

Simon,

* Simon Riggs (simon@2ndquadrant.com) wrote:

On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote:

On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:

[...]

- Conflicting option handling

Thanks for putting in the extra code to explicitly indicate which
conflicting options were used.

- Documentation
When writing the documentation I would stress that "pre-schema" and
"post-schema" be defined in terms of PostgreSQL objects and why they
are pre vs. post.

Perhaps this is up for some debate, but I find the documentation added
for these options to be lacking the definitions I was looking for, and
the explanation of why they are what they are. I'm also not sure I
agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't
really fit with the option names or what they do. Would you consider:

<term><option>--schema-pre-load</option></term>
<listitem>
<para>
Pre-Data Load - Minimum amount of the schema required before data
loading can begin. This consists mainly of creating the tables
using the <command>CREATE TABLE</command>.
This part can be requested using <option>--schema-pre-load</>.
</para>
</listitem>

<term><option>--schema-post-load</option></term>
<listitem>
<para>
Post-Data Load - The rest of the schema definition, including keys,
indexes, etc. By putting keys and indexes after the data has been
loaded the whole process of restoring data is much faster. This is
because it is faster to build indexes and check keys in bulk than
piecemeal as the data is loaded.
This part can be requested using <option>--schema-post-load</>.
</para>
</listitem>

Even this doesn't cover everything though- it's too focused on tables
and data loading. Where do functions go? What about types?

A couple of additional points:

- The command-line help hasn't been updated. Clearly, that also needs
to be done to consider the documentation aspect complete.

- There appears to be a bit of mistakenly included additions. The
patch to pg_restore.sgml attempts to add in documentation for
--superuser. I'm guessing that was unintentional, and looks like
just a mistaken extra copy&paste.

- Technically, the patch needs to be updated slightly since another
pg_dump-related patch was committed recently which also added
options and thus causes a conflict.

I think this might have just happened again, funny enough. It's
something that a committer could perhaps fix, but if you're reworking
the patch anyway...

Thanks,

Stephen

simon@2ndquadrant.com

over 17 years ago

In reply to: Stephen Frost (#4)

Re: pg_dump additional options for performance

On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:

Perhaps this is up for some debate, but I find the documentation added
for these options to be lacking the definitions I was looking for, and
the explanation of why they are what they are. I'm also not sure I
agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't
really fit with the option names or what they do. Would you consider:

Will reword.

Even this doesn't cover everything though- it's too focused on tables
and data loading. Where do functions go? What about types?

Yes, it is focused on tables and data loading. What about
functions/types? No relevance here.

- The command-line help hasn't been updated. Clearly, that also needs
to be done to consider the documentation aspect complete.

- There appears to be a bit of mistakenly included additions. The
patch to pg_restore.sgml attempts to add in documentation for
--superuser. I'm guessing that was unintentional, and looks like
just a mistaken extra copy&paste.

Thanks, will do.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

sfrost@snowman.net

over 17 years ago

In reply to: Simon Riggs (#5)

Re: pg_dump additional options for performance

* Simon Riggs (simon@2ndquadrant.com) wrote:

On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:

Even this doesn't cover everything though- it's too focused on tables
and data loading. Where do functions go? What about types?

Yes, it is focused on tables and data loading. What about
functions/types? No relevance here.

I don't see how they're not relevant, it's not like they're being
excluded and in fact they show up in the pre-load output. Heck, even if
they *were* excluded, that should be made clear in the documentation
(either be an explicit include list, or saying they're excluded).

Part of what's driving this is making sure we have a plan for future
objects and where they'll go. Perhaps it would be enough to just say
"pre-load is everything in the schema, except things which are faster
done in bulk (eg: indexes, keys)". I don't think it's right to say
pre-load is "only object definitions required to load data" when it
includes functions and ACLs though.

Hopefully my suggestion and these comments will get us to a happy
middle-ground.

Thanks,

Stephen

daveg

daveg@sonic.net

over 17 years ago

In reply to: Stephen Frost (#6)

Re: pg_dump additional options for performance

On Sun, Jul 20, 2008 at 09:18:29PM -0400, Stephen Frost wrote:

* Simon Riggs (simon@2ndquadrant.com) wrote:

On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:

Even this doesn't cover everything though- it's too focused on tables
and data loading. Where do functions go? What about types?

Yes, it is focused on tables and data loading. What about
functions/types? No relevance here.

I don't see how they're not relevant, it's not like they're being
excluded and in fact they show up in the pre-load output. Heck, even if
they *were* excluded, that should be made clear in the documentation
(either be an explicit include list, or saying they're excluded).

Part of what's driving this is making sure we have a plan for future
objects and where they'll go. Perhaps it would be enough to just say
"pre-load is everything in the schema, except things which are faster
done in bulk (eg: indexes, keys)". I don't think it's right to say
pre-load is "only object definitions required to load data" when it
includes functions and ACLs though.

Hopefully my suggestion and these comments will get us to a happy
middle-ground.

One observation, indexes should be built right after the table data
is loaded for each table, this way, the index build gets a hot cache
for the table data instead of having to re-read it later as we do now.

-dg

--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

sfrost@snowman.net

over 17 years ago

In reply to: daveg (#7)

Re: pg_dump additional options for performance

* daveg (daveg@sonic.net) wrote:

One observation, indexes should be built right after the table data
is loaded for each table, this way, the index build gets a hot cache
for the table data instead of having to re-read it later as we do now.

That's not how pg_dump has traditionally worked, and the point of this
patch is to add options to easily segregate the main pieces of the
existing pg_dump output (main schema definition, data dump, key/index
building). You suggestion brings up an interesting point that should
pg_dump's traditional output structure change the "--schema-post-load"
set of objects wouldn't be as clear to newcomers since the load and the
indexes would be interleaved in the regular output.

I'd be curious about the performance impact this has on an actual load
too. It would probably be more valuable on smaller loads where it would
have less of an impact anyway than on loads larger than the cache size.
Still, not an issue for this patch, imv.

Thanks,

Stephen

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Stephen Frost (#8)

Re: pg_dump additional options for performance

Stephen Frost <sfrost@snowman.net> writes:

* daveg (daveg@sonic.net) wrote:

One observation, indexes should be built right after the table data
is loaded for each table, this way, the index build gets a hot cache
for the table data instead of having to re-read it later as we do now.

That's not how pg_dump has traditionally worked, and the point of this
patch is to add options to easily segregate the main pieces of the
existing pg_dump output (main schema definition, data dump, key/index
building). You suggestion brings up an interesting point that should
pg_dump's traditional output structure change the "--schema-post-load"
set of objects wouldn't be as clear to newcomers since the load and the
indexes would be interleaved in the regular output.

Yeah. Also, that is pushing into an entirely different line of
development, which is to enable multithreaded pg_restore. The patch
at hand is necessarily incompatible with that type of operation, and
wouldn't be used together with it.

As far as the documentation/definition aspect goes, I think it should
just say the parts are
* stuff needed before you can load the data
* the data
* stuff needed after loading the data
and not try to be any more specific than that. There are corner cases
that will turn any simple breakdown into a lie, and I doubt that it's
worth trying to explain them all. (Take a close look at the dependency
loop breaking logic in pg_dump if you doubt this.)

I hadn't realized that Simon was using "pre-schema" and "post-schema"
to name the first and third parts. I'd agree that this is confusing
nomenclature: it looks like it's trying to say that the data is the
schema, and the schema is not! How about "pre-data and "post-data"?

regards, tom lane

#10

simon@2ndquadrant.com

over 17 years ago

In reply to: Stephen Frost (#6)

Re: pg_dump additional options for performance

On Sun, 2008-07-20 at 21:18 -0400, Stephen Frost wrote:

* Simon Riggs (simon@2ndquadrant.com) wrote:

On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:

Even this doesn't cover everything though- it's too focused on tables
and data loading. Where do functions go? What about types?

Yes, it is focused on tables and data loading. What about
functions/types? No relevance here.

I don't see how they're not relevant, it's not like they're being
excluded and in fact they show up in the pre-load output. Heck, even if
they *were* excluded, that should be made clear in the documentation
(either be an explicit include list, or saying they're excluded).

Part of what's driving this is making sure we have a plan for future
objects and where they'll go. Perhaps it would be enough to just say
"pre-load is everything in the schema, except things which are faster
done in bulk (eg: indexes, keys)". I don't think it's right to say
pre-load is "only object definitions required to load data" when it
includes functions and ACLs though.

Hopefully my suggestion and these comments will get us to a happy
middle-ground.

I don't really understand what you're saying.

The options split the dump into 3 parts that's all: before the load, the
load and after the load.

--schema-pre-load says
"Dumps exactly what <option>--schema-only</> would dump, but only those
statements before the data load."

What is it you are suggesting? I'm unclear.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#11

simon@2ndquadrant.com

over 17 years ago

In reply to: Tom Lane (#9)

Re: pg_dump additional options for performance

On Sun, 2008-07-20 at 23:34 -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* daveg (daveg@sonic.net) wrote:

One observation, indexes should be built right after the table data
is loaded for each table, this way, the index build gets a hot cache
for the table data instead of having to re-read it later as we do now.

That's not how pg_dump has traditionally worked, and the point of this
patch is to add options to easily segregate the main pieces of the
existing pg_dump output (main schema definition, data dump, key/index
building). You suggestion brings up an interesting point that should
pg_dump's traditional output structure change the "--schema-post-load"
set of objects wouldn't be as clear to newcomers since the load and the
indexes would be interleaved in the regular output.

Stephen: Agreed.

Yeah. Also, that is pushing into an entirely different line of
development, which is to enable multithreaded pg_restore. The patch
at hand is necessarily incompatible with that type of operation, and
wouldn't be used together with it.

As far as the documentation/definition aspect goes, I think it should
just say the parts are
* stuff needed before you can load the data
* the data
* stuff needed after loading the data
and not try to be any more specific than that. There are corner cases
that will turn any simple breakdown into a lie, and I doubt that it's
worth trying to explain them all. (Take a close look at the dependency
loop breaking logic in pg_dump if you doubt this.)

Tom: Agreed.

I hadn't realized that Simon was using "pre-schema" and "post-schema"
to name the first and third parts. I'd agree that this is confusing
nomenclature: it looks like it's trying to say that the data is the
schema, and the schema is not! How about "pre-data and "post-data"?

OK by me. Any other takers?

I also suggested having three options
--want-pre-schema
--want-data
--want-post-schema
so we could ask for any or all parts in the one dump. --data-only and
--schema-only are negative options so don't allow this.
(I don't like those names either, just thinking about capabilities)

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#12

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Simon Riggs (#11)

Re: pg_dump additional options for performance

Simon Riggs <simon@2ndquadrant.com> writes:

I also suggested having three options
--want-pre-schema
--want-data
--want-post-schema
so we could ask for any or all parts in the one dump. --data-only and
--schema-only are negative options so don't allow this.
(I don't like those names either, just thinking about capabilities)

Maybe invert the logic?

--omit-pre-data
--omit-data
--omit-post-data

Not wedded to these either, just tossing out an idea...

regards, tom lane

#13

sfrost@snowman.net

over 17 years ago

In reply to: Simon Riggs (#10)

Re: pg_dump additional options for performance

* Simon Riggs (simon@2ndquadrant.com) wrote:

The options split the dump into 3 parts that's all: before the load, the
load and after the load.

--schema-pre-load says
"Dumps exactly what <option>--schema-only</> would dump, but only those
statements before the data load."

What is it you are suggesting? I'm unclear.

That part is fine, the problem is that elsewhere in the documentation
(patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
change it to be "objects required before data loading", which isn't the
same.

Thanks,

Stephen

#14

sfrost@snowman.net

over 17 years ago

In reply to: Tom Lane (#9)

Re: pg_dump additional options for performance

Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

As far as the documentation/definition aspect goes, I think it should
just say the parts are
* stuff needed before you can load the data
* the data
* stuff needed after loading the data
and not try to be any more specific than that. There are corner cases
that will turn any simple breakdown into a lie, and I doubt that it's
worth trying to explain them all. (Take a close look at the dependency
loop breaking logic in pg_dump if you doubt this.)

Even that is a lie though, which I guess is what my problem is. It's
really "everything for the schema, except stuff that is better done in
bulk", I believe. Also, I'm a bit concerned about people who would
argue that you need PKs and FKs before you can load the data. Probably
couldn't be avoided tho.

I hadn't realized that Simon was using "pre-schema" and "post-schema"
to name the first and third parts. I'd agree that this is confusing
nomenclature: it looks like it's trying to say that the data is the
schema, and the schema is not! How about "pre-data and "post-data"?

Argh. The command-line options follow the 'data'/'load' line
(--schema-pre-load and --schema-post-load), and so I think those are
fine. The problem was that in the documentation he switched to saying
they were "Pre-Schema" and "Post-Schema", which could lead to confusion.

Thanks,

Stephen

#15

sfrost@snowman.net

over 17 years ago

In reply to: Simon Riggs (#11)

Re: pg_dump additional options for performance

Simon,

* Simon Riggs (simon@2ndquadrant.com) wrote:

I hadn't realized that Simon was using "pre-schema" and "post-schema"
to name the first and third parts. I'd agree that this is confusing
nomenclature: it looks like it's trying to say that the data is the
schema, and the schema is not! How about "pre-data and "post-data"?

OK by me. Any other takers?

Having the command-line options be "--schema-pre-data" and
"--schema-post-data" is fine with me. Leaving them the way they are is
also fine by me. It's the documentation (back to pg_dump.sgml,
~774/~797) that starts talking about Pre-Schema and Post-Schema.

Thanks,

Stephen

#16

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Tom Lane (#12)

Re: pg_dump additional options for performance

Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

I also suggested having three options
--want-pre-schema
--want-data
--want-post-schema
so we could ask for any or all parts in the one dump. --data-only and
--schema-only are negative options so don't allow this.
(I don't like those names either, just thinking about capabilities)

Maybe invert the logic?

--omit-pre-data
--omit-data
--omit-post-data

Not wedded to these either, just tossing out an idea...

Please, no. Negative logic seems likely to cause endless confusion.

I'd even be happier with --schema-part-1 and --schema-part-2 if we can't
find some more expressive way of designating them.

cheers

andrew

#17

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Andrew Dunstan (#16)

Re: pg_dump additional options for performance

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

Maybe invert the logic?
--omit-pre-data
--omit-data
--omit-post-data

Please, no. Negative logic seems likely to cause endless confusion.

I think it might actually be less confusing, because with this approach,
each switch has an identifiable default (no) and setting it doesn't
cause side-effects on settings of other switches. The interactions of
the switches as Simon presents 'em seem less than obvious.

regards, tom lane

#18

simon@2ndquadrant.com

over 17 years ago

In reply to: Stephen Frost (#13)

Re: pg_dump additional options for performance

On Mon, 2008-07-21 at 07:46 -0400, Stephen Frost wrote:

* Simon Riggs (simon@2ndquadrant.com) wrote:

The options split the dump into 3 parts that's all: before the load, the
load and after the load.

--schema-pre-load says
"Dumps exactly what <option>--schema-only</> would dump, but only those
statements before the data load."

What is it you are suggesting? I'm unclear.

That part is fine, the problem is that elsewhere in the documentation
(patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
change it to be "objects required before data loading", which isn't the
same.

OK, gotcha now - will change that. I thought you might mean something
about changing the output itself.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#19

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Stephen Frost (#14)

Re: pg_dump additional options for performance

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

As far as the documentation/definition aspect goes, I think it should
just say the parts are
* stuff needed before you can load the data
* the data
* stuff needed after loading the data

Even that is a lie though, which I guess is what my problem is.

True; the stuff done after is done that way at least in part for
performance reasons rather than because it has to be done that way.
(I think it's not only performance issues, though --- for circular
FKs you pretty much have to load the data first.)

I hadn't realized that Simon was using "pre-schema" and "post-schema"
to name the first and third parts. I'd agree that this is confusing
nomenclature: it looks like it's trying to say that the data is the
schema, and the schema is not! How about "pre-data and "post-data"?

Argh. The command-line options follow the 'data'/'load' line
(--schema-pre-load and --schema-post-load), and so I think those are
fine. The problem was that in the documentation he switched to saying
they were "Pre-Schema" and "Post-Schema", which could lead to confusion.

Ah, I see. No objection to those switch names, at least assuming we
want to stick to positive-logic switches. What did you think of the
negative-logic suggestion (--omit-xxx)?

regards, tom lane

#20

sfrost@snowman.net

over 17 years ago

In reply to: Tom Lane (#19)

Re: pg_dump additional options for performance

Tom, et al,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Ah, I see. No objection to those switch names, at least assuming we
want to stick to positive-logic switches. What did you think of the
negative-logic suggestion (--omit-xxx)?

My preference is for positive-logic switches in general. The place
where I would use this patch would lend itself to being more options if
--omit-xxxx were used. I expect that would hold true for most people.
It would be:

--omit-data --omit-post-load
--omit-pre-load --omit-post-load
--omit-pre-load --omit-data

vs.

--schema-pre-load
--data-only
--schema-post-load

Point being that I'd be dumping these into seperate files where I could
more easily manipulate the pre-load or post-load files. I'd still want
pre/post load to be seperate though since this would be used in cases
where there's alot of data (hence the reason for the split) and putting
pre and post together and running them before data would slow things
down quite a bit.

Are there use cases for just --omit-post-load or --omit-pre-load?
Probably, but I just don't see any situation where I'd use them like
that.

Thanks,

Stephen

#21

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Stephen Frost (#20)

Re: pg_dump additional options for performance

Stephen Frost <sfrost@snowman.net> writes:

Are there use cases for just --omit-post-load or --omit-pre-load?

Probably not many. The thing that's bothering me is the
action-at-a-distance property of the positive-logic switches.
How are we going to explain this?

"By default, --schema-pre-load, --data-only, --schema-post-load
are all ON. But if you turn one of them ON (never mind that
it was already ON by default), that changes the defaults for
the other two to OFF. Then you have to turn them ON (never
mind that the default for them is ON) if you want two out of
the three categories."

You have to bend your mind into a pretzel to wrap it around this
behavior. Yeah, it might be convenient once you understand it,
but how long will it take for the savings in typing to repay the
time to understand it and the mistakes along the way?

regards, tom lane

#22

simon@2ndquadrant.com

over 17 years ago

In reply to: Tom Lane (#21)

Re: pg_dump additional options for performance

On Mon, 2008-07-21 at 19:19 -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

Are there use cases for just --omit-post-load or --omit-pre-load?

Probably not many. The thing that's bothering me is the
action-at-a-distance property of the positive-logic switches.
How are we going to explain this?

"By default, --schema-pre-load, --data-only, --schema-post-load
are all ON. But if you turn one of them ON (never mind that
it was already ON by default), that changes the defaults for
the other two to OFF. Then you have to turn them ON (never
mind that the default for them is ON) if you want two out of
the three categories."

While I accept your argument a certain amount, --schema-only and
--data-only already behave in the manner you describe. Whether we pick
include or exclude or both, it will make more sense than these existing
options, regrettably.

With regard to the logic, Insert and COPY also behave this way: if you
mention *any* columns then you only get the ones you mention. We manage
to describe that also. An Insert statement would be very confusing if
you had to list the columns you don't want.

So the --omit options seem OK if you assume we'll never add further
options or include additional SQL in the dump. But that seems an
unreliable prop, so I am inclined towards the inclusive approach.

You have to bend your mind into a pretzel to wrap it around this
behavior.

Perhaps my mind was already toroidally challenged? :-}

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#23

simon@2ndquadrant.com

over 17 years ago

In reply to: Stephen Frost (#15)

1 attachment(s)

Re: pg_dump additional options for performance

On Mon, 2008-07-21 at 07:56 -0400, Stephen Frost wrote:

Simon,

* Simon Riggs (simon@2ndquadrant.com) wrote:

I hadn't realized that Simon was using "pre-schema" and "post-schema"
to name the first and third parts. I'd agree that this is confusing
nomenclature: it looks like it's trying to say that the data is the
schema, and the schema is not! How about "pre-data and "post-data"?

OK by me. Any other takers?

Having the command-line options be "--schema-pre-data" and
"--schema-post-data" is fine with me. Leaving them the way they are is
also fine by me. It's the documentation (back to pg_dump.sgml,
~774/~797) that starts talking about Pre-Schema and Post-Schema.

OK, Mr.Reviewer, sir:

* patch redone using --schema-before-data and --schema-after-data

* docs rewritten using short clear descriptions using only the words
"before" and "after", like in the option names

* all variable names changed

* retested

So prefixes "pre" and "post" no longer appear anywhere. No latin derived
phrases, just good ol' Anglo-Saxon words.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

pg_dump_beforeafter.v4.patchtext/x-patch; charset=UTF-8; name=pg_dump_beforeafter.v4.patchDownload

Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.103
diff -c -r1.103 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml	20 Jul 2008 18:43:30 -0000	1.103
--- doc/src/sgml/ref/pg_dump.sgml	23 Jul 2008 15:26:29 -0000
***************
*** 133,139 ****
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
--- 133,140 ----
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> or <option>--schema-before-data</> or
!         <option>--schema-after-data</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
***************
*** 426,431 ****
--- 427,452 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--schema-before-data</option></term>
+       <listitem>
+        <para>
+ 		Dump object definitions (schema) that occur before table data,
+ 		using the order produced by a full dump.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
+       <term><option>--schema-after-data</option></term>
+       <listitem>
+        <para>
+ 		Dump object definitions (schema) that occur after table data,
+ 		using the order produced by a full dump.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 790,795 ****
--- 811,844 ----
    </para>
  
    <para>
+    The output of <application>pg_dump</application> can be divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+ 	  Before Data - objects output before data, which includes
+ 	  <command>CREATE TABLE</command> statements and others.
+ 	  This part can be requested using <option>--schema-before-data</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  After Data - objects output after data, which includes
+ 	  <command>CREATE INDEX</command> statements and others.
+ 	  This part can be requested using <option>--schema-after-data</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+ 
+   <para>
     Because <application>pg_dump</application> is used to transfer data
     to newer versions of <productname>PostgreSQL</>, the output of
     <application>pg_dump</application> can be loaded into
Index: doc/src/sgml/ref/pg_restore.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_restore.sgml,v
retrieving revision 1.75
diff -c -r1.75 pg_restore.sgml
*** doc/src/sgml/ref/pg_restore.sgml	13 Apr 2008 03:49:21 -0000	1.75
--- doc/src/sgml/ref/pg_restore.sgml	23 Jul 2008 16:00:29 -0000
***************
*** 321,326 ****
--- 321,346 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--schema-before-data</option></term>
+       <listitem>
+        <para>
+ 		Restore object definitions (schema) that occur before table data,
+ 		using the order produced by a full restore.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
+       <term><option>--schema-after-data</option></term>
+       <listitem>
+        <para>
+ 		Restore object definitions (schema) that occur after table data,
+ 		using the order produced by a full restore.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 572,577 ****
--- 592,626 ----
    </para>
  
    <para>
+    The actions of <application>pg_restore</application> can be 
+    divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+ 	  Before Data - objects output before data, which includes
+ 	  <command>CREATE TABLE</command> statements and others.
+ 	  This part can be requested using <option>--schema-before-data</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  After Data - objects output after data, which includes
+ 	  <command>CREATE INDEX</command> statements and others.
+ 	  This part can be requested using <option>--schema-after-data</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+ 
+   <para>
     The limitations of <application>pg_restore</application> are detailed below.
  
     <itemizedlist>
Index: src/bin/pg_dump/pg_backup.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup.h,v
retrieving revision 1.47
diff -c -r1.47 pg_backup.h
*** src/bin/pg_dump/pg_backup.h	13 Apr 2008 03:49:21 -0000	1.47
--- src/bin/pg_dump/pg_backup.h	23 Jul 2008 14:33:32 -0000
***************
*** 89,95 ****
  	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  								 * instead of OWNER TO */
  	char	   *superuser;		/* Username to use as superuser */
! 	int			dataOnly;
  	int			dropSchema;
  	char	   *filename;
  	int			schemaOnly;
--- 89,95 ----
  	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  								 * instead of OWNER TO */
  	char	   *superuser;		/* Username to use as superuser */
! 	int			dumpObjFlags;	/* which objects types to dump */
  	int			dropSchema;
  	char	   *filename;
  	int			schemaOnly;
Index: src/bin/pg_dump/pg_backup_archiver.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.157
diff -c -r1.157 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c	4 May 2008 08:32:21 -0000	1.157
--- src/bin/pg_dump/pg_backup_archiver.c	23 Jul 2008 16:12:47 -0000
***************
*** 56,62 ****
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static teReqs _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
--- 56,62 ----
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static int _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
***************
*** 129,135 ****
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
! 	teReqs		reqs;
  	OutputContext sav;
  	bool		defnDumped;
  
--- 129,135 ----
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
! 	int		reqs;
  	OutputContext sav;
  	bool		defnDumped;
  
***************
*** 175,193 ****
  	 * Work out if we have an implied data-only restore. This can happen if
  	 * the dump was data only or if the user has used a toc list to exclude
  	 * all of the schema data. All we do is look for schema entries - if none
! 	 * are found then we set the dataOnly flag.
  	 *
! 	 * We could scan for wanted TABLE entries, but that is not the same as
! 	 * dataOnly. At this stage, it seems unnecessary (6-Mar-2001).
  	 */
! 	if (!ropt->dataOnly)
  	{
  		int			impliedDataOnly = 1;
  
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
  		{
  			reqs = _tocEntryRequired(te, ropt, true);
! 			if ((reqs & REQ_SCHEMA) != 0)
  			{					/* It's schema, and it's wanted */
  				impliedDataOnly = 0;
  				break;
--- 175,193 ----
  	 * Work out if we have an implied data-only restore. This can happen if
  	 * the dump was data only or if the user has used a toc list to exclude
  	 * all of the schema data. All we do is look for schema entries - if none
! 	 * are found then say we only want DATA type objects.
  	 *
! 	 * We could scan for wanted TABLE entries, but that is not the same.
! 	 * At this stage, it seems unnecessary (6-Mar-2001).
  	 */
! 	if (!WANT_DATA(ropt->dumpObjFlags))
  	{
  		int			impliedDataOnly = 1;
  
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
  		{
  			reqs = _tocEntryRequired(te, ropt, true);
! 			if (WANT_PRE_SCHEMA(reqs) || WANT_POST_SCHEMA(reqs))
  			{					/* It's schema, and it's wanted */
  				impliedDataOnly = 0;
  				break;
***************
*** 195,201 ****
  		}
  		if (impliedDataOnly)
  		{
! 			ropt->dataOnly = impliedDataOnly;
  			ahlog(AH, 1, "implied data-only restore\n");
  		}
  	}
--- 195,201 ----
  		}
  		if (impliedDataOnly)
  		{
! 			ropt->dumpObjFlags = REQ_DATA;
  			ahlog(AH, 1, "implied data-only restore\n");
  		}
  	}
***************
*** 236,242 ****
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
--- 236,242 ----
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA_BEFORE_DATA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
***************
*** 278,284 ****
  		/* Dump any relevant dump warnings to stderr */
  		if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  		{
! 			if (!ropt->dataOnly && te->defn != NULL && strlen(te->defn) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  			else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
--- 278,284 ----
  		/* Dump any relevant dump warnings to stderr */
  		if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  		{
! 			if (!WANT_DATA(ropt->dumpObjFlags) && te->defn != NULL && strlen(te->defn) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  			else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
***************
*** 286,292 ****
  
  		defnDumped = false;
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
--- 286,293 ----
  
  		defnDumped = false;
  
! 		if ((WANT_PRE_SCHEMA(reqs) && WANT_PRE_SCHEMA(ropt->dumpObjFlags)) ||
! 			(WANT_POST_SCHEMA(reqs) && WANT_POST_SCHEMA(ropt->dumpObjFlags)))	/* We want the schema */
  		{
  			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
***************
*** 331,337 ****
  		/*
  		 * If we have a data component, then process it
  		 */
! 		if ((reqs & REQ_DATA) != 0)
  		{
  			/*
  			 * hadDumper will be set if there is genuine data component for
--- 332,338 ----
  		/*
  		 * If we have a data component, then process it
  		 */
! 		if (WANT_DATA(reqs))
  		{
  			/*
  			 * hadDumper will be set if there is genuine data component for
***************
*** 343,349 ****
  				/*
  				 * If we can output the data, then restore it.
  				 */
! 				if (AH->PrintTocDataPtr !=NULL && (reqs & REQ_DATA) != 0)
  				{
  #ifndef HAVE_LIBZ
  					if (AH->compression != 0)
--- 344,350 ----
  				/*
  				 * If we can output the data, then restore it.
  				 */
! 				if (AH->PrintTocDataPtr !=NULL && WANT_DATA(reqs))
  				{
  #ifndef HAVE_LIBZ
  					if (AH->compression != 0)
***************
*** 415,421 ****
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
--- 416,422 ----
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if (WANT_PRE_SCHEMA(reqs))	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
***************
*** 473,479 ****
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!ropt->dataOnly || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
--- 474,480 ----
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
***************
*** 499,505 ****
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!ropt->dataOnly || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
--- 500,506 ----
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
***************
*** 1321,1327 ****
  	return NULL;
  }
  
! teReqs
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  	TocEntry   *te = getTocEntryByDumpId(AH, id);
--- 1322,1328 ----
  	return NULL;
  }
  
! int
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  	TocEntry   *te = getTocEntryByDumpId(AH, id);
***************
*** 2026,2035 ****
  					 te->defn);
  }
  
! static teReqs
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! 	teReqs		res = REQ_ALL;
  
  	/* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  	if (strcmp(te->desc, "ENCODING") == 0 ||
--- 2027,2036 ----
  					 te->defn);
  }
  
! static int
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! 	int		res = ropt->dumpObjFlags;
  
  	/* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  	if (strcmp(te->desc, "ENCODING") == 0 ||
***************
*** 2109,2125 ****
  	if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  		return 0;
  
- 	/* Mask it if we only want schema */
- 	if (ropt->schemaOnly)
- 		res = res & REQ_SCHEMA;
- 
- 	/* Mask it we only want data */
- 	if (ropt->dataOnly)
- 		res = res & REQ_DATA;
- 
  	/* Mask it if we don't have a schema contribution */
  	if (!te->defn || strlen(te->defn) == 0)
! 		res = res & ~REQ_SCHEMA;
  
  	/* Finally, if there's a per-ID filter, limit based on that as well */
  	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
--- 2110,2118 ----
  	if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  		return 0;
  
  	/* Mask it if we don't have a schema contribution */
  	if (!te->defn || strlen(te->defn) == 0)
! 		res = res & ~(REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
  
  	/* Finally, if there's a per-ID filter, limit based on that as well */
  	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
Index: src/bin/pg_dump/pg_backup_archiver.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.h,v
retrieving revision 1.76
diff -c -r1.76 pg_backup_archiver.h
*** src/bin/pg_dump/pg_backup_archiver.h	7 Nov 2007 12:24:24 -0000	1.76
--- src/bin/pg_dump/pg_backup_archiver.h	23 Jul 2008 16:13:23 -0000
***************
*** 158,169 ****
  	STAGE_FINALIZING
  } ArchiverStage;
  
! typedef enum
! {
! 	REQ_SCHEMA = 1,
! 	REQ_DATA = 2,
! 	REQ_ALL = REQ_SCHEMA + REQ_DATA
! } teReqs;
  
  typedef struct _archiveHandle
  {
--- 158,173 ----
  	STAGE_FINALIZING
  } ArchiverStage;
  
! #define REQ_SCHEMA_BEFORE_DATA	(1 << 0)
! #define REQ_DATA				(1 << 1)
! #define REQ_SCHEMA_AFTER_DATA	(1 << 2)
! #define REQ_ALL					(REQ_SCHEMA_BEFORE_DATA + REQ_DATA + REQ_SCHEMA_AFTER_DATA)
! 
! #define WANT_PRE_SCHEMA(req)	((req & REQ_SCHEMA_BEFORE_DATA) == REQ_SCHEMA_BEFORE_DATA)
! #define WANT_DATA(req)			((req & REQ_DATA) == REQ_DATA)
! #define WANT_POST_SCHEMA(req)	((req & REQ_SCHEMA_AFTER_DATA) == REQ_SCHEMA_AFTER_DATA)
! #define WANT_ALL(req)			((req & REQ_ALL) == REQ_ALL)
! 
  
  typedef struct _archiveHandle
  {
***************
*** 317,323 ****
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
  
! extern teReqs TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
  
  #define appendStringLiteralAHX(buf,str,AH) \
--- 321,327 ----
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
  
! extern int TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
  
  #define appendStringLiteralAHX(buf,str,AH) \
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	20 Jul 2008 18:43:30 -0000	1.497
--- src/bin/pg_dump/pg_dump.c	23 Jul 2008 16:33:08 -0000
***************
*** 73,78 ****
--- 73,82 ----
  bool		aclsSkip;
  const char *lockWaitTimeout;
  
+ /* groups of objects: default is we dump all groups */
+ 
+ int			dumpObjFlags;
+ 
  /* subquery used to convert user ID (eg, datdba) to user name */
  static const char *username_subquery;
  
***************
*** 225,232 ****
  	RestoreOptions *ropt;
  
  	static int	disable_triggers = 0;
! 	static int  outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
  
  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},
--- 229,238 ----
  	RestoreOptions *ropt;
  
  	static int	disable_triggers = 0;
! 	static int outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+ 	static int	use_schemaBeforeData;
+ 	static int	use_schemaAfterData;
  
  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},
***************
*** 267,272 ****
--- 273,280 ----
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"lock-wait-timeout", required_argument, NULL, 2},
  		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+  		{"schema-before-data", no_argument, &use_schemaBeforeData, 1},
+  		{"schema-after-data", no_argument, &use_schemaAfterData, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
  		{NULL, 0, NULL, 0}
***************
*** 464,474 ****
  	if (optind < argc)
  		dbname = argv[optind];
  
! 	if (dataOnly && schemaOnly)
  	{
! 		write_msg(NULL, "options -s/--schema-only and -a/--data-only cannot be used together\n");
  		exit(1);
  	}
  
  	if (dataOnly && outputClean)
  	{
--- 472,517 ----
  	if (optind < argc)
  		dbname = argv[optind];
  
! 	/*
! 	 * Look for conflicting options relating to object groupings
! 	 */
! 	if (schemaOnly && dataOnly)
! 	{
! 		write_msg(NULL, "options %s and %s cannot be used together\n",
! 				"-s/--schema-only", "-a/--data-only");
! 		exit(1);
! 	}
! 	else if ((schemaOnly || dataOnly) && 
! 				(use_schemaBeforeData == 1 || use_schemaAfterData == 1))
  	{
! 		write_msg(NULL, "options %s and %s cannot be used together\n",
! 				schemaOnly ? "-s/--schema-only" : "-a/--data-only",
! 				use_schemaBeforeData == 1 ? "--schema-before-data" : "--schema-after-data");
  		exit(1);
  	}
+ 	else if (use_schemaBeforeData == 1 && use_schemaAfterData == 1)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"--schema-before-data", "--schema-after-data");
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Decide which of the object groups we will dump
+ 	 */
+ 	dumpObjFlags = REQ_ALL;
+ 
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (use_schemaBeforeData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (use_schemaAfterData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
  
  	if (dataOnly && outputClean)
  	{
***************
*** 646,652 ****
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && !schemaOnly)
  		outputBlobs = true;
  
  	/*
--- 689,695 ----
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
  		outputBlobs = true;
  
  	/*
***************
*** 658,664 ****
  	if (g_fout->remoteVersion < 80400)
  		guessConstraintInheritance(tblinfo, numTables);
  
! 	if (!schemaOnly)
  		getTableData(tblinfo, numTables, oids);
  
  	if (outputBlobs && hasBlobs(g_fout))
--- 701,707 ----
  	if (g_fout->remoteVersion < 80400)
  		guessConstraintInheritance(tblinfo, numTables);
  
! 	if (WANT_DATA(dumpObjFlags))
  		getTableData(tblinfo, numTables, oids);
  
  	if (outputBlobs && hasBlobs(g_fout))
***************
*** 712,718 ****
  	dumpStdStrings(g_fout);
  
  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && !dataOnly)
  		dumpDatabase(g_fout);
  
  	/* Now the rearrangeable objects. */
--- 755,761 ----
  	dumpStdStrings(g_fout);
  
  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && WANT_DATA(dumpObjFlags))
  		dumpDatabase(g_fout);
  
  	/* Now the rearrangeable objects. */
***************
*** 734,740 ****
  		ropt->noTablespace = outputNoTablespaces;
  		ropt->disable_triggers = disable_triggers;
  		ropt->use_setsessauth = use_setsessauth;
! 		ropt->dataOnly = dataOnly;
  
  		if (compressLevel == -1)
  			ropt->compression = 0;
--- 777,783 ----
  		ropt->noTablespace = outputNoTablespaces;
  		ropt->disable_triggers = disable_triggers;
  		ropt->use_setsessauth = use_setsessauth;
! 		ropt->dumpObjFlags = dumpObjFlags;
  
  		if (compressLevel == -1)
  			ropt->compression = 0;
***************
*** 3414,3420 ****
  			continue;
  
  		/* Ignore indexes of tables not to be dumped */
! 		if (!tbinfo->dobj.dump)
  			continue;
  
  		if (g_verbose)
--- 3457,3463 ----
  			continue;
  
  		/* Ignore indexes of tables not to be dumped */
! 		if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  			continue;
  
  		if (g_verbose)
***************
*** 5165,5171 ****
  	int			ncomments;
  
  	/* Comments are SCHEMA not data */
! 	if (dataOnly)
  		return;
  
  	/* Search for comments associated with catalogId, using table */
--- 5208,5214 ----
  	int			ncomments;
  
  	/* Comments are SCHEMA not data */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Search for comments associated with catalogId, using table */
***************
*** 5216,5222 ****
  	PQExpBuffer target;
  
  	/* Comments are SCHEMA not data */
! 	if (dataOnly)
  		return;
  
  	/* Search for comments associated with relation, using table */
--- 5259,5265 ----
  	PQExpBuffer target;
  
  	/* Comments are SCHEMA not data */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Search for comments associated with relation, using table */
***************
*** 5568,5574 ****
  	char	   *qnspname;
  
  	/* Skip if not to be dumped */
! 	if (!nspinfo->dobj.dump || dataOnly)
  		return;
  
  	/* don't dump dummy namespace from pre-7.3 source */
--- 5611,5617 ----
  	char	   *qnspname;
  
  	/* Skip if not to be dumped */
! 	if (!nspinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* don't dump dummy namespace from pre-7.3 source */
***************
*** 5617,5623 ****
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  	/* Skip if not to be dumped */
! 	if (!tinfo->dobj.dump || dataOnly)
  		return;
  
  	/* Dump out in proper style */
--- 5660,5666 ----
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  	/* Skip if not to be dumped */
! 	if (!tinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Dump out in proper style */
***************
*** 6262,6268 ****
  	PQExpBuffer q;
  
  	/* Skip if not to be dumped */
! 	if (!stinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 6305,6311 ----
  	PQExpBuffer q;
  
  	/* Skip if not to be dumped */
! 	if (!stinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 6309,6315 ****
  	if (!include_everything)
  		return false;
  	/* And they're schema not data */
! 	if (dataOnly)
  		return false;
  	return true;
  }
--- 6352,6358 ----
  	if (!include_everything)
  		return false;
  	/* And they're schema not data */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return false;
  	return true;
  }
***************
*** 6330,6336 ****
  	FuncInfo   *funcInfo;
  	FuncInfo   *validatorInfo = NULL;
  
! 	if (dataOnly)
  		return;
  
  	/*
--- 6373,6379 ----
  	FuncInfo   *funcInfo;
  	FuncInfo   *validatorInfo = NULL;
  
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 6590,6596 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!finfo->dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 6633,6639 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!finfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 6985,6991 ****
  	TypeInfo   *sourceInfo;
  	TypeInfo   *targetInfo;
  
! 	if (dataOnly)
  		return;
  
  	if (OidIsValid(cast->castfunc))
--- 7028,7034 ----
  	TypeInfo   *sourceInfo;
  	TypeInfo   *targetInfo;
  
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	if (OidIsValid(cast->castfunc))
***************
*** 7135,7141 ****
  	char	   *oprcanhash;
  
  	/* Skip if not to be dumped */
! 	if (!oprinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7178,7184 ----
  	char	   *oprcanhash;
  
  	/* Skip if not to be dumped */
! 	if (!oprinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 7519,7525 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opcinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7562,7568 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opcinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 7827,7833 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opfinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7870,7876 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opfinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 8096,8102 ****
  	bool		condefault;
  
  	/* Skip if not to be dumped */
! 	if (!convinfo->dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 8139,8145 ----
  	bool		condefault;
  
  	/* Skip if not to be dumped */
! 	if (!convinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 8250,8256 ****
  	bool		convertok;
  
  	/* Skip if not to be dumped */
! 	if (!agginfo->aggfn.dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 8293,8299 ----
  	bool		convertok;
  
  	/* Skip if not to be dumped */
! 	if (!agginfo->aggfn.dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 8453,8459 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!prsinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8496,8502 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!prsinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8522,8528 ****
  	char	   *tmplname;
  
  	/* Skip if not to be dumped */
! 	if (!dictinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8565,8571 ----
  	char	   *tmplname;
  
  	/* Skip if not to be dumped */
! 	if (!dictinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8607,8613 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!tmplinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8650,8656 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!tmplinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8673,8679 ****
  	int			i_dictname;
  
  	/* Skip if not to be dumped */
! 	if (!cfginfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8716,8722 ----
  	int			i_dictname;
  
  	/* Skip if not to be dumped */
! 	if (!cfginfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8809,8815 ****
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (dataOnly || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
--- 8852,8858 ----
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags) || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
***************
*** 8846,8852 ****
  	{
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (!dataOnly)
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
--- 8889,8895 ----
  	{
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (WANT_PRE_SCHEMA(dumpObjFlags))
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
***************
*** 9153,9159 ****
  	PQExpBuffer delq;
  
  	/* Only print it if "separate" mode is selected */
! 	if (!tbinfo->dobj.dump || !adinfo->separate || dataOnly)
  		return;
  
  	/* Don't print inherited defaults, either */
--- 9196,9202 ----
  	PQExpBuffer delq;
  
  	/* Only print it if "separate" mode is selected */
! 	if (!tbinfo->dobj.dump || !adinfo->separate || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Don't print inherited defaults, either */
***************
*** 9238,9244 ****
  	PQExpBuffer q;
  	PQExpBuffer delq;
  
! 	if (dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 9281,9287 ----
  	PQExpBuffer q;
  	PQExpBuffer delq;
  
! 	if (!WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 9307,9313 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!coninfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 9350,9356 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!coninfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 9700,9706 ****
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (!dataOnly)
  	{
  		resetPQExpBuffer(delqry);
  
--- 9743,9749 ----
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (WANT_PRE_SCHEMA(dumpObjFlags))
  	{
  		resetPQExpBuffer(delqry);
  
***************
*** 9803,9809 ****
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (!schemaOnly)
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9846,9852 ----
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (WANT_PRE_SCHEMA(dumpObjFlags))
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
***************
*** 9836,9842 ****
  	const char *p;
  	int			findx;
  
! 	if (dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 9879,9885 ----
  	const char *p;
  	int			findx;
  
! 	if (!WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 10044,10050 ****
  	PGresult   *res;
  
  	/* Skip if not to be dumped */
! 	if (!rinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 10087,10093 ----
  	PGresult   *res;
  
  	/* Skip if not to be dumped */
! 	if (!rinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
Index: src/bin/pg_dump/pg_restore.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
retrieving revision 1.88
diff -c -r1.88 pg_restore.c
*** src/bin/pg_dump/pg_restore.c	13 Apr 2008 03:49:22 -0000	1.88
--- src/bin/pg_dump/pg_restore.c	23 Jul 2008 16:24:48 -0000
***************
*** 78,83 ****
--- 78,90 ----
  	static int	no_data_for_failed_tables = 0;
  	static int  outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+  	bool		dataOnly = false;
+  	bool		schemaOnly = false;
+  
+  	static int	use_schemaBeforeData;
+  	static int	use_schemaAfterData;
+  
+  	int			dumpObjFlags;
  
  	struct option cmdopts[] = {
  		{"clean", 0, NULL, 'c'},
***************
*** 114,119 ****
--- 121,128 ----
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"no-data-for-failed-tables", no_argument, &no_data_for_failed_tables, 1},
  		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+  		{"schema-before-data", no_argument, &use_schemaBeforeData, 1},
+  		{"schema-after-data", no_argument, &use_schemaAfterData, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
  		{NULL, 0, NULL, 0}
***************
*** 145,151 ****
  		switch (c)
  		{
  			case 'a':			/* Dump data only */
! 				opts->dataOnly = 1;
  				break;
  			case 'c':			/* clean (i.e., drop) schema prior to create */
  				opts->dropSchema = 1;
--- 154,160 ----
  		switch (c)
  		{
  			case 'a':			/* Dump data only */
! 				dataOnly = true;
  				break;
  			case 'c':			/* clean (i.e., drop) schema prior to create */
  				opts->dropSchema = 1;
***************
*** 213,219 ****
  				opts->triggerNames = strdup(optarg);
  				break;
  			case 's':			/* dump schema only */
! 				opts->schemaOnly = 1;
  				break;
  			case 'S':			/* Superuser username */
  				if (strlen(optarg) != 0)
--- 222,228 ----
  				opts->triggerNames = strdup(optarg);
  				break;
  			case 's':			/* dump schema only */
! 				schemaOnly = true;
  				break;
  			case 'S':			/* Superuser username */
  				if (strlen(optarg) != 0)
***************
*** 295,300 ****
--- 304,350 ----
  		opts->useDB = 1;
  	}
  
+ 	/*
+ 	 * Look for conflicting options relating to object groupings
+ 	 */
+ 	if (schemaOnly && dataOnly)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"-s/--schema-only", "-a/--data-only");
+ 		exit(1);
+ 	}
+ 	else if ((schemaOnly || dataOnly) && 
+ 				(use_schemaBeforeData == 1 || use_schemaAfterData == 1))
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				schemaOnly ? "-s/--schema-only" : "-a/--data-only",
+ 				use_schemaBeforeData == 1 ? "--schema-before-data" : "--schema-after-data");
+ 		exit(1);
+ 	}
+ 	else if (use_schemaBeforeData == 1 && use_schemaAfterData == 1)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"--schema-before-data", "--schema-after-data");
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Decide which of the object groups we will dump
+ 	 */
+ 	dumpObjFlags = REQ_ALL;
+ 
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (use_schemaBeforeData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (use_schemaAfterData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
+ 
  	opts->disable_triggers = disable_triggers;
  	opts->noDataForFailedTables = no_data_for_failed_tables;
  	opts->noTablespace = outputNoTablespaces;

#24

simon@2ndquadrant.com

over 17 years ago

In reply to: Simon Riggs (#23)

1 attachment(s)

Re: pg_dump additional options for performance

On Wed, 2008-07-23 at 17:40 +0100, Simon Riggs wrote:

On Mon, 2008-07-21 at 07:56 -0400, Stephen Frost wrote:

Simon,

* Simon Riggs (simon@2ndquadrant.com) wrote:

I hadn't realized that Simon was using "pre-schema" and "post-schema"
to name the first and third parts. I'd agree that this is confusing
nomenclature: it looks like it's trying to say that the data is the
schema, and the schema is not! How about "pre-data and "post-data"?

OK by me. Any other takers?

Having the command-line options be "--schema-pre-data" and
"--schema-post-data" is fine with me. Leaving them the way they are is
also fine by me. It's the documentation (back to pg_dump.sgml,
~774/~797) that starts talking about Pre-Schema and Post-Schema.

OK, Mr.Reviewer, sir:

* patch redone using --schema-before-data and --schema-after-data

* docs rewritten using short clear descriptions using only the words
"before" and "after", like in the option names

* all variable names changed

* retested

So prefixes "pre" and "post" no longer appear anywhere. No latin derived
phrases, just good ol' Anglo-Saxon words.

...and with command line help also.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

pg_dump_beforeafter.v5.patchtext/x-patch; charset=utf-8; name=pg_dump_beforeafter.v5.patchDownload

Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.103
diff -c -r1.103 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml	20 Jul 2008 18:43:30 -0000	1.103
--- doc/src/sgml/ref/pg_dump.sgml	23 Jul 2008 16:55:05 -0000
***************
*** 133,139 ****
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
--- 133,140 ----
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> or <option>--schema-before-data</> or
!         <option>--schema-after-data</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
***************
*** 426,431 ****
--- 427,452 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--schema-before-data</option></term>
+       <listitem>
+        <para>
+ 		Dump object definitions (schema) that occur before table data,
+ 		using the order produced by a full dump.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
+       <term><option>--schema-after-data</option></term>
+       <listitem>
+        <para>
+ 		Dump object definitions (schema) that occur after table data,
+ 		using the order produced by a full dump.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 790,795 ****
--- 811,844 ----
    </para>
  
    <para>
+    The output of <application>pg_dump</application> can be divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+ 	  Before Data - objects output before data, which includes
+ 	  <command>CREATE TABLE</command> statements and others.
+ 	  This part can be requested using <option>--schema-before-data</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  After Data - objects output after data, which includes
+ 	  <command>CREATE INDEX</command> statements and others.
+ 	  This part can be requested using <option>--schema-after-data</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+ 
+   <para>
     Because <application>pg_dump</application> is used to transfer data
     to newer versions of <productname>PostgreSQL</>, the output of
     <application>pg_dump</application> can be loaded into
Index: doc/src/sgml/ref/pg_restore.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_restore.sgml,v
retrieving revision 1.75
diff -c -r1.75 pg_restore.sgml
*** doc/src/sgml/ref/pg_restore.sgml	13 Apr 2008 03:49:21 -0000	1.75
--- doc/src/sgml/ref/pg_restore.sgml	23 Jul 2008 16:55:05 -0000
***************
*** 321,326 ****
--- 321,346 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--schema-before-data</option></term>
+       <listitem>
+        <para>
+ 		Restore object definitions (schema) that occur before table data,
+ 		using the order produced by a full restore.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
+       <term><option>--schema-after-data</option></term>
+       <listitem>
+        <para>
+ 		Restore object definitions (schema) that occur after table data,
+ 		using the order produced by a full restore.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 572,577 ****
--- 592,626 ----
    </para>
  
    <para>
+    The actions of <application>pg_restore</application> can be 
+    divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+ 	  Before Data - objects output before data, which includes
+ 	  <command>CREATE TABLE</command> statements and others.
+ 	  This part can be requested using <option>--schema-before-data</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  After Data - objects output after data, which includes
+ 	  <command>CREATE INDEX</command> statements and others.
+ 	  This part can be requested using <option>--schema-after-data</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+ 
+   <para>
     The limitations of <application>pg_restore</application> are detailed below.
  
     <itemizedlist>
Index: src/bin/pg_dump/pg_backup.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup.h,v
retrieving revision 1.47
diff -c -r1.47 pg_backup.h
*** src/bin/pg_dump/pg_backup.h	13 Apr 2008 03:49:21 -0000	1.47
--- src/bin/pg_dump/pg_backup.h	23 Jul 2008 16:55:05 -0000
***************
*** 89,95 ****
  	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  								 * instead of OWNER TO */
  	char	   *superuser;		/* Username to use as superuser */
! 	int			dataOnly;
  	int			dropSchema;
  	char	   *filename;
  	int			schemaOnly;
--- 89,95 ----
  	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  								 * instead of OWNER TO */
  	char	   *superuser;		/* Username to use as superuser */
! 	int			dumpObjFlags;	/* which objects types to dump */
  	int			dropSchema;
  	char	   *filename;
  	int			schemaOnly;
Index: src/bin/pg_dump/pg_backup_archiver.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.157
diff -c -r1.157 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c	4 May 2008 08:32:21 -0000	1.157
--- src/bin/pg_dump/pg_backup_archiver.c	23 Jul 2008 16:55:05 -0000
***************
*** 56,62 ****
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static teReqs _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
--- 56,62 ----
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static int _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
***************
*** 129,135 ****
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
! 	teReqs		reqs;
  	OutputContext sav;
  	bool		defnDumped;
  
--- 129,135 ----
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
! 	int		reqs;
  	OutputContext sav;
  	bool		defnDumped;
  
***************
*** 175,193 ****
  	 * Work out if we have an implied data-only restore. This can happen if
  	 * the dump was data only or if the user has used a toc list to exclude
  	 * all of the schema data. All we do is look for schema entries - if none
! 	 * are found then we set the dataOnly flag.
  	 *
! 	 * We could scan for wanted TABLE entries, but that is not the same as
! 	 * dataOnly. At this stage, it seems unnecessary (6-Mar-2001).
  	 */
! 	if (!ropt->dataOnly)
  	{
  		int			impliedDataOnly = 1;
  
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
  		{
  			reqs = _tocEntryRequired(te, ropt, true);
! 			if ((reqs & REQ_SCHEMA) != 0)
  			{					/* It's schema, and it's wanted */
  				impliedDataOnly = 0;
  				break;
--- 175,193 ----
  	 * Work out if we have an implied data-only restore. This can happen if
  	 * the dump was data only or if the user has used a toc list to exclude
  	 * all of the schema data. All we do is look for schema entries - if none
! 	 * are found then say we only want DATA type objects.
  	 *
! 	 * We could scan for wanted TABLE entries, but that is not the same.
! 	 * At this stage, it seems unnecessary (6-Mar-2001).
  	 */
! 	if (!WANT_DATA(ropt->dumpObjFlags))
  	{
  		int			impliedDataOnly = 1;
  
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
  		{
  			reqs = _tocEntryRequired(te, ropt, true);
! 			if (WANT_PRE_SCHEMA(reqs) || WANT_POST_SCHEMA(reqs))
  			{					/* It's schema, and it's wanted */
  				impliedDataOnly = 0;
  				break;
***************
*** 195,201 ****
  		}
  		if (impliedDataOnly)
  		{
! 			ropt->dataOnly = impliedDataOnly;
  			ahlog(AH, 1, "implied data-only restore\n");
  		}
  	}
--- 195,201 ----
  		}
  		if (impliedDataOnly)
  		{
! 			ropt->dumpObjFlags = REQ_DATA;
  			ahlog(AH, 1, "implied data-only restore\n");
  		}
  	}
***************
*** 236,242 ****
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
--- 236,242 ----
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA_BEFORE_DATA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
***************
*** 278,284 ****
  		/* Dump any relevant dump warnings to stderr */
  		if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  		{
! 			if (!ropt->dataOnly && te->defn != NULL && strlen(te->defn) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  			else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
--- 278,284 ----
  		/* Dump any relevant dump warnings to stderr */
  		if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  		{
! 			if (!WANT_DATA(ropt->dumpObjFlags) && te->defn != NULL && strlen(te->defn) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  			else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
***************
*** 286,292 ****
  
  		defnDumped = false;
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
--- 286,293 ----
  
  		defnDumped = false;
  
! 		if ((WANT_PRE_SCHEMA(reqs) && WANT_PRE_SCHEMA(ropt->dumpObjFlags)) ||
! 			(WANT_POST_SCHEMA(reqs) && WANT_POST_SCHEMA(ropt->dumpObjFlags)))	/* We want the schema */
  		{
  			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
***************
*** 331,337 ****
  		/*
  		 * If we have a data component, then process it
  		 */
! 		if ((reqs & REQ_DATA) != 0)
  		{
  			/*
  			 * hadDumper will be set if there is genuine data component for
--- 332,338 ----
  		/*
  		 * If we have a data component, then process it
  		 */
! 		if (WANT_DATA(reqs))
  		{
  			/*
  			 * hadDumper will be set if there is genuine data component for
***************
*** 343,349 ****
  				/*
  				 * If we can output the data, then restore it.
  				 */
! 				if (AH->PrintTocDataPtr !=NULL && (reqs & REQ_DATA) != 0)
  				{
  #ifndef HAVE_LIBZ
  					if (AH->compression != 0)
--- 344,350 ----
  				/*
  				 * If we can output the data, then restore it.
  				 */
! 				if (AH->PrintTocDataPtr !=NULL && WANT_DATA(reqs))
  				{
  #ifndef HAVE_LIBZ
  					if (AH->compression != 0)
***************
*** 415,421 ****
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
--- 416,422 ----
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if (WANT_PRE_SCHEMA(reqs))	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
***************
*** 473,479 ****
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!ropt->dataOnly || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
--- 474,480 ----
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
***************
*** 499,505 ****
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!ropt->dataOnly || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
--- 500,506 ----
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
***************
*** 1321,1327 ****
  	return NULL;
  }
  
! teReqs
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  	TocEntry   *te = getTocEntryByDumpId(AH, id);
--- 1322,1328 ----
  	return NULL;
  }
  
! int
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  	TocEntry   *te = getTocEntryByDumpId(AH, id);
***************
*** 2026,2035 ****
  					 te->defn);
  }
  
! static teReqs
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! 	teReqs		res = REQ_ALL;
  
  	/* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  	if (strcmp(te->desc, "ENCODING") == 0 ||
--- 2027,2036 ----
  					 te->defn);
  }
  
! static int
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! 	int		res = ropt->dumpObjFlags;
  
  	/* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  	if (strcmp(te->desc, "ENCODING") == 0 ||
***************
*** 2109,2125 ****
  	if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  		return 0;
  
- 	/* Mask it if we only want schema */
- 	if (ropt->schemaOnly)
- 		res = res & REQ_SCHEMA;
- 
- 	/* Mask it we only want data */
- 	if (ropt->dataOnly)
- 		res = res & REQ_DATA;
- 
  	/* Mask it if we don't have a schema contribution */
  	if (!te->defn || strlen(te->defn) == 0)
! 		res = res & ~REQ_SCHEMA;
  
  	/* Finally, if there's a per-ID filter, limit based on that as well */
  	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
--- 2110,2118 ----
  	if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  		return 0;
  
  	/* Mask it if we don't have a schema contribution */
  	if (!te->defn || strlen(te->defn) == 0)
! 		res = res & ~(REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
  
  	/* Finally, if there's a per-ID filter, limit based on that as well */
  	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
Index: src/bin/pg_dump/pg_backup_archiver.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.h,v
retrieving revision 1.76
diff -c -r1.76 pg_backup_archiver.h
*** src/bin/pg_dump/pg_backup_archiver.h	7 Nov 2007 12:24:24 -0000	1.76
--- src/bin/pg_dump/pg_backup_archiver.h	23 Jul 2008 16:55:05 -0000
***************
*** 158,169 ****
  	STAGE_FINALIZING
  } ArchiverStage;
  
! typedef enum
! {
! 	REQ_SCHEMA = 1,
! 	REQ_DATA = 2,
! 	REQ_ALL = REQ_SCHEMA + REQ_DATA
! } teReqs;
  
  typedef struct _archiveHandle
  {
--- 158,173 ----
  	STAGE_FINALIZING
  } ArchiverStage;
  
! #define REQ_SCHEMA_BEFORE_DATA	(1 << 0)
! #define REQ_DATA				(1 << 1)
! #define REQ_SCHEMA_AFTER_DATA	(1 << 2)
! #define REQ_ALL					(REQ_SCHEMA_BEFORE_DATA + REQ_DATA + REQ_SCHEMA_AFTER_DATA)
! 
! #define WANT_PRE_SCHEMA(req)	((req & REQ_SCHEMA_BEFORE_DATA) == REQ_SCHEMA_BEFORE_DATA)
! #define WANT_DATA(req)			((req & REQ_DATA) == REQ_DATA)
! #define WANT_POST_SCHEMA(req)	((req & REQ_SCHEMA_AFTER_DATA) == REQ_SCHEMA_AFTER_DATA)
! #define WANT_ALL(req)			((req & REQ_ALL) == REQ_ALL)
! 
  
  typedef struct _archiveHandle
  {
***************
*** 317,323 ****
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
  
! extern teReqs TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
  
  #define appendStringLiteralAHX(buf,str,AH) \
--- 321,327 ----
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
  
! extern int TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
  
  #define appendStringLiteralAHX(buf,str,AH) \
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	20 Jul 2008 18:43:30 -0000	1.497
--- src/bin/pg_dump/pg_dump.c	23 Jul 2008 17:04:24 -0000
***************
*** 73,78 ****
--- 73,82 ----
  bool		aclsSkip;
  const char *lockWaitTimeout;
  
+ /* groups of objects: default is we dump all groups */
+ 
+ int			dumpObjFlags;
+ 
  /* subquery used to convert user ID (eg, datdba) to user name */
  static const char *username_subquery;
  
***************
*** 225,232 ****
  	RestoreOptions *ropt;
  
  	static int	disable_triggers = 0;
! 	static int  outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
  
  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},
--- 229,238 ----
  	RestoreOptions *ropt;
  
  	static int	disable_triggers = 0;
! 	static int outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+ 	static int	use_schemaBeforeData;
+ 	static int	use_schemaAfterData;
  
  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},
***************
*** 267,272 ****
--- 273,280 ----
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"lock-wait-timeout", required_argument, NULL, 2},
  		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+  		{"schema-before-data", no_argument, &use_schemaBeforeData, 1},
+  		{"schema-after-data", no_argument, &use_schemaAfterData, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
  		{NULL, 0, NULL, 0}
***************
*** 464,474 ****
  	if (optind < argc)
  		dbname = argv[optind];
  
! 	if (dataOnly && schemaOnly)
  	{
! 		write_msg(NULL, "options -s/--schema-only and -a/--data-only cannot be used together\n");
  		exit(1);
  	}
  
  	if (dataOnly && outputClean)
  	{
--- 472,517 ----
  	if (optind < argc)
  		dbname = argv[optind];
  
! 	/*
! 	 * Look for conflicting options relating to object groupings
! 	 */
! 	if (schemaOnly && dataOnly)
! 	{
! 		write_msg(NULL, "options %s and %s cannot be used together\n",
! 				"-s/--schema-only", "-a/--data-only");
! 		exit(1);
! 	}
! 	else if ((schemaOnly || dataOnly) && 
! 				(use_schemaBeforeData == 1 || use_schemaAfterData == 1))
  	{
! 		write_msg(NULL, "options %s and %s cannot be used together\n",
! 				schemaOnly ? "-s/--schema-only" : "-a/--data-only",
! 				use_schemaBeforeData == 1 ? "--schema-before-data" : "--schema-after-data");
  		exit(1);
  	}
+ 	else if (use_schemaBeforeData == 1 && use_schemaAfterData == 1)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"--schema-before-data", "--schema-after-data");
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Decide which of the object groups we will dump
+ 	 */
+ 	dumpObjFlags = REQ_ALL;
+ 
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (use_schemaBeforeData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (use_schemaAfterData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
  
  	if (dataOnly && outputClean)
  	{
***************
*** 646,652 ****
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && !schemaOnly)
  		outputBlobs = true;
  
  	/*
--- 689,695 ----
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
  		outputBlobs = true;
  
  	/*
***************
*** 658,664 ****
  	if (g_fout->remoteVersion < 80400)
  		guessConstraintInheritance(tblinfo, numTables);
  
! 	if (!schemaOnly)
  		getTableData(tblinfo, numTables, oids);
  
  	if (outputBlobs && hasBlobs(g_fout))
--- 701,707 ----
  	if (g_fout->remoteVersion < 80400)
  		guessConstraintInheritance(tblinfo, numTables);
  
! 	if (WANT_DATA(dumpObjFlags))
  		getTableData(tblinfo, numTables, oids);
  
  	if (outputBlobs && hasBlobs(g_fout))
***************
*** 712,718 ****
  	dumpStdStrings(g_fout);
  
  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && !dataOnly)
  		dumpDatabase(g_fout);
  
  	/* Now the rearrangeable objects. */
--- 755,761 ----
  	dumpStdStrings(g_fout);
  
  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && WANT_DATA(dumpObjFlags))
  		dumpDatabase(g_fout);
  
  	/* Now the rearrangeable objects. */
***************
*** 734,740 ****
  		ropt->noTablespace = outputNoTablespaces;
  		ropt->disable_triggers = disable_triggers;
  		ropt->use_setsessauth = use_setsessauth;
! 		ropt->dataOnly = dataOnly;
  
  		if (compressLevel == -1)
  			ropt->compression = 0;
--- 777,783 ----
  		ropt->noTablespace = outputNoTablespaces;
  		ropt->disable_triggers = disable_triggers;
  		ropt->use_setsessauth = use_setsessauth;
! 		ropt->dumpObjFlags = dumpObjFlags;
  
  		if (compressLevel == -1)
  			ropt->compression = 0;
***************
*** 792,797 ****
--- 835,842 ----
  	printf(_("  --disable-dollar-quoting    disable dollar quoting, use SQL standard quoting\n"));
  	printf(_("  --disable-triggers          disable triggers during data-only restore\n"));
  	printf(_("  --no-tablespaces            do not dump tablespace assignments\n"));
+ 	printf(_("  --schema-before-data		dump only the part of schema before table data\n"));
+ 	printf(_("  --schema-after-data			dump only the part of schema after table data\n"));
  	printf(_("  --use-set-session-authorization\n"
  			 "                              use SESSION AUTHORIZATION commands instead of\n"
  	"                              ALTER OWNER commands to set ownership\n"));
***************
*** 3414,3420 ****
  			continue;
  
  		/* Ignore indexes of tables not to be dumped */
! 		if (!tbinfo->dobj.dump)
  			continue;
  
  		if (g_verbose)
--- 3459,3465 ----
  			continue;
  
  		/* Ignore indexes of tables not to be dumped */
! 		if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  			continue;
  
  		if (g_verbose)
***************
*** 5165,5171 ****
  	int			ncomments;
  
  	/* Comments are SCHEMA not data */
! 	if (dataOnly)
  		return;
  
  	/* Search for comments associated with catalogId, using table */
--- 5210,5216 ----
  	int			ncomments;
  
  	/* Comments are SCHEMA not data */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Search for comments associated with catalogId, using table */
***************
*** 5216,5222 ****
  	PQExpBuffer target;
  
  	/* Comments are SCHEMA not data */
! 	if (dataOnly)
  		return;
  
  	/* Search for comments associated with relation, using table */
--- 5261,5267 ----
  	PQExpBuffer target;
  
  	/* Comments are SCHEMA not data */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Search for comments associated with relation, using table */
***************
*** 5568,5574 ****
  	char	   *qnspname;
  
  	/* Skip if not to be dumped */
! 	if (!nspinfo->dobj.dump || dataOnly)
  		return;
  
  	/* don't dump dummy namespace from pre-7.3 source */
--- 5613,5619 ----
  	char	   *qnspname;
  
  	/* Skip if not to be dumped */
! 	if (!nspinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* don't dump dummy namespace from pre-7.3 source */
***************
*** 5617,5623 ****
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  	/* Skip if not to be dumped */
! 	if (!tinfo->dobj.dump || dataOnly)
  		return;
  
  	/* Dump out in proper style */
--- 5662,5668 ----
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  	/* Skip if not to be dumped */
! 	if (!tinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Dump out in proper style */
***************
*** 6262,6268 ****
  	PQExpBuffer q;
  
  	/* Skip if not to be dumped */
! 	if (!stinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 6307,6313 ----
  	PQExpBuffer q;
  
  	/* Skip if not to be dumped */
! 	if (!stinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 6309,6315 ****
  	if (!include_everything)
  		return false;
  	/* And they're schema not data */
! 	if (dataOnly)
  		return false;
  	return true;
  }
--- 6354,6360 ----
  	if (!include_everything)
  		return false;
  	/* And they're schema not data */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return false;
  	return true;
  }
***************
*** 6330,6336 ****
  	FuncInfo   *funcInfo;
  	FuncInfo   *validatorInfo = NULL;
  
! 	if (dataOnly)
  		return;
  
  	/*
--- 6375,6381 ----
  	FuncInfo   *funcInfo;
  	FuncInfo   *validatorInfo = NULL;
  
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 6590,6596 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!finfo->dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 6635,6641 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!finfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 6985,6991 ****
  	TypeInfo   *sourceInfo;
  	TypeInfo   *targetInfo;
  
! 	if (dataOnly)
  		return;
  
  	if (OidIsValid(cast->castfunc))
--- 7030,7036 ----
  	TypeInfo   *sourceInfo;
  	TypeInfo   *targetInfo;
  
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	if (OidIsValid(cast->castfunc))
***************
*** 7135,7141 ****
  	char	   *oprcanhash;
  
  	/* Skip if not to be dumped */
! 	if (!oprinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7180,7186 ----
  	char	   *oprcanhash;
  
  	/* Skip if not to be dumped */
! 	if (!oprinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 7519,7525 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opcinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7564,7570 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opcinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 7827,7833 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opfinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7872,7878 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opfinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 8096,8102 ****
  	bool		condefault;
  
  	/* Skip if not to be dumped */
! 	if (!convinfo->dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 8141,8147 ----
  	bool		condefault;
  
  	/* Skip if not to be dumped */
! 	if (!convinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 8250,8256 ****
  	bool		convertok;
  
  	/* Skip if not to be dumped */
! 	if (!agginfo->aggfn.dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 8295,8301 ----
  	bool		convertok;
  
  	/* Skip if not to be dumped */
! 	if (!agginfo->aggfn.dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 8453,8459 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!prsinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8498,8504 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!prsinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8522,8528 ****
  	char	   *tmplname;
  
  	/* Skip if not to be dumped */
! 	if (!dictinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8567,8573 ----
  	char	   *tmplname;
  
  	/* Skip if not to be dumped */
! 	if (!dictinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8607,8613 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!tmplinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8652,8658 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!tmplinfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8673,8679 ****
  	int			i_dictname;
  
  	/* Skip if not to be dumped */
! 	if (!cfginfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8718,8724 ----
  	int			i_dictname;
  
  	/* Skip if not to be dumped */
! 	if (!cfginfo->dobj.dump || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8809,8815 ****
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (dataOnly || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
--- 8854,8860 ----
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (!WANT_PRE_SCHEMA(dumpObjFlags) || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
***************
*** 8846,8852 ****
  	{
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (!dataOnly)
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
--- 8891,8897 ----
  	{
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (WANT_PRE_SCHEMA(dumpObjFlags))
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
***************
*** 9153,9159 ****
  	PQExpBuffer delq;
  
  	/* Only print it if "separate" mode is selected */
! 	if (!tbinfo->dobj.dump || !adinfo->separate || dataOnly)
  		return;
  
  	/* Don't print inherited defaults, either */
--- 9198,9204 ----
  	PQExpBuffer delq;
  
  	/* Only print it if "separate" mode is selected */
! 	if (!tbinfo->dobj.dump || !adinfo->separate || !WANT_PRE_SCHEMA(dumpObjFlags))
  		return;
  
  	/* Don't print inherited defaults, either */
***************
*** 9238,9244 ****
  	PQExpBuffer q;
  	PQExpBuffer delq;
  
! 	if (dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 9283,9289 ----
  	PQExpBuffer q;
  	PQExpBuffer delq;
  
! 	if (!WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 9307,9313 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!coninfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 9352,9358 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!coninfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 9700,9706 ****
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (!dataOnly)
  	{
  		resetPQExpBuffer(delqry);
  
--- 9745,9751 ----
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (WANT_PRE_SCHEMA(dumpObjFlags))
  	{
  		resetPQExpBuffer(delqry);
  
***************
*** 9803,9809 ****
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (!schemaOnly)
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9848,9854 ----
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (WANT_PRE_SCHEMA(dumpObjFlags))
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
***************
*** 9836,9842 ****
  	const char *p;
  	int			findx;
  
! 	if (dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 9881,9887 ----
  	const char *p;
  	int			findx;
  
! 	if (!WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 10044,10050 ****
  	PGresult   *res;
  
  	/* Skip if not to be dumped */
! 	if (!rinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 10089,10095 ----
  	PGresult   *res;
  
  	/* Skip if not to be dumped */
! 	if (!rinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
  		return;
  
  	/*
Index: src/bin/pg_dump/pg_restore.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
retrieving revision 1.88
diff -c -r1.88 pg_restore.c
*** src/bin/pg_dump/pg_restore.c	13 Apr 2008 03:49:22 -0000	1.88
--- src/bin/pg_dump/pg_restore.c	23 Jul 2008 17:06:59 -0000
***************
*** 78,83 ****
--- 78,90 ----
  	static int	no_data_for_failed_tables = 0;
  	static int  outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+  	bool		dataOnly = false;
+  	bool		schemaOnly = false;
+  
+  	static int	use_schemaBeforeData;
+  	static int	use_schemaAfterData;
+  
+  	int			dumpObjFlags;
  
  	struct option cmdopts[] = {
  		{"clean", 0, NULL, 'c'},
***************
*** 114,119 ****
--- 121,128 ----
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"no-data-for-failed-tables", no_argument, &no_data_for_failed_tables, 1},
  		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+  		{"schema-before-data", no_argument, &use_schemaBeforeData, 1},
+  		{"schema-after-data", no_argument, &use_schemaAfterData, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
  		{NULL, 0, NULL, 0}
***************
*** 145,151 ****
  		switch (c)
  		{
  			case 'a':			/* Dump data only */
! 				opts->dataOnly = 1;
  				break;
  			case 'c':			/* clean (i.e., drop) schema prior to create */
  				opts->dropSchema = 1;
--- 154,160 ----
  		switch (c)
  		{
  			case 'a':			/* Dump data only */
! 				dataOnly = true;
  				break;
  			case 'c':			/* clean (i.e., drop) schema prior to create */
  				opts->dropSchema = 1;
***************
*** 213,219 ****
  				opts->triggerNames = strdup(optarg);
  				break;
  			case 's':			/* dump schema only */
! 				opts->schemaOnly = 1;
  				break;
  			case 'S':			/* Superuser username */
  				if (strlen(optarg) != 0)
--- 222,228 ----
  				opts->triggerNames = strdup(optarg);
  				break;
  			case 's':			/* dump schema only */
! 				schemaOnly = true;
  				break;
  			case 'S':			/* Superuser username */
  				if (strlen(optarg) != 0)
***************
*** 295,300 ****
--- 304,350 ----
  		opts->useDB = 1;
  	}
  
+ 	/*
+ 	 * Look for conflicting options relating to object groupings
+ 	 */
+ 	if (schemaOnly && dataOnly)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"-s/--schema-only", "-a/--data-only");
+ 		exit(1);
+ 	}
+ 	else if ((schemaOnly || dataOnly) && 
+ 				(use_schemaBeforeData == 1 || use_schemaAfterData == 1))
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				schemaOnly ? "-s/--schema-only" : "-a/--data-only",
+ 				use_schemaBeforeData == 1 ? "--schema-before-data" : "--schema-after-data");
+ 		exit(1);
+ 	}
+ 	else if (use_schemaBeforeData == 1 && use_schemaAfterData == 1)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"--schema-before-data", "--schema-after-data");
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Decide which of the object groups we will dump
+ 	 */
+ 	dumpObjFlags = REQ_ALL;
+ 
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (use_schemaBeforeData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (use_schemaAfterData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
+ 
  	opts->disable_triggers = disable_triggers;
  	opts->noDataForFailedTables = no_data_for_failed_tables;
  	opts->noTablespace = outputNoTablespaces;
***************
*** 405,410 ****
--- 455,462 ----
  			 "                           do not restore data of tables that could not be\n"
  			 "                           created\n"));
  	printf(_("  --no-tablespaces         do not dump tablespace assignments\n"));
+ 	printf(_("  --schema-before-data	 dump only the part of schema before table data\n"));
+ 	printf(_("  --schema-after-data		 dump only the part of schema after table data\n"));
  	printf(_("  --use-set-session-authorization\n"
  			 "                           use SESSION AUTHORIZATION commands instead of\n"
  			 "                           OWNER TO commands\n"));

#25

sfrost@snowman.net

over 17 years ago

In reply to: Simon Riggs (#24)

Re: pg_dump additional options for performance

Simon,

* Simon Riggs (simon@2ndquadrant.com) wrote:

...and with command line help also.

The documentation and whatnot looks good to me now. There are a couple
of other issues I found while looking through and testing the patch
though-

Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	20 Jul 2008 18:43:30 -0000	1.497
--- src/bin/pg_dump/pg_dump.c	23 Jul 2008 17:04:24 -0000
***************
*** 225,232 ****
  	RestoreOptions *ropt;

static int disable_triggers = 0;
! static int outputNoTablespaces = 0;
static int use_setsessauth = 0;

  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},
--- 229,238 ----
  	RestoreOptions *ropt;

static int disable_triggers = 0;
! static int outputNoTablespaces = 0;
static int use_setsessauth = 0;
+ static int use_schemaBeforeData;
+ static int use_schemaAfterData;

static struct option long_options[] = {
{"data-only", no_argument, NULL, 'a'},
***************

This hunk appears to have a bit of gratuitous whitespace change, not a
big deal tho.

***************
*** 464,474 ****
[...]
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (use_schemaBeforeData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (use_schemaAfterData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
***************

It wouldn't kill to be consistant between testing for '== 1' and just
checking for non-zero. Again, not really a big deal, and I wouldn't
mention these if there weren't other issues.

***************
*** 646,652 ****
* Dumping blobs is now default unless we saw an inclusion switch or -s
* ... but even if we did see one of these, -b turns it back on.
*/
! if (include_everything && !schemaOnly)
outputBlobs = true;

  	/*
--- 689,695 ----
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
  		outputBlobs = true;

/*
***************

Shouldn't this change be to "WANT_DATA(dumpObjFlags)"? That's what most
of the '!schemaOnly' get translated to. Otherwise I think you would be
getting blobs when you've asked for just schema-before-data, which
doesn't seem like it'd make much sense.

***************
*** 712,718 ****
dumpStdStrings(g_fout);

/* The database item is always next, unless we don't want it at all */
! if (include_everything && !dataOnly)
dumpDatabase(g_fout);

  	/* Now the rearrangeable objects. */
--- 755,761 ----
  	dumpStdStrings(g_fout);

/* The database item is always next, unless we don't want it at all */
! if (include_everything && WANT_DATA(dumpObjFlags))
dumpDatabase(g_fout);

/* Now the rearrangeable objects. */
***************

Shouldn't this be 'WANT_PRE_SCHEMA(dumpObjFlags)'?

***************
*** 3414,3420 ****
continue;

/* Ignore indexes of tables not to be dumped */
! if (!tbinfo->dobj.dump)
continue;

  		if (g_verbose)
--- 3459,3465 ----
  			continue;

/* Ignore indexes of tables not to be dumped */
! if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
continue;

if (g_verbose)
***************

I didn't test this, but it strikes me as an unnecessary addition? If
anything, wouldn't this check make more sense being done right after
dropping into getIndexes()? No sense going through the loop just for
fun.. Technically, it's a behavioral change for --data-only since it
used to gather index information anyway, but it's a good optimization if
done in the right place.

Also around here, there doesn't appear to be any checking in
dumpEnumType(), which strikes me as odd. Wouldn't that deserve a

if (!WANT_PRE_SCHEMA(dumpObjFlags))
return;

check? If not even some kind of equivilant ->dobj.dump check..

***************
*** 9803,9809 ****
tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
}

! 	if (!schemaOnly)
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9848,9854 ----
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}

! if (WANT_PRE_SCHEMA(dumpObjFlags))
{
resetPQExpBuffer(query);
appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
***************

This is a mistaken logic invert, which results in setval's not being
dumped at all when pulling out each piece seperately. It should be:

if (WANT_DATA(dumpObjFlags))

so that setval's are correctly included on the --data-only piece. As
--data-only previously existed, this would be a regression too.

Index: src/bin/pg_dump/pg_restore.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
retrieving revision 1.88
diff -c -r1.88 pg_restore.c
*** src/bin/pg_dump/pg_restore.c	13 Apr 2008 03:49:22 -0000	1.88
--- src/bin/pg_dump/pg_restore.c	23 Jul 2008 17:06:59 -0000
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (use_schemaBeforeData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (use_schemaAfterData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
+ 
***************

Ditto previous comment on this, but in pg_restore.c.

***************
*** 405,410 ****
--- 455,462 ----
  			 "                           do not restore data of tables that could not be\n"
  			 "                           created\n"));
  	printf(_("  --no-tablespaces         do not dump tablespace assignments\n"));
+ 	printf(_("  --schema-before-data	 dump only the part of schema before table data\n"));
+ 	printf(_("  --schema-after-data		 dump only the part of schema after table data\n"));
  	printf(_("  --use-set-session-authorization\n"
  			 "                           use SESSION AUTHORIZATION commands instead of\n"
  			 "                           OWNER TO commands\n"));
***************

Forgot to mention this on pg_dump.c, but in both pg_dump and pg_restore,
and I hate to be the bearer of bad news, but your command-line
documentation doesn't line up properly in the output. You shouldn't be
using tabs there but instead should use spaces as the other help text
does, so everything lines up nicely.

Thanks,

Stephen

#26

simon@2ndquadrant.com

over 17 years ago

In reply to: Stephen Frost (#25)

1 attachment(s)

Re: pg_dump additional options for performance

On Wed, 2008-07-23 at 23:20 -0400, Stephen Frost wrote:

* Simon Riggs (simon@2ndquadrant.com) wrote:

...and with command line help also.

The documentation and whatnot looks good to me now. There are a couple
of other issues I found while looking through and testing the patch
though-

Thanks for a good review.

Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	20 Jul 2008 18:43:30 -0000	1.497
--- src/bin/pg_dump/pg_dump.c	23 Jul 2008 17:04:24 -0000
***************
*** 225,232 ****
RestoreOptions *ropt;
static int disable_triggers = 0;
! static int outputNoTablespaces = 0;
static int use_setsessauth = 0;
static struct option long_options[] = {
{"data-only", no_argument, NULL, 'a'},
--- 229,238 ----
RestoreOptions *ropt;
static int disable_triggers = 0;
! static int outputNoTablespaces = 0;
static int use_setsessauth = 0;
+ static int use_schemaBeforeData;
+ static int use_schemaAfterData;

static struct option long_options[] = {
{"data-only", no_argument, NULL, 'a'},
***************

This hunk appears to have a bit of gratuitous whitespace change, not a
big deal tho.

***************
*** 464,474 ****
[...]
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (use_schemaBeforeData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (use_schemaAfterData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
***************
It wouldn't kill to be consistant between testing for '== 1' and just
checking for non-zero. Again, not really a big deal, and I wouldn't
mention these if there weren't other issues.

***************
*** 646,652 ****
* Dumping blobs is now default unless we saw an inclusion switch or -s
* ... but even if we did see one of these, -b turns it back on.
*/
! if (include_everything && !schemaOnly)
outputBlobs = true;
/*
--- 689,695 ----
* Dumping blobs is now default unless we saw an inclusion switch or -s
* ... but even if we did see one of these, -b turns it back on.
*/
! 	if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
outputBlobs = true;
/*
***************

Shouldn't this change be to "WANT_DATA(dumpObjFlags)"? That's what most
of the '!schemaOnly' get translated to. Otherwise I think you would be
getting blobs when you've asked for just schema-before-data, which
doesn't seem like it'd make much sense.

Yes, fixed

***************
*** 712,718 ****
dumpStdStrings(g_fout);

/* The database item is always next, unless we don't want it at all */
! if (include_everything && !dataOnly)
dumpDatabase(g_fout);
/* Now the rearrangeable objects. */
--- 755,761 ----
dumpStdStrings(g_fout);
/* The database item is always next, unless we don't want it at all */
! if (include_everything && WANT_DATA(dumpObjFlags))
dumpDatabase(g_fout);

/* Now the rearrangeable objects. */
***************

Shouldn't this be 'WANT_PRE_SCHEMA(dumpObjFlags)'?

Yes, fixed

***************
*** 3414,3420 ****
continue;

/* Ignore indexes of tables not to be dumped */
! if (!tbinfo->dobj.dump)
continue;
if (g_verbose)
--- 3459,3465 ----
continue;
/* Ignore indexes of tables not to be dumped */
! if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
continue;

if (g_verbose)
***************

I didn't test this, but it strikes me as an unnecessary addition? If
anything, wouldn't this check make more sense being done right after
dropping into getIndexes()? No sense going through the loop just for
fun.. Technically, it's a behavioral change for --data-only since it
used to gather index information anyway, but it's a good optimization if
done in the right place.

Agreed. I've just removed this. Patch not about optimising logic.

Also around here, there doesn't appear to be any checking in
dumpEnumType(), which strikes me as odd. Wouldn't that deserve a

if (!WANT_PRE_SCHEMA(dumpObjFlags))
return;

check? If not even some kind of equivilant ->dobj.dump check..

Agreed. That appears to be a bug in pg_dump, since this would currently
dump enums if --data-only was specified, which in my understanding would
be wrong.

Have included this:

/* Skip if not to be dumped */
if (!tinfo->dobj.dump || !WANT_BEFORE_SCHEMA(dumpObjFlags))
return;

***************
*** 9803,9809 ****
tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
}
! 	if (!schemaOnly)
{
resetPQExpBuffer(query);
appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9848,9854 ----
tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
}
! if (WANT_PRE_SCHEMA(dumpObjFlags))
{
resetPQExpBuffer(query);
appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
***************

This is a mistaken logic invert, which results in setval's not being
dumped at all when pulling out each piece seperately. It should be:

if (WANT_DATA(dumpObjFlags))

so that setval's are correctly included on the --data-only piece. As
--data-only previously existed, this would be a regression too.

OK, fixed.

Index: src/bin/pg_dump/pg_restore.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
retrieving revision 1.88
diff -c -r1.88 pg_restore.c
*** src/bin/pg_dump/pg_restore.c	13 Apr 2008 03:49:22 -0000	1.88
--- src/bin/pg_dump/pg_restore.c	23 Jul 2008 17:06:59 -0000
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (use_schemaBeforeData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (use_schemaAfterData == 1)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
+ 
***************

Ditto previous comment on this, but in pg_restore.c.

***************
*** 405,410 ****
--- 455,462 ----
"                           do not restore data of tables that could not be\n"
"                           created\n"));
printf(_("  --no-tablespaces         do not dump tablespace assignments\n"));
+ 	printf(_("  --schema-before-data	 dump only the part of schema before table data\n"));
+ 	printf(_("  --schema-after-data		 dump only the part of schema after table data\n"));
printf(_("  --use-set-session-authorization\n"
"                           use SESSION AUTHORIZATION commands instead of\n"
"                           OWNER TO commands\n"));
***************
Forgot to mention this on pg_dump.c, but in both pg_dump and pg_restore,
and I hate to be the bearer of bad news, but your command-line
documentation doesn't line up properly in the output. You shouldn't be
using tabs there but instead should use spaces as the other help text
does, so everything lines up nicely.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

pg_dump_beforeafter.v6.patchtext/x-patch; charset=utf-8; name=pg_dump_beforeafter.v6.patchDownload

Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.103
diff -c -r1.103 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml	20 Jul 2008 18:43:30 -0000	1.103
--- doc/src/sgml/ref/pg_dump.sgml	24 Jul 2008 07:30:19 -0000
***************
*** 133,139 ****
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
--- 133,140 ----
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> or <option>--schema-before-data</> or
!         <option>--schema-after-data</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
***************
*** 426,431 ****
--- 427,452 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--schema-before-data</option></term>
+       <listitem>
+        <para>
+ 		Dump object definitions (schema) that occur before table data,
+ 		using the order produced by a full dump.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
+       <term><option>--schema-after-data</option></term>
+       <listitem>
+        <para>
+ 		Dump object definitions (schema) that occur after table data,
+ 		using the order produced by a full dump.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 790,795 ****
--- 811,844 ----
    </para>
  
    <para>
+    The output of <application>pg_dump</application> can be divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+ 	  Before Data - objects output before data, which includes
+ 	  <command>CREATE TABLE</command> statements and others.
+ 	  This part can be requested using <option>--schema-before-data</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  After Data - objects output after data, which includes
+ 	  <command>CREATE INDEX</command> statements and others.
+ 	  This part can be requested using <option>--schema-after-data</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+ 
+   <para>
     Because <application>pg_dump</application> is used to transfer data
     to newer versions of <productname>PostgreSQL</>, the output of
     <application>pg_dump</application> can be loaded into
Index: doc/src/sgml/ref/pg_restore.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_restore.sgml,v
retrieving revision 1.75
diff -c -r1.75 pg_restore.sgml
*** doc/src/sgml/ref/pg_restore.sgml	13 Apr 2008 03:49:21 -0000	1.75
--- doc/src/sgml/ref/pg_restore.sgml	24 Jul 2008 07:30:19 -0000
***************
*** 321,326 ****
--- 321,346 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--schema-before-data</option></term>
+       <listitem>
+        <para>
+ 		Restore object definitions (schema) that occur before table data,
+ 		using the order produced by a full restore.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
+       <term><option>--schema-after-data</option></term>
+       <listitem>
+        <para>
+ 		Restore object definitions (schema) that occur after table data,
+ 		using the order produced by a full restore.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 572,577 ****
--- 592,626 ----
    </para>
  
    <para>
+    The actions of <application>pg_restore</application> can be 
+    divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+ 	  Before Data - objects output before data, which includes
+ 	  <command>CREATE TABLE</command> statements and others.
+ 	  This part can be requested using <option>--schema-before-data</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  After Data - objects output after data, which includes
+ 	  <command>CREATE INDEX</command> statements and others.
+ 	  This part can be requested using <option>--schema-after-data</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+ 
+   <para>
     The limitations of <application>pg_restore</application> are detailed below.
  
     <itemizedlist>
Index: src/bin/pg_dump/pg_backup.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup.h,v
retrieving revision 1.47
diff -c -r1.47 pg_backup.h
*** src/bin/pg_dump/pg_backup.h	13 Apr 2008 03:49:21 -0000	1.47
--- src/bin/pg_dump/pg_backup.h	24 Jul 2008 07:30:19 -0000
***************
*** 89,95 ****
  	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  								 * instead of OWNER TO */
  	char	   *superuser;		/* Username to use as superuser */
! 	int			dataOnly;
  	int			dropSchema;
  	char	   *filename;
  	int			schemaOnly;
--- 89,95 ----
  	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  								 * instead of OWNER TO */
  	char	   *superuser;		/* Username to use as superuser */
! 	int			dumpObjFlags;	/* which objects types to dump */
  	int			dropSchema;
  	char	   *filename;
  	int			schemaOnly;
Index: src/bin/pg_dump/pg_backup_archiver.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.157
diff -c -r1.157 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c	4 May 2008 08:32:21 -0000	1.157
--- src/bin/pg_dump/pg_backup_archiver.c	24 Jul 2008 07:30:19 -0000
***************
*** 56,62 ****
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static teReqs _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
--- 56,62 ----
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static int _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
***************
*** 129,135 ****
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
! 	teReqs		reqs;
  	OutputContext sav;
  	bool		defnDumped;
  
--- 129,135 ----
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
! 	int		reqs;
  	OutputContext sav;
  	bool		defnDumped;
  
***************
*** 175,193 ****
  	 * Work out if we have an implied data-only restore. This can happen if
  	 * the dump was data only or if the user has used a toc list to exclude
  	 * all of the schema data. All we do is look for schema entries - if none
! 	 * are found then we set the dataOnly flag.
  	 *
! 	 * We could scan for wanted TABLE entries, but that is not the same as
! 	 * dataOnly. At this stage, it seems unnecessary (6-Mar-2001).
  	 */
! 	if (!ropt->dataOnly)
  	{
  		int			impliedDataOnly = 1;
  
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
  		{
  			reqs = _tocEntryRequired(te, ropt, true);
! 			if ((reqs & REQ_SCHEMA) != 0)
  			{					/* It's schema, and it's wanted */
  				impliedDataOnly = 0;
  				break;
--- 175,193 ----
  	 * Work out if we have an implied data-only restore. This can happen if
  	 * the dump was data only or if the user has used a toc list to exclude
  	 * all of the schema data. All we do is look for schema entries - if none
! 	 * are found then say we only want DATA type objects.
  	 *
! 	 * We could scan for wanted TABLE entries, but that is not the same.
! 	 * At this stage, it seems unnecessary (6-Mar-2001).
  	 */
! 	if (!WANT_DATA(ropt->dumpObjFlags))
  	{
  		int			impliedDataOnly = 1;
  
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
  		{
  			reqs = _tocEntryRequired(te, ropt, true);
! 			if (WANT_SCHEMA_BEFORE_DATA(reqs) || WANT_SCHEMA_AFTER_DATA(reqs))
  			{					/* It's schema, and it's wanted */
  				impliedDataOnly = 0;
  				break;
***************
*** 195,201 ****
  		}
  		if (impliedDataOnly)
  		{
! 			ropt->dataOnly = impliedDataOnly;
  			ahlog(AH, 1, "implied data-only restore\n");
  		}
  	}
--- 195,201 ----
  		}
  		if (impliedDataOnly)
  		{
! 			ropt->dumpObjFlags = REQ_DATA;
  			ahlog(AH, 1, "implied data-only restore\n");
  		}
  	}
***************
*** 236,242 ****
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
--- 236,242 ----
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA_BEFORE_DATA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
***************
*** 278,284 ****
  		/* Dump any relevant dump warnings to stderr */
  		if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  		{
! 			if (!ropt->dataOnly && te->defn != NULL && strlen(te->defn) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  			else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
--- 278,284 ----
  		/* Dump any relevant dump warnings to stderr */
  		if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  		{
! 			if (!WANT_DATA(ropt->dumpObjFlags) && te->defn != NULL && strlen(te->defn) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  			else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
***************
*** 286,292 ****
  
  		defnDumped = false;
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
--- 286,293 ----
  
  		defnDumped = false;
  
! 		if ((WANT_SCHEMA_BEFORE_DATA(reqs) && WANT_SCHEMA_BEFORE_DATA(ropt->dumpObjFlags)) ||
! 			(WANT_SCHEMA_AFTER_DATA(reqs) && WANT_SCHEMA_AFTER_DATA(ropt->dumpObjFlags)))	/* We want the schema */
  		{
  			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
***************
*** 331,337 ****
  		/*
  		 * If we have a data component, then process it
  		 */
! 		if ((reqs & REQ_DATA) != 0)
  		{
  			/*
  			 * hadDumper will be set if there is genuine data component for
--- 332,338 ----
  		/*
  		 * If we have a data component, then process it
  		 */
! 		if (WANT_DATA(reqs))
  		{
  			/*
  			 * hadDumper will be set if there is genuine data component for
***************
*** 343,349 ****
  				/*
  				 * If we can output the data, then restore it.
  				 */
! 				if (AH->PrintTocDataPtr !=NULL && (reqs & REQ_DATA) != 0)
  				{
  #ifndef HAVE_LIBZ
  					if (AH->compression != 0)
--- 344,350 ----
  				/*
  				 * If we can output the data, then restore it.
  				 */
! 				if (AH->PrintTocDataPtr !=NULL && WANT_DATA(reqs))
  				{
  #ifndef HAVE_LIBZ
  					if (AH->compression != 0)
***************
*** 415,421 ****
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
--- 416,422 ----
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if (WANT_SCHEMA_BEFORE_DATA(reqs))	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
***************
*** 473,479 ****
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!ropt->dataOnly || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
--- 474,480 ----
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
***************
*** 499,505 ****
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!ropt->dataOnly || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
--- 500,506 ----
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
***************
*** 1321,1327 ****
  	return NULL;
  }
  
! teReqs
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  	TocEntry   *te = getTocEntryByDumpId(AH, id);
--- 1322,1328 ----
  	return NULL;
  }
  
! int
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  	TocEntry   *te = getTocEntryByDumpId(AH, id);
***************
*** 2026,2035 ****
  					 te->defn);
  }
  
! static teReqs
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! 	teReqs		res = REQ_ALL;
  
  	/* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  	if (strcmp(te->desc, "ENCODING") == 0 ||
--- 2027,2036 ----
  					 te->defn);
  }
  
! static int
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! 	int		res = ropt->dumpObjFlags;
  
  	/* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  	if (strcmp(te->desc, "ENCODING") == 0 ||
***************
*** 2109,2125 ****
  	if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  		return 0;
  
- 	/* Mask it if we only want schema */
- 	if (ropt->schemaOnly)
- 		res = res & REQ_SCHEMA;
- 
- 	/* Mask it we only want data */
- 	if (ropt->dataOnly)
- 		res = res & REQ_DATA;
- 
  	/* Mask it if we don't have a schema contribution */
  	if (!te->defn || strlen(te->defn) == 0)
! 		res = res & ~REQ_SCHEMA;
  
  	/* Finally, if there's a per-ID filter, limit based on that as well */
  	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
--- 2110,2118 ----
  	if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  		return 0;
  
  	/* Mask it if we don't have a schema contribution */
  	if (!te->defn || strlen(te->defn) == 0)
! 		res = res & ~(REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
  
  	/* Finally, if there's a per-ID filter, limit based on that as well */
  	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
Index: src/bin/pg_dump/pg_backup_archiver.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.h,v
retrieving revision 1.76
diff -c -r1.76 pg_backup_archiver.h
*** src/bin/pg_dump/pg_backup_archiver.h	7 Nov 2007 12:24:24 -0000	1.76
--- src/bin/pg_dump/pg_backup_archiver.h	24 Jul 2008 07:30:19 -0000
***************
*** 158,169 ****
  	STAGE_FINALIZING
  } ArchiverStage;
  
! typedef enum
! {
! 	REQ_SCHEMA = 1,
! 	REQ_DATA = 2,
! 	REQ_ALL = REQ_SCHEMA + REQ_DATA
! } teReqs;
  
  typedef struct _archiveHandle
  {
--- 158,173 ----
  	STAGE_FINALIZING
  } ArchiverStage;
  
! #define REQ_SCHEMA_BEFORE_DATA	(1 << 0)
! #define REQ_DATA				(1 << 1)
! #define REQ_SCHEMA_AFTER_DATA	(1 << 2)
! #define REQ_ALL					(REQ_SCHEMA_BEFORE_DATA + REQ_DATA + REQ_SCHEMA_AFTER_DATA)
! 
! #define WANT_SCHEMA_BEFORE_DATA(req)	((req & REQ_SCHEMA_BEFORE_DATA) == REQ_SCHEMA_BEFORE_DATA)
! #define WANT_DATA(req)					((req & REQ_DATA) == REQ_DATA)
! #define WANT_SCHEMA_AFTER_DATA(req)		((req & REQ_SCHEMA_AFTER_DATA) == REQ_SCHEMA_AFTER_DATA)
! #define WANT_ALL(req)					((req & REQ_ALL) == REQ_ALL)
! 
  
  typedef struct _archiveHandle
  {
***************
*** 317,323 ****
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
  
! extern teReqs TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
  
  #define appendStringLiteralAHX(buf,str,AH) \
--- 321,327 ----
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
  
! extern int TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
  
  #define appendStringLiteralAHX(buf,str,AH) \
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	20 Jul 2008 18:43:30 -0000	1.497
--- src/bin/pg_dump/pg_dump.c	24 Jul 2008 07:35:28 -0000
***************
*** 73,78 ****
--- 73,82 ----
  bool		aclsSkip;
  const char *lockWaitTimeout;
  
+ /* groups of objects: default is we dump all groups */
+ 
+ int			dumpObjFlags;
+ 
  /* subquery used to convert user ID (eg, datdba) to user name */
  static const char *username_subquery;
  
***************
*** 227,232 ****
--- 231,238 ----
  	static int	disable_triggers = 0;
  	static int  outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+ 	static int	schemaBeforeData;
+ 	static int	schemaAfterData;
  
  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},
***************
*** 267,272 ****
--- 273,280 ----
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"lock-wait-timeout", required_argument, NULL, 2},
  		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+  		{"schema-before-data", no_argument, &schemaBeforeData, 1},
+  		{"schema-after-data", no_argument, &schemaAfterData, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
  		{NULL, 0, NULL, 0}
***************
*** 420,425 ****
--- 428,437 ----
  					disable_triggers = 1;
  				else if (strcmp(optarg, "no-tablespaces") == 0)
  					outputNoTablespaces = 1;
+ 				else if (strcmp(optarg, "schema-before-data") == 0)
+ 					schemaBeforeData = 1;
+ 				else if (strcmp(optarg, "schema-after-data") == 0)
+ 					schemaAfterData = 1;
  				else if (strcmp(optarg, "use-set-session-authorization") == 0)
  					use_setsessauth = 1;
  				else
***************
*** 464,474 ****
  	if (optind < argc)
  		dbname = argv[optind];
  
! 	if (dataOnly && schemaOnly)
  	{
! 		write_msg(NULL, "options -s/--schema-only and -a/--data-only cannot be used together\n");
  		exit(1);
  	}
  
  	if (dataOnly && outputClean)
  	{
--- 476,521 ----
  	if (optind < argc)
  		dbname = argv[optind];
  
! 	/*
! 	 * Look for conflicting options relating to object groupings
! 	 */
! 	if (schemaOnly && dataOnly)
! 	{
! 		write_msg(NULL, "options %s and %s cannot be used together\n",
! 				"-s/--schema-only", "-a/--data-only");
! 		exit(1);
! 	}
! 	else if ((schemaOnly || dataOnly) && 
! 				(schemaBeforeData || schemaAfterData))
  	{
! 		write_msg(NULL, "options %s and %s cannot be used together\n",
! 				schemaOnly ? "-s/--schema-only" : "-a/--data-only",
! 				schemaBeforeData ? "--schema-before-data" : "--schema-after-data");
  		exit(1);
  	}
+ 	else if (schemaBeforeData && schemaAfterData)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"--schema-before-data", "--schema-after-data");
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Decide which of the object groups we will dump
+ 	 */
+ 	dumpObjFlags = REQ_ALL;
+ 
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (schemaBeforeData)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (schemaAfterData)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
  
  	if (dataOnly && outputClean)
  	{
***************
*** 646,652 ****
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && !schemaOnly)
  		outputBlobs = true;
  
  	/*
--- 693,699 ----
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && WANT_DATA(dumpObjFlags))
  		outputBlobs = true;
  
  	/*
***************
*** 658,664 ****
  	if (g_fout->remoteVersion < 80400)
  		guessConstraintInheritance(tblinfo, numTables);
  
! 	if (!schemaOnly)
  		getTableData(tblinfo, numTables, oids);
  
  	if (outputBlobs && hasBlobs(g_fout))
--- 705,711 ----
  	if (g_fout->remoteVersion < 80400)
  		guessConstraintInheritance(tblinfo, numTables);
  
! 	if (WANT_DATA(dumpObjFlags))
  		getTableData(tblinfo, numTables, oids);
  
  	if (outputBlobs && hasBlobs(g_fout))
***************
*** 712,718 ****
  	dumpStdStrings(g_fout);
  
  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && !dataOnly)
  		dumpDatabase(g_fout);
  
  	/* Now the rearrangeable objects. */
--- 759,765 ----
  	dumpStdStrings(g_fout);
  
  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		dumpDatabase(g_fout);
  
  	/* Now the rearrangeable objects. */
***************
*** 734,740 ****
  		ropt->noTablespace = outputNoTablespaces;
  		ropt->disable_triggers = disable_triggers;
  		ropt->use_setsessauth = use_setsessauth;
! 		ropt->dataOnly = dataOnly;
  
  		if (compressLevel == -1)
  			ropt->compression = 0;
--- 781,787 ----
  		ropt->noTablespace = outputNoTablespaces;
  		ropt->disable_triggers = disable_triggers;
  		ropt->use_setsessauth = use_setsessauth;
! 		ropt->dumpObjFlags = dumpObjFlags;
  
  		if (compressLevel == -1)
  			ropt->compression = 0;
***************
*** 792,797 ****
--- 839,846 ----
  	printf(_("  --disable-dollar-quoting    disable dollar quoting, use SQL standard quoting\n"));
  	printf(_("  --disable-triggers          disable triggers during data-only restore\n"));
  	printf(_("  --no-tablespaces            do not dump tablespace assignments\n"));
+ 	printf(_("  --schema-before-data        dump only the part of schema before table data\n"));
+ 	printf(_("  --schema-after-data         dump only the part of schema after table data\n"));
  	printf(_("  --use-set-session-authorization\n"
  			 "                              use SESSION AUTHORIZATION commands instead of\n"
  	"                              ALTER OWNER commands to set ownership\n"));
***************
*** 5165,5171 ****
  	int			ncomments;
  
  	/* Comments are SCHEMA not data */
! 	if (dataOnly)
  		return;
  
  	/* Search for comments associated with catalogId, using table */
--- 5214,5220 ----
  	int			ncomments;
  
  	/* Comments are SCHEMA not data */
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/* Search for comments associated with catalogId, using table */
***************
*** 5216,5222 ****
  	PQExpBuffer target;
  
  	/* Comments are SCHEMA not data */
! 	if (dataOnly)
  		return;
  
  	/* Search for comments associated with relation, using table */
--- 5265,5271 ----
  	PQExpBuffer target;
  
  	/* Comments are SCHEMA not data */
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/* Search for comments associated with relation, using table */
***************
*** 5568,5574 ****
  	char	   *qnspname;
  
  	/* Skip if not to be dumped */
! 	if (!nspinfo->dobj.dump || dataOnly)
  		return;
  
  	/* don't dump dummy namespace from pre-7.3 source */
--- 5617,5623 ----
  	char	   *qnspname;
  
  	/* Skip if not to be dumped */
! 	if (!nspinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/* don't dump dummy namespace from pre-7.3 source */
***************
*** 5617,5623 ****
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  	/* Skip if not to be dumped */
! 	if (!tinfo->dobj.dump || dataOnly)
  		return;
  
  	/* Dump out in proper style */
--- 5666,5672 ----
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  	/* Skip if not to be dumped */
! 	if (!tinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/* Dump out in proper style */
***************
*** 5646,5651 ****
--- 5695,5704 ----
  				i;
  	char	   *label;
  
+ 	/* Skip if not to be dumped */
+ 	if (!tinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
+ 		return;
+ 
  	/* Set proper schema search path so regproc references list correctly */
  	selectSourceSchema(tinfo->dobj.namespace->dobj.name);
  
***************
*** 6262,6268 ****
  	PQExpBuffer q;
  
  	/* Skip if not to be dumped */
! 	if (!stinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 6315,6321 ----
  	PQExpBuffer q;
  
  	/* Skip if not to be dumped */
! 	if (!stinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 6309,6315 ****
  	if (!include_everything)
  		return false;
  	/* And they're schema not data */
! 	if (dataOnly)
  		return false;
  	return true;
  }
--- 6362,6368 ----
  	if (!include_everything)
  		return false;
  	/* And they're schema not data */
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return false;
  	return true;
  }
***************
*** 6330,6336 ****
  	FuncInfo   *funcInfo;
  	FuncInfo   *validatorInfo = NULL;
  
! 	if (dataOnly)
  		return;
  
  	/*
--- 6383,6389 ----
  	FuncInfo   *funcInfo;
  	FuncInfo   *validatorInfo = NULL;
  
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 6590,6596 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!finfo->dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 6643,6649 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!finfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 6985,6991 ****
  	TypeInfo   *sourceInfo;
  	TypeInfo   *targetInfo;
  
! 	if (dataOnly)
  		return;
  
  	if (OidIsValid(cast->castfunc))
--- 7038,7044 ----
  	TypeInfo   *sourceInfo;
  	TypeInfo   *targetInfo;
  
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	if (OidIsValid(cast->castfunc))
***************
*** 7135,7141 ****
  	char	   *oprcanhash;
  
  	/* Skip if not to be dumped */
! 	if (!oprinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7188,7194 ----
  	char	   *oprcanhash;
  
  	/* Skip if not to be dumped */
! 	if (!oprinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 7519,7525 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opcinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7572,7578 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opcinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 7827,7833 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opfinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7880,7886 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opfinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 8096,8102 ****
  	bool		condefault;
  
  	/* Skip if not to be dumped */
! 	if (!convinfo->dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 8149,8155 ----
  	bool		condefault;
  
  	/* Skip if not to be dumped */
! 	if (!convinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 8250,8256 ****
  	bool		convertok;
  
  	/* Skip if not to be dumped */
! 	if (!agginfo->aggfn.dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 8303,8309 ----
  	bool		convertok;
  
  	/* Skip if not to be dumped */
! 	if (!agginfo->aggfn.dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 8453,8459 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!prsinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8506,8512 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!prsinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8522,8528 ****
  	char	   *tmplname;
  
  	/* Skip if not to be dumped */
! 	if (!dictinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8575,8581 ----
  	char	   *tmplname;
  
  	/* Skip if not to be dumped */
! 	if (!dictinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8607,8613 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!tmplinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8660,8666 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!tmplinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8673,8679 ****
  	int			i_dictname;
  
  	/* Skip if not to be dumped */
! 	if (!cfginfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8726,8732 ----
  	int			i_dictname;
  
  	/* Skip if not to be dumped */
! 	if (!cfginfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8809,8815 ****
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (dataOnly || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
--- 8862,8868 ----
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags) || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
***************
*** 8846,8852 ****
  	{
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (!dataOnly)
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
--- 8899,8905 ----
  	{
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
***************
*** 9153,9159 ****
  	PQExpBuffer delq;
  
  	/* Only print it if "separate" mode is selected */
! 	if (!tbinfo->dobj.dump || !adinfo->separate || dataOnly)
  		return;
  
  	/* Don't print inherited defaults, either */
--- 9206,9212 ----
  	PQExpBuffer delq;
  
  	/* Only print it if "separate" mode is selected */
! 	if (!tbinfo->dobj.dump || !adinfo->separate || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/* Don't print inherited defaults, either */
***************
*** 9238,9244 ****
  	PQExpBuffer q;
  	PQExpBuffer delq;
  
! 	if (dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 9291,9297 ----
  	PQExpBuffer q;
  	PQExpBuffer delq;
  
! 	if (!WANT_SCHEMA_AFTER_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 9307,9313 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!coninfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 9360,9366 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!coninfo->dobj.dump || !WANT_SCHEMA_AFTER_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 9700,9706 ****
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (!dataOnly)
  	{
  		resetPQExpBuffer(delqry);
  
--- 9753,9759 ----
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  	{
  		resetPQExpBuffer(delqry);
  
***************
*** 9803,9809 ****
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (!schemaOnly)
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9856,9862 ----
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (WANT_DATA(dumpObjFlags))
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
***************
*** 9836,9842 ****
  	const char *p;
  	int			findx;
  
! 	if (dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 9889,9895 ----
  	const char *p;
  	int			findx;
  
! 	if (!WANT_SCHEMA_AFTER_DATA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 10044,10050 ****
  	PGresult   *res;
  
  	/* Skip if not to be dumped */
! 	if (!rinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 10097,10103 ----
  	PGresult   *res;
  
  	/* Skip if not to be dumped */
! 	if (!rinfo->dobj.dump || !WANT_SCHEMA_AFTER_DATA(dumpObjFlags))
  		return;
  
  	/*
Index: src/bin/pg_dump/pg_restore.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
retrieving revision 1.88
diff -c -r1.88 pg_restore.c
*** src/bin/pg_dump/pg_restore.c	13 Apr 2008 03:49:22 -0000	1.88
--- src/bin/pg_dump/pg_restore.c	24 Jul 2008 07:30:19 -0000
***************
*** 78,83 ****
--- 78,90 ----
  	static int	no_data_for_failed_tables = 0;
  	static int  outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+  	bool		dataOnly = false;
+  	bool		schemaOnly = false;
+  
+  	static int	schemaBeforeData;
+  	static int	schemaAfterData;
+  
+  	int			dumpObjFlags;
  
  	struct option cmdopts[] = {
  		{"clean", 0, NULL, 'c'},
***************
*** 114,119 ****
--- 121,128 ----
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"no-data-for-failed-tables", no_argument, &no_data_for_failed_tables, 1},
  		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+  		{"schema-before-data", no_argument, &schemaBeforeData, 1},
+  		{"schema-after-data", no_argument, &schemaAfterData, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
  		{NULL, 0, NULL, 0}
***************
*** 145,151 ****
  		switch (c)
  		{
  			case 'a':			/* Dump data only */
! 				opts->dataOnly = 1;
  				break;
  			case 'c':			/* clean (i.e., drop) schema prior to create */
  				opts->dropSchema = 1;
--- 154,160 ----
  		switch (c)
  		{
  			case 'a':			/* Dump data only */
! 				dataOnly = true;
  				break;
  			case 'c':			/* clean (i.e., drop) schema prior to create */
  				opts->dropSchema = 1;
***************
*** 213,219 ****
  				opts->triggerNames = strdup(optarg);
  				break;
  			case 's':			/* dump schema only */
! 				opts->schemaOnly = 1;
  				break;
  			case 'S':			/* Superuser username */
  				if (strlen(optarg) != 0)
--- 222,228 ----
  				opts->triggerNames = strdup(optarg);
  				break;
  			case 's':			/* dump schema only */
! 				schemaOnly = true;
  				break;
  			case 'S':			/* Superuser username */
  				if (strlen(optarg) != 0)
***************
*** 249,254 ****
--- 258,267 ----
  					no_data_for_failed_tables = 1;
  				else if (strcmp(optarg, "no-tablespaces") == 0)
  					outputNoTablespaces = 1;
+ 				else if (strcmp(optarg, "schema-before-data") == 0)
+ 					schemaBeforeData = 1;
+ 				else if (strcmp(optarg, "schema-after-data") == 0)
+ 					schemaAfterData = 1;
  				else if (strcmp(optarg, "use-set-session-authorization") == 0)
  					use_setsessauth = 1;
  				else
***************
*** 295,300 ****
--- 308,354 ----
  		opts->useDB = 1;
  	}
  
+ 	/*
+ 	 * Look for conflicting options relating to object groupings
+ 	 */
+ 	if (schemaOnly && dataOnly)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"-s/--schema-only", "-a/--data-only");
+ 		exit(1);
+ 	}
+ 	else if ((schemaOnly || dataOnly) && 
+ 				(schemaBeforeData || schemaAfterData))
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				schemaOnly ? "-s/--schema-only" : "-a/--data-only",
+ 				schemaBeforeData ? "--schema-before-data" : "--schema-after-data");
+ 		exit(1);
+ 	}
+ 	else if (schemaBeforeData && schemaAfterData)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"--schema-before-data", "--schema-after-data");
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Decide which of the object groups we will dump
+ 	 */
+ 	dumpObjFlags = REQ_ALL;
+ 
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (schemaBeforeData)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (schemaAfterData)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
+ 
  	opts->disable_triggers = disable_triggers;
  	opts->noDataForFailedTables = no_data_for_failed_tables;
  	opts->noTablespace = outputNoTablespaces;
***************
*** 405,410 ****
--- 459,466 ----
  			 "                           do not restore data of tables that could not be\n"
  			 "                           created\n"));
  	printf(_("  --no-tablespaces         do not dump tablespace assignments\n"));
+ 	printf(_("  --schema-before-data     dump only the part of schema before table data\n"));
+ 	printf(_("  --schema-after-data      dump only the part of schema after table data\n"));
  	printf(_("  --use-set-session-authorization\n"
  			 "                           use SESSION AUTHORIZATION commands instead of\n"
  			 "                           OWNER TO commands\n"));
Index: src/interfaces/ecpg/preproc/keywords.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/interfaces/ecpg/preproc/keywords.c,v
retrieving revision 1.85
diff -c -r1.85 keywords.c
*** src/interfaces/ecpg/preproc/keywords.c	1 Jan 2008 19:45:59 -0000	1.85
--- src/interfaces/ecpg/preproc/keywords.c	18 Jul 2008 08:50:27 -0000
***************
*** 1,430 ****
  /*-------------------------------------------------------------------------
   *
   * keywords.c
!  *	  lexical token lookup for reserved words in PostgreSQL
   *
   * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
   *
   *
   * IDENTIFICATION
!  *	  $PostgreSQL: pgsql/src/interfaces/ecpg/preproc/keywords.c,v 1.84 2007/11/15 21:14:45 momjian Exp $
   *
   *-------------------------------------------------------------------------
   */
- #include "postgres_fe.h"
- 
- #include <ctype.h>
  
! #include "extern.h"
! #include "preproc.h"
  
! /* compile both keyword lists in one file because they are always scanned together */
! #include "ecpg_keywords.c"
  
  /*
!  * List of (keyword-name, keyword-token-value) pairs.
!  *
!  * !!WARNING!!: This list must be sorted, because binary
!  *		 search is used to locate entries.
   */
! static const ScanKeyword ScanPGSQLKeywords[] = {
! 	/* name, value */
! 	{"abort", ABORT_P},
! 	{"absolute", ABSOLUTE_P},
! 	{"access", ACCESS},
! 	{"action", ACTION},
! 	{"add", ADD_P},
! 	{"admin", ADMIN},
! 	{"after", AFTER},
! 	{"aggregate", AGGREGATE},
! 	{"all", ALL},
! 	{"also", ALSO},
! 	{"alter", ALTER},
! 	{"always", ALWAYS},
! 	{"analyse", ANALYSE},		/* British spelling */
! 	{"analyze", ANALYZE},
! 	{"and", AND},
! 	{"any", ANY},
! 	{"array", ARRAY},
! 	{"as", AS},
! 	{"asc", ASC},
! 	{"assertion", ASSERTION},
! 	{"assignment", ASSIGNMENT},
! 	{"asymmetric", ASYMMETRIC},
! 	{"at", AT},
! 	{"authorization", AUTHORIZATION},
! 	{"backward", BACKWARD},
! 	{"before", BEFORE},
! 	{"begin", BEGIN_P},
! 	{"between", BETWEEN},
! 	{"bigint", BIGINT},
! 	{"binary", BINARY},
! 	{"bit", BIT},
! 	{"boolean", BOOLEAN_P},
! 	{"both", BOTH},
! 	{"by", BY},
! 	{"cache", CACHE},
! 	{"called", CALLED},
! 	{"cascade", CASCADE},
! 	{"cascaded", CASCADED},
! 	{"case", CASE},
! 	{"cast", CAST},
! 	{"chain", CHAIN},
! 	{"char", CHAR_P},
! 	{"character", CHARACTER},
! 	{"characteristics", CHARACTERISTICS},
! 	{"check", CHECK},
! 	{"checkpoint", CHECKPOINT},
! 	{"class", CLASS},
! 	{"close", CLOSE},
! 	{"cluster", CLUSTER},
! 	{"coalesce", COALESCE},
! 	{"collate", COLLATE},
! 	{"column", COLUMN},
! 	{"comment", COMMENT},
! 	{"commit", COMMIT},
! 	{"committed", COMMITTED},
! 	{"concurrently", CONCURRENTLY},
! 	{"configuration", CONFIGURATION},
! 	{"connection", CONNECTION},
! 	{"constraint", CONSTRAINT},
! 	{"constraints", CONSTRAINTS},
! 	{"content", CONTENT_P},
! 	{"conversion", CONVERSION_P},
! 	{"copy", COPY},
! 	{"cost", COST},
! 	{"create", CREATE},
! 	{"createdb", CREATEDB},
! 	{"createrole", CREATEROLE},
! 	{"createuser", CREATEUSER},
! 	{"cross", CROSS},
! 	{"csv", CSV},
! 	{"current", CURRENT_P},
! 	{"current_date", CURRENT_DATE},
! 	{"current_role", CURRENT_ROLE},
! 	{"current_time", CURRENT_TIME},
! 	{"current_timestamp", CURRENT_TIMESTAMP},
! 	{"cursor", CURSOR},
! 	{"cycle", CYCLE},
! 	{"database", DATABASE},
! 	{"day", DAY_P},
! 	{"deallocate", DEALLOCATE},
! 	{"dec", DEC},
! 	{"decimal", DECIMAL_P},
! 	{"declare", DECLARE},
! 	{"default", DEFAULT},
! 	{"defaults", DEFAULTS},
! 	{"deferrable", DEFERRABLE},
! 	{"deferred", DEFERRED},
! 	{"definer", DEFINER},
! 	{"delete", DELETE_P},
! 	{"delimiter", DELIMITER},
! 	{"delimiters", DELIMITERS},
! 	{"desc", DESC},
! 	{"dictionary", DICTIONARY},
! 	{"disable", DISABLE_P},
! 	{"discard", DISCARD},
! 	{"distinct", DISTINCT},
! 	{"do", DO},
! 	{"document", DOCUMENT_P},
! 	{"domain", DOMAIN_P},
! 	{"double", DOUBLE_P},
! 	{"drop", DROP},
! 	{"each", EACH},
! 	{"else", ELSE},
! 	{"enable", ENABLE_P},
! 	{"encoding", ENCODING},
! 	{"encrypted", ENCRYPTED},
! 	{"end", END_P},
! 	{"enum", ENUM_P},
! 	{"escape", ESCAPE},
! 	{"except", EXCEPT},
! 	{"excluding", EXCLUDING},
! 	{"exclusive", EXCLUSIVE},
! 	{"execute", EXECUTE},
! 	{"exists", EXISTS},
! 	{"explain", EXPLAIN},
! 	{"external", EXTERNAL},
! 	{"extract", EXTRACT},
! 	{"false", FALSE_P},
! 	{"family", FAMILY},
! 	{"fetch", FETCH},
! 	{"first", FIRST_P},
! 	{"float", FLOAT_P},
! 	{"for", FOR},
! 	{"force", FORCE},
! 	{"foreign", FOREIGN},
! 	{"forward", FORWARD},
! 	{"freeze", FREEZE},
! 	{"from", FROM},
! 	{"full", FULL},
! 	{"function", FUNCTION},
! 	{"get", GET},
! 	{"global", GLOBAL},
! 	{"grant", GRANT},
! 	{"granted", GRANTED},
! 	{"greatest", GREATEST},
! 	{"group", GROUP_P},
! 	{"handler", HANDLER},
! 	{"having", HAVING},
! 	{"header", HEADER_P},
! 	{"hold", HOLD},
! 	{"hour", HOUR_P},
! 	{"if", IF_P},
! 	{"ilike", ILIKE},
! 	{"immediate", IMMEDIATE},
! 	{"immutable", IMMUTABLE},
! 	{"implicit", IMPLICIT_P},
! 	{"in", IN_P},
! 	{"including", INCLUDING},
! 	{"increment", INCREMENT},
! 	{"index", INDEX},
! 	{"indexes", INDEXES},
! 	{"inherit", INHERIT},
! 	{"inherits", INHERITS},
! 	{"initially", INITIALLY},
! 	{"inner", INNER_P},
! 	{"inout", INOUT},
! 	{"input", INPUT_P},
! 	{"insensitive", INSENSITIVE},
! 	{"insert", INSERT},
! 	{"instead", INSTEAD},
! 	{"int", INT_P},
! 	{"integer", INTEGER},
! 	{"intersect", INTERSECT},
! 	{"interval", INTERVAL},
! 	{"into", INTO},
! 	{"invoker", INVOKER},
! 	{"is", IS},
! 	{"isnull", ISNULL},
! 	{"isolation", ISOLATION},
! 	{"join", JOIN},
! 	{"key", KEY},
! 	{"lancompiler", LANCOMPILER},
! 	{"language", LANGUAGE},
! 	{"large", LARGE_P},
! 	{"last", LAST_P},
! 	{"leading", LEADING},
! 	{"least", LEAST},
! 	{"left", LEFT},
! 	{"level", LEVEL},
! 	{"like", LIKE},
! 	{"limit", LIMIT},
! 	{"listen", LISTEN},
! 	{"load", LOAD},
! 	{"local", LOCAL},
! 	{"location", LOCATION},
! 	{"lock", LOCK_P},
! 	{"login", LOGIN_P},
! 	{"mapping", MAPPING},
! 	{"match", MATCH},
! 	{"maxvalue", MAXVALUE},
! 	{"minute", MINUTE_P},
! 	{"minvalue", MINVALUE},
! 	{"mode", MODE},
! 	{"month", MONTH_P},
! 	{"move", MOVE},
! 	{"name", NAME_P},
! 	{"names", NAMES},
! 	{"national", NATIONAL},
! 	{"natural", NATURAL},
! 	{"nchar", NCHAR},
! 	{"new", NEW},
! 	{"next", NEXT},
! 	{"no", NO},
! 	{"nocreatedb", NOCREATEDB},
! 	{"nocreaterole", NOCREATEROLE},
! 	{"nocreateuser", NOCREATEUSER},
! 	{"noinherit", NOINHERIT},
! 	{"nologin", NOLOGIN_P},
! 	{"none", NONE},
! 	{"nosuperuser", NOSUPERUSER},
! 	{"not", NOT},
! 	{"nothing", NOTHING},
! 	{"notify", NOTIFY},
! 	{"notnull", NOTNULL},
! 	{"nowait", NOWAIT},
! 	{"null", NULL_P},
! 	{"nullif", NULLIF},
! 	{"nulls", NULLS_P},
! 	{"numeric", NUMERIC},
! 	{"object", OBJECT_P},
! 	{"of", OF},
! 	{"off", OFF},
! 	{"offset", OFFSET},
! 	{"oids", OIDS},
! 	{"old", OLD},
! 	{"on", ON},
! 	{"only", ONLY},
! 	{"operator", OPERATOR},
! 	{"option", OPTION},
! 	{"or", OR},
! 	{"order", ORDER},
! 	{"out", OUT_P},
! 	{"outer", OUTER_P},
! 	{"overlaps", OVERLAPS},
! 	{"owned", OWNED},
! 	{"owner", OWNER},
! 	{"parser", PARSER},
! 	{"partial", PARTIAL},
! 	{"password", PASSWORD},
! 	{"placing", PLACING},
! 	{"plans", PLANS},
! 	{"position", POSITION},
! 	{"precision", PRECISION},
! 	{"prepare", PREPARE},
! 	{"prepared", PREPARED},
! 	{"preserve", PRESERVE},
! 	{"primary", PRIMARY},
! 	{"prior", PRIOR},
! 	{"privileges", PRIVILEGES},
! 	{"procedural", PROCEDURAL},
! 	{"procedure", PROCEDURE},
! 	{"quote", QUOTE},
! 	{"read", READ},
! 	{"real", REAL},
! 	{"reassign", REASSIGN},
! 	{"recheck", RECHECK},
! 	{"references", REFERENCES},
! 	{"reindex", REINDEX},
! 	{"relative", RELATIVE_P},
! 	{"release", RELEASE},
! 	{"rename", RENAME},
! 	{"repeatable", REPEATABLE},
! 	{"replace", REPLACE},
! 	{"replica", REPLICA},
! 	{"reset", RESET},
! 	{"restart", RESTART},
! 	{"restrict", RESTRICT},
! 	{"returning", RETURNING},
! 	{"returns", RETURNS},
! 	{"revoke", REVOKE},
! 	{"right", RIGHT},
! 	{"role", ROLE},
! 	{"rollback", ROLLBACK},
! 	{"row", ROW},
! 	{"rows", ROWS},
! 	{"rule", RULE},
! 	{"savepoint", SAVEPOINT},
! 	{"schema", SCHEMA},
! 	{"scroll", SCROLL},
! 	{"search", SEARCH},
! 	{"second", SECOND_P},
! 	{"security", SECURITY},
! 	{"select", SELECT},
! 	{"sequence", SEQUENCE},
! 	{"serializable", SERIALIZABLE},
! 	{"session", SESSION},
! 	{"session_user", SESSION_USER},
! 	{"set", SET},
! 	{"setof", SETOF},
! 	{"share", SHARE},
! 	{"show", SHOW},
! 	{"similar", SIMILAR},
! 	{"simple", SIMPLE},
! 	{"smallint", SMALLINT},
! 	{"some", SOME},
! 	{"stable", STABLE},
! 	{"standalone", STANDALONE_P},
! 	{"start", START},
! 	{"statement", STATEMENT},
! 	{"statistics", STATISTICS},
! 	{"stdin", STDIN},
! 	{"stdout", STDOUT},
! 	{"storage", STORAGE},
! 	{"strict", STRICT_P},
! 	{"strip", STRIP_P},
! 	{"substring", SUBSTRING},
! 	{"superuser", SUPERUSER_P},
! 	{"symmetric", SYMMETRIC},
! 	{"sysid", SYSID},
! 	{"system", SYSTEM_P},
! 	{"table", TABLE},
! 	{"tablespace", TABLESPACE},
! 	{"temp", TEMP},
! 	{"template", TEMPLATE},
! 	{"temporary", TEMPORARY},
! 	{"text", TEXT_P},
! 	{"then", THEN},
! 	{"time", TIME},
! 	{"timestamp", TIMESTAMP},
! 	{"to", TO},
! 	{"trailing", TRAILING},
! 	{"transaction", TRANSACTION},
! 	{"treat", TREAT},
! 	{"trigger", TRIGGER},
! 	{"trim", TRIM},
! 	{"true", TRUE_P},
! 	{"truncate", TRUNCATE},
! 	{"trusted", TRUSTED},
! 	{"type", TYPE_P},
! 	{"uncommitted", UNCOMMITTED},
! 	{"unencrypted", UNENCRYPTED},
! 	{"union", UNION},
! 	{"unique", UNIQUE},
! 	{"unknown", UNKNOWN},
! 	{"unlisten", UNLISTEN},
! 	{"until", UNTIL},
! 	{"update", UPDATE},
! 	{"user", USER},
! 	{"using", USING},
! 	{"vacuum", VACUUM},
! 	{"valid", VALID},
! 	{"validator", VALIDATOR},
! 	{"value", VALUE_P},
! 	{"values", VALUES},
! 	{"varchar", VARCHAR},
! 	{"varying", VARYING},
! 	{"verbose", VERBOSE},
! 	{"version", VERSION_P},
! 	{"view", VIEW},
! 	{"volatile", VOLATILE},
! 	{"when", WHEN},
! 	{"where", WHERE},
! 	{"whitespace", WHITESPACE_P},
! 	{"with", WITH},
! 	{"without", WITHOUT},
! 	{"work", WORK},
! 	{"write", WRITE},
! 	{"xml", XML_P},
! 	{"xmlattributes", XMLATTRIBUTES},
! 	{"xmlconcat", XMLCONCAT},
! 	{"xmlelement", XMLELEMENT},
! 	{"xmlforest", XMLFOREST},
! 	{"xmlparse", XMLPARSE},
! 	{"xmlpi", XMLPI},
! 	{"xmlroot", XMLROOT},
! 	{"xmlserialize", XMLSERIALIZE},
! 	{"year", YEAR_P},
! 	{"yes", YES_P},
! 	{"zone", ZONE},
! };
  
  
  /*
!  * Now do a binary search using plain strcmp() comparison.
   */
! const ScanKeyword *
! DoLookup(char *word, const ScanKeyword *low, const ScanKeyword *high)
! {
! 	while (low <= high)
! 	{
! 		const ScanKeyword *middle;
! 		int			difference;
  
! 		middle = low + (high - low) / 2;
! 		difference = strcmp(middle->name, word);
! 		if (difference == 0)
! 			return middle;
! 		else if (difference < 0)
! 			low = middle + 1;
! 		else
! 			high = middle - 1;
! 	}
  
! 	return NULL;
! }
  
  /*
   * ScanKeywordLookup - see if a given word is a keyword
--- 1,436 ----
  /*-------------------------------------------------------------------------
   *
   * keywords.c
!  *	  lexical token lookup for key words in PostgreSQL
!  *
!  * NB: This file is also used by pg_dump.
!  *
   *
   * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
   *
   *
   * IDENTIFICATION
!  *	  $PostgreSQL: pgsql/src/backend/parser/keywords.c,v 1.198 2008/07/03 20:58:46 tgl Exp $
   *
   *-------------------------------------------------------------------------
   */
  
! /* Use c.h so that this file can be built in either frontend or backend */
! #include "c.h"
  
! #include <ctype.h>
  
  /*
!  * This macro definition overrides the YYSTYPE union definition in parse.h.
!  * We don't need that struct in this file, and including the real definition
!  * would require sucking in some backend-only include files.
   */
! #define YYSTYPE int
  
+ #include "parser/keywords.h"
+ #ifndef ECPG_COMPILE
+ #include "parser/parse.h"
+ #else
+ #include "preproc.h"
+ #endif
  
  /*
!  * List of keyword (name, token-value, category) entries.
!  *
!  * !!WARNING!!: This list must be sorted by ASCII name, because binary
!  *		 search is used to locate entries.
   */
! const ScanKeyword ScanKeywords[] = {
! 	/* name, value, category */
! 	{"abort", ABORT_P, UNRESERVED_KEYWORD},
! 	{"absolute", ABSOLUTE_P, UNRESERVED_KEYWORD},
! 	{"access", ACCESS, UNRESERVED_KEYWORD},
! 	{"action", ACTION, UNRESERVED_KEYWORD},
! 	{"add", ADD_P, UNRESERVED_KEYWORD},
! 	{"admin", ADMIN, UNRESERVED_KEYWORD},
! 	{"after", AFTER, UNRESERVED_KEYWORD},
! 	{"aggregate", AGGREGATE, UNRESERVED_KEYWORD},
! 	{"all", ALL, RESERVED_KEYWORD},
! 	{"also", ALSO, UNRESERVED_KEYWORD},
! 	{"alter", ALTER, UNRESERVED_KEYWORD},
! 	{"always", ALWAYS, UNRESERVED_KEYWORD},
! 	{"analyse", ANALYSE, RESERVED_KEYWORD},		/* British spelling */
! 	{"analyze", ANALYZE, RESERVED_KEYWORD},
! 	{"and", AND, RESERVED_KEYWORD},
! 	{"any", ANY, RESERVED_KEYWORD},
! 	{"array", ARRAY, RESERVED_KEYWORD},
! 	{"as", AS, RESERVED_KEYWORD},
! 	{"asc", ASC, RESERVED_KEYWORD},
! 	{"assertion", ASSERTION, UNRESERVED_KEYWORD},
! 	{"assignment", ASSIGNMENT, UNRESERVED_KEYWORD},
! 	{"asymmetric", ASYMMETRIC, RESERVED_KEYWORD},
! 	{"at", AT, UNRESERVED_KEYWORD},
! 	{"authorization", AUTHORIZATION, TYPE_FUNC_NAME_KEYWORD},
! 	{"backward", BACKWARD, UNRESERVED_KEYWORD},
! 	{"before", BEFORE, UNRESERVED_KEYWORD},
! 	{"begin", BEGIN_P, UNRESERVED_KEYWORD},
! 	{"between", BETWEEN, TYPE_FUNC_NAME_KEYWORD},
! 	{"bigint", BIGINT, COL_NAME_KEYWORD},
! 	{"binary", BINARY, TYPE_FUNC_NAME_KEYWORD},
! 	{"bit", BIT, COL_NAME_KEYWORD},
! 	{"boolean", BOOLEAN_P, COL_NAME_KEYWORD},
! 	{"both", BOTH, RESERVED_KEYWORD},
! 	{"by", BY, UNRESERVED_KEYWORD},
! 	{"cache", CACHE, UNRESERVED_KEYWORD},
! 	{"called", CALLED, UNRESERVED_KEYWORD},
! 	{"cascade", CASCADE, UNRESERVED_KEYWORD},
! 	{"cascaded", CASCADED, UNRESERVED_KEYWORD},
! 	{"case", CASE, RESERVED_KEYWORD},
! 	{"cast", CAST, RESERVED_KEYWORD},
! 	{"chain", CHAIN, UNRESERVED_KEYWORD},
! 	{"char", CHAR_P, COL_NAME_KEYWORD},
! 	{"character", CHARACTER, COL_NAME_KEYWORD},
! 	{"characteristics", CHARACTERISTICS, UNRESERVED_KEYWORD},
! 	{"check", CHECK, RESERVED_KEYWORD},
! 	{"checkpoint", CHECKPOINT, UNRESERVED_KEYWORD},
! 	{"class", CLASS, UNRESERVED_KEYWORD},
! 	{"close", CLOSE, UNRESERVED_KEYWORD},
! 	{"cluster", CLUSTER, UNRESERVED_KEYWORD},
! 	{"coalesce", COALESCE, COL_NAME_KEYWORD},
! 	{"collate", COLLATE, RESERVED_KEYWORD},
! 	{"column", COLUMN, RESERVED_KEYWORD},
! 	{"comment", COMMENT, UNRESERVED_KEYWORD},
! 	{"commit", COMMIT, UNRESERVED_KEYWORD},
! 	{"committed", COMMITTED, UNRESERVED_KEYWORD},
! 	{"concurrently", CONCURRENTLY, UNRESERVED_KEYWORD},
! 	{"configuration", CONFIGURATION, UNRESERVED_KEYWORD},
! 	{"connection", CONNECTION, UNRESERVED_KEYWORD},
! 	{"constraint", CONSTRAINT, RESERVED_KEYWORD},
! 	{"constraints", CONSTRAINTS, UNRESERVED_KEYWORD},
! 	{"content", CONTENT_P, UNRESERVED_KEYWORD},
! 	{"continue", CONTINUE_P, UNRESERVED_KEYWORD},
! 	{"conversion", CONVERSION_P, UNRESERVED_KEYWORD},
! 	{"copy", COPY, UNRESERVED_KEYWORD},
! 	{"cost", COST, UNRESERVED_KEYWORD},
! 	{"create", CREATE, RESERVED_KEYWORD},
! 	{"createdb", CREATEDB, UNRESERVED_KEYWORD},
! 	{"createrole", CREATEROLE, UNRESERVED_KEYWORD},
! 	{"createuser", CREATEUSER, UNRESERVED_KEYWORD},
! 	{"cross", CROSS, TYPE_FUNC_NAME_KEYWORD},
! 	{"csv", CSV, UNRESERVED_KEYWORD},
! 	{"current", CURRENT_P, UNRESERVED_KEYWORD},
! 	{"current_date", CURRENT_DATE, RESERVED_KEYWORD},
! 	{"current_role", CURRENT_ROLE, RESERVED_KEYWORD},
! 	{"current_time", CURRENT_TIME, RESERVED_KEYWORD},
! 	{"current_timestamp", CURRENT_TIMESTAMP, RESERVED_KEYWORD},
! 	{"current_user", CURRENT_USER, RESERVED_KEYWORD},
! 	{"cursor", CURSOR, UNRESERVED_KEYWORD},
! 	{"cycle", CYCLE, UNRESERVED_KEYWORD},
! 	{"database", DATABASE, UNRESERVED_KEYWORD},
! 	{"day", DAY_P, UNRESERVED_KEYWORD},
! 	{"deallocate", DEALLOCATE, UNRESERVED_KEYWORD},
! 	{"dec", DEC, COL_NAME_KEYWORD},
! 	{"decimal", DECIMAL_P, COL_NAME_KEYWORD},
! 	{"declare", DECLARE, UNRESERVED_KEYWORD},
! 	{"default", DEFAULT, RESERVED_KEYWORD},
! 	{"defaults", DEFAULTS, UNRESERVED_KEYWORD},
! 	{"deferrable", DEFERRABLE, RESERVED_KEYWORD},
! 	{"deferred", DEFERRED, UNRESERVED_KEYWORD},
! 	{"definer", DEFINER, UNRESERVED_KEYWORD},
! 	{"delete", DELETE_P, UNRESERVED_KEYWORD},
! 	{"delimiter", DELIMITER, UNRESERVED_KEYWORD},
! 	{"delimiters", DELIMITERS, UNRESERVED_KEYWORD},
! 	{"desc", DESC, RESERVED_KEYWORD},
! 	{"dictionary", DICTIONARY, UNRESERVED_KEYWORD},
! 	{"disable", DISABLE_P, UNRESERVED_KEYWORD},
! 	{"discard", DISCARD, UNRESERVED_KEYWORD},
! 	{"distinct", DISTINCT, RESERVED_KEYWORD},
! 	{"do", DO, RESERVED_KEYWORD},
! 	{"document", DOCUMENT_P, UNRESERVED_KEYWORD},
! 	{"domain", DOMAIN_P, UNRESERVED_KEYWORD},
! 	{"double", DOUBLE_P, UNRESERVED_KEYWORD},
! 	{"drop", DROP, UNRESERVED_KEYWORD},
! 	{"each", EACH, UNRESERVED_KEYWORD},
! 	{"else", ELSE, RESERVED_KEYWORD},
! 	{"enable", ENABLE_P, UNRESERVED_KEYWORD},
! 	{"encoding", ENCODING, UNRESERVED_KEYWORD},
! 	{"encrypted", ENCRYPTED, UNRESERVED_KEYWORD},
! 	{"end", END_P, RESERVED_KEYWORD},
! 	{"enum", ENUM_P, UNRESERVED_KEYWORD},
! 	{"escape", ESCAPE, UNRESERVED_KEYWORD},
! 	{"except", EXCEPT, RESERVED_KEYWORD},
! 	{"excluding", EXCLUDING, UNRESERVED_KEYWORD},
! 	{"exclusive", EXCLUSIVE, UNRESERVED_KEYWORD},
! 	{"execute", EXECUTE, UNRESERVED_KEYWORD},
! 	{"exists", EXISTS, COL_NAME_KEYWORD},
! 	{"explain", EXPLAIN, UNRESERVED_KEYWORD},
! 	{"external", EXTERNAL, UNRESERVED_KEYWORD},
! 	{"extract", EXTRACT, COL_NAME_KEYWORD},
! 	{"false", FALSE_P, RESERVED_KEYWORD},
! 	{"family", FAMILY, UNRESERVED_KEYWORD},
! 	{"fetch", FETCH, UNRESERVED_KEYWORD},
! 	{"first", FIRST_P, UNRESERVED_KEYWORD},
! 	{"float", FLOAT_P, COL_NAME_KEYWORD},
! 	{"for", FOR, RESERVED_KEYWORD},
! 	{"force", FORCE, UNRESERVED_KEYWORD},
! 	{"foreign", FOREIGN, RESERVED_KEYWORD},
! 	{"forward", FORWARD, UNRESERVED_KEYWORD},
! 	{"freeze", FREEZE, TYPE_FUNC_NAME_KEYWORD},
! 	{"from", FROM, RESERVED_KEYWORD},
! 	{"full", FULL, TYPE_FUNC_NAME_KEYWORD},
! 	{"function", FUNCTION, UNRESERVED_KEYWORD},
! 	{"global", GLOBAL, UNRESERVED_KEYWORD},
! 	{"grant", GRANT, RESERVED_KEYWORD},
! 	{"granted", GRANTED, UNRESERVED_KEYWORD},
! 	{"greatest", GREATEST, COL_NAME_KEYWORD},
! 	{"group", GROUP_P, RESERVED_KEYWORD},
! 	{"handler", HANDLER, UNRESERVED_KEYWORD},
! 	{"having", HAVING, RESERVED_KEYWORD},
! 	{"header", HEADER_P, UNRESERVED_KEYWORD},
! 	{"hold", HOLD, UNRESERVED_KEYWORD},
! 	{"hour", HOUR_P, UNRESERVED_KEYWORD},
! 	{"identity", IDENTITY_P, UNRESERVED_KEYWORD},
! 	{"if", IF_P, UNRESERVED_KEYWORD},
! 	{"ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD},
! 	{"immediate", IMMEDIATE, UNRESERVED_KEYWORD},
! 	{"immutable", IMMUTABLE, UNRESERVED_KEYWORD},
! 	{"implicit", IMPLICIT_P, UNRESERVED_KEYWORD},
! 	{"in", IN_P, RESERVED_KEYWORD},
! 	{"including", INCLUDING, UNRESERVED_KEYWORD},
! 	{"increment", INCREMENT, UNRESERVED_KEYWORD},
! 	{"index", INDEX, UNRESERVED_KEYWORD},
! 	{"indexes", INDEXES, UNRESERVED_KEYWORD},
! 	{"inherit", INHERIT, UNRESERVED_KEYWORD},
! 	{"inherits", INHERITS, UNRESERVED_KEYWORD},
! 	{"initially", INITIALLY, RESERVED_KEYWORD},
! 	{"inner", INNER_P, TYPE_FUNC_NAME_KEYWORD},
! 	{"inout", INOUT, COL_NAME_KEYWORD},
! 	{"input", INPUT_P, UNRESERVED_KEYWORD},
! 	{"insensitive", INSENSITIVE, UNRESERVED_KEYWORD},
! 	{"insert", INSERT, UNRESERVED_KEYWORD},
! 	{"instead", INSTEAD, UNRESERVED_KEYWORD},
! 	{"int", INT_P, COL_NAME_KEYWORD},
! 	{"integer", INTEGER, COL_NAME_KEYWORD},
! 	{"intersect", INTERSECT, RESERVED_KEYWORD},
! 	{"interval", INTERVAL, COL_NAME_KEYWORD},
! 	{"into", INTO, RESERVED_KEYWORD},
! 	{"invoker", INVOKER, UNRESERVED_KEYWORD},
! 	{"is", IS, TYPE_FUNC_NAME_KEYWORD},
! 	{"isnull", ISNULL, TYPE_FUNC_NAME_KEYWORD},
! 	{"isolation", ISOLATION, UNRESERVED_KEYWORD},
! 	{"join", JOIN, TYPE_FUNC_NAME_KEYWORD},
! 	{"key", KEY, UNRESERVED_KEYWORD},
! 	{"lancompiler", LANCOMPILER, UNRESERVED_KEYWORD},
! 	{"language", LANGUAGE, UNRESERVED_KEYWORD},
! 	{"large", LARGE_P, UNRESERVED_KEYWORD},
! 	{"last", LAST_P, UNRESERVED_KEYWORD},
! 	{"leading", LEADING, RESERVED_KEYWORD},
! 	{"least", LEAST, COL_NAME_KEYWORD},
! 	{"left", LEFT, TYPE_FUNC_NAME_KEYWORD},
! 	{"level", LEVEL, UNRESERVED_KEYWORD},
! 	{"like", LIKE, TYPE_FUNC_NAME_KEYWORD},
! 	{"limit", LIMIT, RESERVED_KEYWORD},
! 	{"listen", LISTEN, UNRESERVED_KEYWORD},
! 	{"load", LOAD, UNRESERVED_KEYWORD},
! 	{"local", LOCAL, UNRESERVED_KEYWORD},
! 	{"localtime", LOCALTIME, RESERVED_KEYWORD},
! 	{"localtimestamp", LOCALTIMESTAMP, RESERVED_KEYWORD},
! 	{"location", LOCATION, UNRESERVED_KEYWORD},
! 	{"lock", LOCK_P, UNRESERVED_KEYWORD},
! 	{"login", LOGIN_P, UNRESERVED_KEYWORD},
! 	{"mapping", MAPPING, UNRESERVED_KEYWORD},
! 	{"match", MATCH, UNRESERVED_KEYWORD},
! 	{"maxvalue", MAXVALUE, UNRESERVED_KEYWORD},
! 	{"minute", MINUTE_P, UNRESERVED_KEYWORD},
! 	{"minvalue", MINVALUE, UNRESERVED_KEYWORD},
! 	{"mode", MODE, UNRESERVED_KEYWORD},
! 	{"month", MONTH_P, UNRESERVED_KEYWORD},
! 	{"move", MOVE, UNRESERVED_KEYWORD},
! 	{"name", NAME_P, UNRESERVED_KEYWORD},
! 	{"names", NAMES, UNRESERVED_KEYWORD},
! 	{"national", NATIONAL, COL_NAME_KEYWORD},
! 	{"natural", NATURAL, TYPE_FUNC_NAME_KEYWORD},
! 	{"nchar", NCHAR, COL_NAME_KEYWORD},
! 	{"new", NEW, RESERVED_KEYWORD},
! 	{"next", NEXT, UNRESERVED_KEYWORD},
! 	{"no", NO, UNRESERVED_KEYWORD},
! 	{"nocreatedb", NOCREATEDB, UNRESERVED_KEYWORD},
! 	{"nocreaterole", NOCREATEROLE, UNRESERVED_KEYWORD},
! 	{"nocreateuser", NOCREATEUSER, UNRESERVED_KEYWORD},
! 	{"noinherit", NOINHERIT, UNRESERVED_KEYWORD},
! 	{"nologin", NOLOGIN_P, UNRESERVED_KEYWORD},
! 	{"none", NONE, COL_NAME_KEYWORD},
! 	{"nosuperuser", NOSUPERUSER, UNRESERVED_KEYWORD},
! 	{"not", NOT, RESERVED_KEYWORD},
! 	{"nothing", NOTHING, UNRESERVED_KEYWORD},
! 	{"notify", NOTIFY, UNRESERVED_KEYWORD},
! 	{"notnull", NOTNULL, TYPE_FUNC_NAME_KEYWORD},
! 	{"nowait", NOWAIT, UNRESERVED_KEYWORD},
! 	{"null", NULL_P, RESERVED_KEYWORD},
! 	{"nullif", NULLIF, COL_NAME_KEYWORD},
! 	{"nulls", NULLS_P, UNRESERVED_KEYWORD},
! 	{"numeric", NUMERIC, COL_NAME_KEYWORD},
! 	{"object", OBJECT_P, UNRESERVED_KEYWORD},
! 	{"of", OF, UNRESERVED_KEYWORD},
! 	{"off", OFF, RESERVED_KEYWORD},
! 	{"offset", OFFSET, RESERVED_KEYWORD},
! 	{"oids", OIDS, UNRESERVED_KEYWORD},
! 	{"old", OLD, RESERVED_KEYWORD},
! 	{"on", ON, RESERVED_KEYWORD},
! 	{"only", ONLY, RESERVED_KEYWORD},
! 	{"operator", OPERATOR, UNRESERVED_KEYWORD},
! 	{"option", OPTION, UNRESERVED_KEYWORD},
! 	{"or", OR, RESERVED_KEYWORD},
! 	{"order", ORDER, RESERVED_KEYWORD},
! 	{"out", OUT_P, COL_NAME_KEYWORD},
! 	{"outer", OUTER_P, TYPE_FUNC_NAME_KEYWORD},
! 	{"overlaps", OVERLAPS, TYPE_FUNC_NAME_KEYWORD},
! 	{"overlay", OVERLAY, COL_NAME_KEYWORD},
! 	{"owned", OWNED, UNRESERVED_KEYWORD},
! 	{"owner", OWNER, UNRESERVED_KEYWORD},
! 	{"parser", PARSER, UNRESERVED_KEYWORD},
! 	{"partial", PARTIAL, UNRESERVED_KEYWORD},
! 	{"password", PASSWORD, UNRESERVED_KEYWORD},
! 	{"placing", PLACING, RESERVED_KEYWORD},
! 	{"plans", PLANS, UNRESERVED_KEYWORD},
! 	{"position", POSITION, COL_NAME_KEYWORD},
! 	{"precision", PRECISION, COL_NAME_KEYWORD},
! 	{"prepare", PREPARE, UNRESERVED_KEYWORD},
! 	{"prepared", PREPARED, UNRESERVED_KEYWORD},
! 	{"preserve", PRESERVE, UNRESERVED_KEYWORD},
! 	{"primary", PRIMARY, RESERVED_KEYWORD},
! 	{"prior", PRIOR, UNRESERVED_KEYWORD},
! 	{"privileges", PRIVILEGES, UNRESERVED_KEYWORD},
! 	{"procedural", PROCEDURAL, UNRESERVED_KEYWORD},
! 	{"procedure", PROCEDURE, UNRESERVED_KEYWORD},
! 	{"quote", QUOTE, UNRESERVED_KEYWORD},
! 	{"read", READ, UNRESERVED_KEYWORD},
! 	{"real", REAL, COL_NAME_KEYWORD},
! 	{"reassign", REASSIGN, UNRESERVED_KEYWORD},
! 	{"recheck", RECHECK, UNRESERVED_KEYWORD},
! 	{"references", REFERENCES, RESERVED_KEYWORD},
! 	{"reindex", REINDEX, UNRESERVED_KEYWORD},
! 	{"relative", RELATIVE_P, UNRESERVED_KEYWORD},
! 	{"release", RELEASE, UNRESERVED_KEYWORD},
! 	{"rename", RENAME, UNRESERVED_KEYWORD},
! 	{"repeatable", REPEATABLE, UNRESERVED_KEYWORD},
! 	{"replace", REPLACE, UNRESERVED_KEYWORD},
! 	{"replica", REPLICA, UNRESERVED_KEYWORD},
! 	{"reset", RESET, UNRESERVED_KEYWORD},
! 	{"restart", RESTART, UNRESERVED_KEYWORD},
! 	{"restrict", RESTRICT, UNRESERVED_KEYWORD},
! 	{"returning", RETURNING, RESERVED_KEYWORD},
! 	{"returns", RETURNS, UNRESERVED_KEYWORD},
! 	{"revoke", REVOKE, UNRESERVED_KEYWORD},
! 	{"right", RIGHT, TYPE_FUNC_NAME_KEYWORD},
! 	{"role", ROLE, UNRESERVED_KEYWORD},
! 	{"rollback", ROLLBACK, UNRESERVED_KEYWORD},
! 	{"row", ROW, COL_NAME_KEYWORD},
! 	{"rows", ROWS, UNRESERVED_KEYWORD},
! 	{"rule", RULE, UNRESERVED_KEYWORD},
! 	{"savepoint", SAVEPOINT, UNRESERVED_KEYWORD},
! 	{"schema", SCHEMA, UNRESERVED_KEYWORD},
! 	{"scroll", SCROLL, UNRESERVED_KEYWORD},
! 	{"search", SEARCH, UNRESERVED_KEYWORD},
! 	{"second", SECOND_P, UNRESERVED_KEYWORD},
! 	{"security", SECURITY, UNRESERVED_KEYWORD},
! 	{"select", SELECT, RESERVED_KEYWORD},
! 	{"sequence", SEQUENCE, UNRESERVED_KEYWORD},
! 	{"serializable", SERIALIZABLE, UNRESERVED_KEYWORD},
! 	{"session", SESSION, UNRESERVED_KEYWORD},
! 	{"session_user", SESSION_USER, RESERVED_KEYWORD},
! 	{"set", SET, UNRESERVED_KEYWORD},
! 	{"setof", SETOF, COL_NAME_KEYWORD},
! 	{"share", SHARE, UNRESERVED_KEYWORD},
! 	{"show", SHOW, UNRESERVED_KEYWORD},
! 	{"similar", SIMILAR, TYPE_FUNC_NAME_KEYWORD},
! 	{"simple", SIMPLE, UNRESERVED_KEYWORD},
! 	{"smallint", SMALLINT, COL_NAME_KEYWORD},
! 	{"some", SOME, RESERVED_KEYWORD},
! 	{"stable", STABLE, UNRESERVED_KEYWORD},
! 	{"standalone", STANDALONE_P, UNRESERVED_KEYWORD},
! 	{"start", START, UNRESERVED_KEYWORD},
! 	{"statement", STATEMENT, UNRESERVED_KEYWORD},
! 	{"statistics", STATISTICS, UNRESERVED_KEYWORD},
! 	{"stdin", STDIN, UNRESERVED_KEYWORD},
! 	{"stdout", STDOUT, UNRESERVED_KEYWORD},
! 	{"storage", STORAGE, UNRESERVED_KEYWORD},
! 	{"strict", STRICT_P, UNRESERVED_KEYWORD},
! 	{"strip", STRIP_P, UNRESERVED_KEYWORD},
! 	{"substring", SUBSTRING, COL_NAME_KEYWORD},
! 	{"superuser", SUPERUSER_P, UNRESERVED_KEYWORD},
! 	{"symmetric", SYMMETRIC, RESERVED_KEYWORD},
! 	{"sysid", SYSID, UNRESERVED_KEYWORD},
! 	{"system", SYSTEM_P, UNRESERVED_KEYWORD},
! 	{"table", TABLE, RESERVED_KEYWORD},
! 	{"tablespace", TABLESPACE, UNRESERVED_KEYWORD},
! 	{"temp", TEMP, UNRESERVED_KEYWORD},
! 	{"template", TEMPLATE, UNRESERVED_KEYWORD},
! 	{"temporary", TEMPORARY, UNRESERVED_KEYWORD},
! 	{"text", TEXT_P, UNRESERVED_KEYWORD},
! 	{"then", THEN, RESERVED_KEYWORD},
! 	{"time", TIME, COL_NAME_KEYWORD},
! 	{"timestamp", TIMESTAMP, COL_NAME_KEYWORD},
! 	{"to", TO, RESERVED_KEYWORD},
! 	{"trailing", TRAILING, RESERVED_KEYWORD},
! 	{"transaction", TRANSACTION, UNRESERVED_KEYWORD},
! 	{"treat", TREAT, COL_NAME_KEYWORD},
! 	{"trigger", TRIGGER, UNRESERVED_KEYWORD},
! 	{"trim", TRIM, COL_NAME_KEYWORD},
! 	{"true", TRUE_P, RESERVED_KEYWORD},
! 	{"truncate", TRUNCATE, UNRESERVED_KEYWORD},
! 	{"trusted", TRUSTED, UNRESERVED_KEYWORD},
! 	{"type", TYPE_P, UNRESERVED_KEYWORD},
! 	{"uncommitted", UNCOMMITTED, UNRESERVED_KEYWORD},
! 	{"unencrypted", UNENCRYPTED, UNRESERVED_KEYWORD},
! 	{"union", UNION, RESERVED_KEYWORD},
! 	{"unique", UNIQUE, RESERVED_KEYWORD},
! 	{"unknown", UNKNOWN, UNRESERVED_KEYWORD},
! 	{"unlisten", UNLISTEN, UNRESERVED_KEYWORD},
! 	{"until", UNTIL, UNRESERVED_KEYWORD},
! 	{"update", UPDATE, UNRESERVED_KEYWORD},
! 	{"user", USER, RESERVED_KEYWORD},
! 	{"using", USING, RESERVED_KEYWORD},
! 	{"vacuum", VACUUM, UNRESERVED_KEYWORD},
! 	{"valid", VALID, UNRESERVED_KEYWORD},
! 	{"validator", VALIDATOR, UNRESERVED_KEYWORD},
! 	{"value", VALUE_P, UNRESERVED_KEYWORD},
! 	{"values", VALUES, COL_NAME_KEYWORD},
! 	{"varchar", VARCHAR, COL_NAME_KEYWORD},
! 	{"variadic", VARIADIC, RESERVED_KEYWORD},
! 	{"varying", VARYING, UNRESERVED_KEYWORD},
! 	{"verbose", VERBOSE, TYPE_FUNC_NAME_KEYWORD},
! 	{"version", VERSION_P, UNRESERVED_KEYWORD},
! 	{"view", VIEW, UNRESERVED_KEYWORD},
! 	{"volatile", VOLATILE, UNRESERVED_KEYWORD},
! 	{"when", WHEN, RESERVED_KEYWORD},
! 	{"where", WHERE, RESERVED_KEYWORD},
! 	{"whitespace", WHITESPACE_P, UNRESERVED_KEYWORD},
  
! 	/*
! 	 * XXX we mark WITH as reserved to force it to be quoted in dumps, even
! 	 * though it is currently unreserved according to gram.y.  This is because
! 	 * we expect we'll have to make it reserved to implement SQL WITH clauses.
! 	 * If that patch manages to do without reserving WITH, adjust this entry
! 	 * at that time; in any case this should be back in sync with gram.y after
! 	 * WITH clauses are implemented.
! 	 */
! 	{"with", WITH, RESERVED_KEYWORD},
! 	{"without", WITHOUT, UNRESERVED_KEYWORD},
! 	{"work", WORK, UNRESERVED_KEYWORD},
! 	{"write", WRITE, UNRESERVED_KEYWORD},
! 	{"xml", XML_P, UNRESERVED_KEYWORD},
! 	{"xmlattributes", XMLATTRIBUTES, COL_NAME_KEYWORD},
! 	{"xmlconcat", XMLCONCAT, COL_NAME_KEYWORD},
! 	{"xmlelement", XMLELEMENT, COL_NAME_KEYWORD},
! 	{"xmlforest", XMLFOREST, COL_NAME_KEYWORD},
! 	{"xmlparse", XMLPARSE, COL_NAME_KEYWORD},
! 	{"xmlpi", XMLPI, COL_NAME_KEYWORD},
! 	{"xmlroot", XMLROOT, COL_NAME_KEYWORD},
! 	{"xmlserialize", XMLSERIALIZE, COL_NAME_KEYWORD},
! 	{"year", YEAR_P, UNRESERVED_KEYWORD},
! 	{"yes", YES_P, UNRESERVED_KEYWORD},
! 	{"zone", ZONE, UNRESERVED_KEYWORD},
! };
  
! /* End of ScanKeywords, for use elsewhere */
! const ScanKeyword *LastScanKeyword = endof(ScanKeywords);
  
  /*
   * ScanKeywordLookup - see if a given word is a keyword
***************
*** 439,450 ****
   * receive a different case-normalization mapping.
   */
  const ScanKeyword *
! ScanKeywordLookup(char *text)
  {
  	int			len,
  				i;
  	char		word[NAMEDATALEN];
! 	const ScanKeyword *res;
  
  	len = strlen(text);
  	/* We assume all keywords are shorter than NAMEDATALEN. */
--- 445,457 ----
   * receive a different case-normalization mapping.
   */
  const ScanKeyword *
! ScanKeywordLookup(const char *text)
  {
  	int			len,
  				i;
  	char		word[NAMEDATALEN];
! 	const ScanKeyword *low;
! 	const ScanKeyword *high;
  
  	len = strlen(text);
  	/* We assume all keywords are shorter than NAMEDATALEN. */
***************
*** 468,476 ****
  	/*
  	 * Now do a binary search using plain strcmp() comparison.
  	 */
! 	res = DoLookup(word, &ScanPGSQLKeywords[0], endof(ScanPGSQLKeywords) - 1);
! 	if (res)
! 		return res;
  
! 	return DoLookup(word, &ScanECPGKeywords[0], endof(ScanECPGKeywords) - 1);
  }
--- 475,496 ----
  	/*
  	 * Now do a binary search using plain strcmp() comparison.
  	 */
! 	low = &ScanKeywords[0];
! 	high = endof(ScanKeywords) - 1;
! 	while (low <= high)
! 	{
! 		const ScanKeyword *middle;
! 		int			difference;
  
! 		middle = low + (high - low) / 2;
! 		difference = strcmp(middle->name, word);
! 		if (difference == 0)
! 			return middle;
! 		else if (difference < 0)
! 			low = middle + 1;
! 		else
! 			high = middle - 1;
! 	}
! 
! 	return NULL;
  }

#27

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Simon Riggs (#26)

Re: pg_dump additional options for performance

Simon Riggs <simon@2ndquadrant.com> writes:

[80k patch]

Surely there is a whole lot of unintended noise in this patch?
I certainly don't believe that you meant to change keywords.c
for instance.

regards, tom lane

#28

simon@2ndquadrant.com

over 17 years ago

In reply to: Tom Lane (#27)

1 attachment(s)

Re: pg_dump additional options for performance

On Thu, 2008-07-24 at 03:54 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

[80k patch]

Surely there is a whole lot of unintended noise in this patch?
I certainly don't believe that you meant to change keywords.c
for instance.

Removed, thanks.

Unrelated to this patch, it seems I have some issues with my repository,
judging by this and another unrelated issue reported by Martin Zaun.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

pg_dump_beforeafter.v6.patchtext/x-patch; charset=utf-8; name=pg_dump_beforeafter.v6.patchDownload

Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.103
diff -c -r1.103 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml	20 Jul 2008 18:43:30 -0000	1.103
--- doc/src/sgml/ref/pg_dump.sgml	24 Jul 2008 07:30:19 -0000
***************
*** 133,139 ****
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
--- 133,140 ----
         <para>
          Include large objects in the dump.  This is the default behavior
          except when <option>--schema</>, <option>--table</>, or
!         <option>--schema-only</> or <option>--schema-before-data</> or
!         <option>--schema-after-data</> is specified, so the <option>-b</>
          switch is only useful to add large objects to selective dumps.
         </para>
        </listitem>
***************
*** 426,431 ****
--- 427,452 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--schema-before-data</option></term>
+       <listitem>
+        <para>
+ 		Dump object definitions (schema) that occur before table data,
+ 		using the order produced by a full dump.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
+       <term><option>--schema-after-data</option></term>
+       <listitem>
+        <para>
+ 		Dump object definitions (schema) that occur after table data,
+ 		using the order produced by a full dump.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 790,795 ****
--- 811,844 ----
    </para>
  
    <para>
+    The output of <application>pg_dump</application> can be divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+ 	  Before Data - objects output before data, which includes
+ 	  <command>CREATE TABLE</command> statements and others.
+ 	  This part can be requested using <option>--schema-before-data</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  After Data - objects output after data, which includes
+ 	  <command>CREATE INDEX</command> statements and others.
+ 	  This part can be requested using <option>--schema-after-data</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+ 
+   <para>
     Because <application>pg_dump</application> is used to transfer data
     to newer versions of <productname>PostgreSQL</>, the output of
     <application>pg_dump</application> can be loaded into
Index: doc/src/sgml/ref/pg_restore.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_restore.sgml,v
retrieving revision 1.75
diff -c -r1.75 pg_restore.sgml
*** doc/src/sgml/ref/pg_restore.sgml	13 Apr 2008 03:49:21 -0000	1.75
--- doc/src/sgml/ref/pg_restore.sgml	24 Jul 2008 07:30:19 -0000
***************
*** 321,326 ****
--- 321,346 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--schema-before-data</option></term>
+       <listitem>
+        <para>
+ 		Restore object definitions (schema) that occur before table data,
+ 		using the order produced by a full restore.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
+       <term><option>--schema-after-data</option></term>
+       <listitem>
+        <para>
+ 		Restore object definitions (schema) that occur after table data,
+ 		using the order produced by a full restore.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-S <replaceable class="parameter">username</replaceable></option></term>
        <term><option>--superuser=<replaceable class="parameter">username</replaceable></option></term>
        <listitem>
***************
*** 572,577 ****
--- 592,626 ----
    </para>
  
    <para>
+    The actions of <application>pg_restore</application> can be 
+    divided into three parts:
+    <itemizedlist>
+     <listitem>
+      <para>
+ 	  Before Data - objects output before data, which includes
+ 	  <command>CREATE TABLE</command> statements and others.
+ 	  This part can be requested using <option>--schema-before-data</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  Table Data - data can be requested using <option>--data-only</>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+ 	  After Data - objects output after data, which includes
+ 	  <command>CREATE INDEX</command> statements and others.
+ 	  This part can be requested using <option>--schema-after-data</>.
+      </para>
+     </listitem>
+    </itemizedlist>
+    This allows us to work more easily with large data dump files when
+    there is some need to edit commands or resequence their execution for
+    performance.
+   </para>
+ 
+   <para>
     The limitations of <application>pg_restore</application> are detailed below.
  
     <itemizedlist>
Index: src/bin/pg_dump/pg_backup.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup.h,v
retrieving revision 1.47
diff -c -r1.47 pg_backup.h
*** src/bin/pg_dump/pg_backup.h	13 Apr 2008 03:49:21 -0000	1.47
--- src/bin/pg_dump/pg_backup.h	24 Jul 2008 07:30:19 -0000
***************
*** 89,95 ****
  	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  								 * instead of OWNER TO */
  	char	   *superuser;		/* Username to use as superuser */
! 	int			dataOnly;
  	int			dropSchema;
  	char	   *filename;
  	int			schemaOnly;
--- 89,95 ----
  	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
  								 * instead of OWNER TO */
  	char	   *superuser;		/* Username to use as superuser */
! 	int			dumpObjFlags;	/* which objects types to dump */
  	int			dropSchema;
  	char	   *filename;
  	int			schemaOnly;
Index: src/bin/pg_dump/pg_backup_archiver.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.157
diff -c -r1.157 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c	4 May 2008 08:32:21 -0000	1.157
--- src/bin/pg_dump/pg_backup_archiver.c	24 Jul 2008 07:30:19 -0000
***************
*** 56,62 ****
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static teReqs _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
--- 56,62 ----
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static int _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt);
  static TocEntry *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
***************
*** 129,135 ****
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
! 	teReqs		reqs;
  	OutputContext sav;
  	bool		defnDumped;
  
--- 129,135 ----
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
! 	int		reqs;
  	OutputContext sav;
  	bool		defnDumped;
  
***************
*** 175,193 ****
  	 * Work out if we have an implied data-only restore. This can happen if
  	 * the dump was data only or if the user has used a toc list to exclude
  	 * all of the schema data. All we do is look for schema entries - if none
! 	 * are found then we set the dataOnly flag.
  	 *
! 	 * We could scan for wanted TABLE entries, but that is not the same as
! 	 * dataOnly. At this stage, it seems unnecessary (6-Mar-2001).
  	 */
! 	if (!ropt->dataOnly)
  	{
  		int			impliedDataOnly = 1;
  
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
  		{
  			reqs = _tocEntryRequired(te, ropt, true);
! 			if ((reqs & REQ_SCHEMA) != 0)
  			{					/* It's schema, and it's wanted */
  				impliedDataOnly = 0;
  				break;
--- 175,193 ----
  	 * Work out if we have an implied data-only restore. This can happen if
  	 * the dump was data only or if the user has used a toc list to exclude
  	 * all of the schema data. All we do is look for schema entries - if none
! 	 * are found then say we only want DATA type objects.
  	 *
! 	 * We could scan for wanted TABLE entries, but that is not the same.
! 	 * At this stage, it seems unnecessary (6-Mar-2001).
  	 */
! 	if (!WANT_DATA(ropt->dumpObjFlags))
  	{
  		int			impliedDataOnly = 1;
  
  		for (te = AH->toc->next; te != AH->toc; te = te->next)
  		{
  			reqs = _tocEntryRequired(te, ropt, true);
! 			if (WANT_SCHEMA_BEFORE_DATA(reqs) || WANT_SCHEMA_AFTER_DATA(reqs))
  			{					/* It's schema, and it's wanted */
  				impliedDataOnly = 0;
  				break;
***************
*** 195,201 ****
  		}
  		if (impliedDataOnly)
  		{
! 			ropt->dataOnly = impliedDataOnly;
  			ahlog(AH, 1, "implied data-only restore\n");
  		}
  	}
--- 195,201 ----
  		}
  		if (impliedDataOnly)
  		{
! 			ropt->dumpObjFlags = REQ_DATA;
  			ahlog(AH, 1, "implied data-only restore\n");
  		}
  	}
***************
*** 236,242 ****
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
--- 236,242 ----
  			AH->currentTE = te;
  
  			reqs = _tocEntryRequired(te, ropt, false /* needn't drop ACLs */ );
! 			if (((reqs & REQ_SCHEMA_BEFORE_DATA) != 0) && te->dropStmt)
  			{
  				/* We want the schema */
  				ahlog(AH, 1, "dropping %s %s\n", te->desc, te->tag);
***************
*** 278,284 ****
  		/* Dump any relevant dump warnings to stderr */
  		if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  		{
! 			if (!ropt->dataOnly && te->defn != NULL && strlen(te->defn) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  			else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
--- 278,284 ----
  		/* Dump any relevant dump warnings to stderr */
  		if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
  		{
! 			if (!WANT_DATA(ropt->dumpObjFlags) && te->defn != NULL && strlen(te->defn) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->defn);
  			else if (te->copyStmt != NULL && strlen(te->copyStmt) != 0)
  				write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
***************
*** 286,292 ****
  
  		defnDumped = false;
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
--- 286,293 ----
  
  		defnDumped = false;
  
! 		if ((WANT_SCHEMA_BEFORE_DATA(reqs) && WANT_SCHEMA_BEFORE_DATA(ropt->dumpObjFlags)) ||
! 			(WANT_SCHEMA_AFTER_DATA(reqs) && WANT_SCHEMA_AFTER_DATA(ropt->dumpObjFlags)))	/* We want the schema */
  		{
  			ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
  
***************
*** 331,337 ****
  		/*
  		 * If we have a data component, then process it
  		 */
! 		if ((reqs & REQ_DATA) != 0)
  		{
  			/*
  			 * hadDumper will be set if there is genuine data component for
--- 332,338 ----
  		/*
  		 * If we have a data component, then process it
  		 */
! 		if (WANT_DATA(reqs))
  		{
  			/*
  			 * hadDumper will be set if there is genuine data component for
***************
*** 343,349 ****
  				/*
  				 * If we can output the data, then restore it.
  				 */
! 				if (AH->PrintTocDataPtr !=NULL && (reqs & REQ_DATA) != 0)
  				{
  #ifndef HAVE_LIBZ
  					if (AH->compression != 0)
--- 344,350 ----
  				/*
  				 * If we can output the data, then restore it.
  				 */
! 				if (AH->PrintTocDataPtr !=NULL && WANT_DATA(reqs))
  				{
  #ifndef HAVE_LIBZ
  					if (AH->compression != 0)
***************
*** 415,421 ****
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if ((reqs & REQ_SCHEMA) != 0)	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
--- 416,422 ----
  		/* Work out what, if anything, we want from this entry */
  		reqs = _tocEntryRequired(te, ropt, true);
  
! 		if (WANT_SCHEMA_BEFORE_DATA(reqs))	/* We want the schema */
  		{
  			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
  				  te->desc, te->tag);
***************
*** 473,479 ****
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!ropt->dataOnly || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
--- 474,480 ----
  _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "disabling triggers for %s\n", te->tag);
***************
*** 499,505 ****
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!ropt->dataOnly || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
--- 500,506 ----
  _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt)
  {
  	/* This hack is only needed in a data-only restore */
! 	if (!WANT_DATA(ropt->dumpObjFlags) || !ropt->disable_triggers)
  		return;
  
  	ahlog(AH, 1, "enabling triggers for %s\n", te->tag);
***************
*** 1321,1327 ****
  	return NULL;
  }
  
! teReqs
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  	TocEntry   *te = getTocEntryByDumpId(AH, id);
--- 1322,1328 ----
  	return NULL;
  }
  
! int
  TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt)
  {
  	TocEntry   *te = getTocEntryByDumpId(AH, id);
***************
*** 2026,2035 ****
  					 te->defn);
  }
  
! static teReqs
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! 	teReqs		res = REQ_ALL;
  
  	/* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  	if (strcmp(te->desc, "ENCODING") == 0 ||
--- 2027,2036 ----
  					 te->defn);
  }
  
! static int
  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  {
! 	int		res = ropt->dumpObjFlags;
  
  	/* ENCODING and STDSTRINGS items are dumped specially, so always reject */
  	if (strcmp(te->desc, "ENCODING") == 0 ||
***************
*** 2109,2125 ****
  	if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  		return 0;
  
- 	/* Mask it if we only want schema */
- 	if (ropt->schemaOnly)
- 		res = res & REQ_SCHEMA;
- 
- 	/* Mask it we only want data */
- 	if (ropt->dataOnly)
- 		res = res & REQ_DATA;
- 
  	/* Mask it if we don't have a schema contribution */
  	if (!te->defn || strlen(te->defn) == 0)
! 		res = res & ~REQ_SCHEMA;
  
  	/* Finally, if there's a per-ID filter, limit based on that as well */
  	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
--- 2110,2118 ----
  	if ((strcmp(te->desc, "<Init>") == 0) && (strcmp(te->tag, "Max OID") == 0))
  		return 0;
  
  	/* Mask it if we don't have a schema contribution */
  	if (!te->defn || strlen(te->defn) == 0)
! 		res = res & ~(REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
  
  	/* Finally, if there's a per-ID filter, limit based on that as well */
  	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
Index: src/bin/pg_dump/pg_backup_archiver.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_backup_archiver.h,v
retrieving revision 1.76
diff -c -r1.76 pg_backup_archiver.h
*** src/bin/pg_dump/pg_backup_archiver.h	7 Nov 2007 12:24:24 -0000	1.76
--- src/bin/pg_dump/pg_backup_archiver.h	24 Jul 2008 07:30:19 -0000
***************
*** 158,169 ****
  	STAGE_FINALIZING
  } ArchiverStage;
  
! typedef enum
! {
! 	REQ_SCHEMA = 1,
! 	REQ_DATA = 2,
! 	REQ_ALL = REQ_SCHEMA + REQ_DATA
! } teReqs;
  
  typedef struct _archiveHandle
  {
--- 158,173 ----
  	STAGE_FINALIZING
  } ArchiverStage;
  
! #define REQ_SCHEMA_BEFORE_DATA	(1 << 0)
! #define REQ_DATA				(1 << 1)
! #define REQ_SCHEMA_AFTER_DATA	(1 << 2)
! #define REQ_ALL					(REQ_SCHEMA_BEFORE_DATA + REQ_DATA + REQ_SCHEMA_AFTER_DATA)
! 
! #define WANT_SCHEMA_BEFORE_DATA(req)	((req & REQ_SCHEMA_BEFORE_DATA) == REQ_SCHEMA_BEFORE_DATA)
! #define WANT_DATA(req)					((req & REQ_DATA) == REQ_DATA)
! #define WANT_SCHEMA_AFTER_DATA(req)		((req & REQ_SCHEMA_AFTER_DATA) == REQ_SCHEMA_AFTER_DATA)
! #define WANT_ALL(req)					((req & REQ_ALL) == REQ_ALL)
! 
  
  typedef struct _archiveHandle
  {
***************
*** 317,323 ****
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
  
! extern teReqs TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
  
  #define appendStringLiteralAHX(buf,str,AH) \
--- 321,327 ----
  extern void ReadToc(ArchiveHandle *AH);
  extern void WriteDataChunks(ArchiveHandle *AH);
  
! extern int TocIDRequired(ArchiveHandle *AH, DumpId id, RestoreOptions *ropt);
  extern bool checkSeek(FILE *fp);
  
  #define appendStringLiteralAHX(buf,str,AH) \
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	20 Jul 2008 18:43:30 -0000	1.497
--- src/bin/pg_dump/pg_dump.c	24 Jul 2008 07:35:28 -0000
***************
*** 73,78 ****
--- 73,82 ----
  bool		aclsSkip;
  const char *lockWaitTimeout;
  
+ /* groups of objects: default is we dump all groups */
+ 
+ int			dumpObjFlags;
+ 
  /* subquery used to convert user ID (eg, datdba) to user name */
  static const char *username_subquery;
  
***************
*** 227,232 ****
--- 231,238 ----
  	static int	disable_triggers = 0;
  	static int  outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+ 	static int	schemaBeforeData;
+ 	static int	schemaAfterData;
  
  	static struct option long_options[] = {
  		{"data-only", no_argument, NULL, 'a'},
***************
*** 267,272 ****
--- 273,280 ----
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"lock-wait-timeout", required_argument, NULL, 2},
  		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+  		{"schema-before-data", no_argument, &schemaBeforeData, 1},
+  		{"schema-after-data", no_argument, &schemaAfterData, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
  		{NULL, 0, NULL, 0}
***************
*** 420,425 ****
--- 428,437 ----
  					disable_triggers = 1;
  				else if (strcmp(optarg, "no-tablespaces") == 0)
  					outputNoTablespaces = 1;
+ 				else if (strcmp(optarg, "schema-before-data") == 0)
+ 					schemaBeforeData = 1;
+ 				else if (strcmp(optarg, "schema-after-data") == 0)
+ 					schemaAfterData = 1;
  				else if (strcmp(optarg, "use-set-session-authorization") == 0)
  					use_setsessauth = 1;
  				else
***************
*** 464,474 ****
  	if (optind < argc)
  		dbname = argv[optind];
  
! 	if (dataOnly && schemaOnly)
  	{
! 		write_msg(NULL, "options -s/--schema-only and -a/--data-only cannot be used together\n");
  		exit(1);
  	}
  
  	if (dataOnly && outputClean)
  	{
--- 476,521 ----
  	if (optind < argc)
  		dbname = argv[optind];
  
! 	/*
! 	 * Look for conflicting options relating to object groupings
! 	 */
! 	if (schemaOnly && dataOnly)
! 	{
! 		write_msg(NULL, "options %s and %s cannot be used together\n",
! 				"-s/--schema-only", "-a/--data-only");
! 		exit(1);
! 	}
! 	else if ((schemaOnly || dataOnly) && 
! 				(schemaBeforeData || schemaAfterData))
  	{
! 		write_msg(NULL, "options %s and %s cannot be used together\n",
! 				schemaOnly ? "-s/--schema-only" : "-a/--data-only",
! 				schemaBeforeData ? "--schema-before-data" : "--schema-after-data");
  		exit(1);
  	}
+ 	else if (schemaBeforeData && schemaAfterData)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"--schema-before-data", "--schema-after-data");
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Decide which of the object groups we will dump
+ 	 */
+ 	dumpObjFlags = REQ_ALL;
+ 
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (schemaBeforeData)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (schemaAfterData)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
  
  	if (dataOnly && outputClean)
  	{
***************
*** 646,652 ****
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && !schemaOnly)
  		outputBlobs = true;
  
  	/*
--- 693,699 ----
  	 * Dumping blobs is now default unless we saw an inclusion switch or -s
  	 * ... but even if we did see one of these, -b turns it back on.
  	 */
! 	if (include_everything && WANT_DATA(dumpObjFlags))
  		outputBlobs = true;
  
  	/*
***************
*** 658,664 ****
  	if (g_fout->remoteVersion < 80400)
  		guessConstraintInheritance(tblinfo, numTables);
  
! 	if (!schemaOnly)
  		getTableData(tblinfo, numTables, oids);
  
  	if (outputBlobs && hasBlobs(g_fout))
--- 705,711 ----
  	if (g_fout->remoteVersion < 80400)
  		guessConstraintInheritance(tblinfo, numTables);
  
! 	if (WANT_DATA(dumpObjFlags))
  		getTableData(tblinfo, numTables, oids);
  
  	if (outputBlobs && hasBlobs(g_fout))
***************
*** 712,718 ****
  	dumpStdStrings(g_fout);
  
  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && !dataOnly)
  		dumpDatabase(g_fout);
  
  	/* Now the rearrangeable objects. */
--- 759,765 ----
  	dumpStdStrings(g_fout);
  
  	/* The database item is always next, unless we don't want it at all */
! 	if (include_everything && WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		dumpDatabase(g_fout);
  
  	/* Now the rearrangeable objects. */
***************
*** 734,740 ****
  		ropt->noTablespace = outputNoTablespaces;
  		ropt->disable_triggers = disable_triggers;
  		ropt->use_setsessauth = use_setsessauth;
! 		ropt->dataOnly = dataOnly;
  
  		if (compressLevel == -1)
  			ropt->compression = 0;
--- 781,787 ----
  		ropt->noTablespace = outputNoTablespaces;
  		ropt->disable_triggers = disable_triggers;
  		ropt->use_setsessauth = use_setsessauth;
! 		ropt->dumpObjFlags = dumpObjFlags;
  
  		if (compressLevel == -1)
  			ropt->compression = 0;
***************
*** 792,797 ****
--- 839,846 ----
  	printf(_("  --disable-dollar-quoting    disable dollar quoting, use SQL standard quoting\n"));
  	printf(_("  --disable-triggers          disable triggers during data-only restore\n"));
  	printf(_("  --no-tablespaces            do not dump tablespace assignments\n"));
+ 	printf(_("  --schema-before-data        dump only the part of schema before table data\n"));
+ 	printf(_("  --schema-after-data         dump only the part of schema after table data\n"));
  	printf(_("  --use-set-session-authorization\n"
  			 "                              use SESSION AUTHORIZATION commands instead of\n"
  	"                              ALTER OWNER commands to set ownership\n"));
***************
*** 5165,5171 ****
  	int			ncomments;
  
  	/* Comments are SCHEMA not data */
! 	if (dataOnly)
  		return;
  
  	/* Search for comments associated with catalogId, using table */
--- 5214,5220 ----
  	int			ncomments;
  
  	/* Comments are SCHEMA not data */
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/* Search for comments associated with catalogId, using table */
***************
*** 5216,5222 ****
  	PQExpBuffer target;
  
  	/* Comments are SCHEMA not data */
! 	if (dataOnly)
  		return;
  
  	/* Search for comments associated with relation, using table */
--- 5265,5271 ----
  	PQExpBuffer target;
  
  	/* Comments are SCHEMA not data */
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/* Search for comments associated with relation, using table */
***************
*** 5568,5574 ****
  	char	   *qnspname;
  
  	/* Skip if not to be dumped */
! 	if (!nspinfo->dobj.dump || dataOnly)
  		return;
  
  	/* don't dump dummy namespace from pre-7.3 source */
--- 5617,5623 ----
  	char	   *qnspname;
  
  	/* Skip if not to be dumped */
! 	if (!nspinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/* don't dump dummy namespace from pre-7.3 source */
***************
*** 5617,5623 ****
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  	/* Skip if not to be dumped */
! 	if (!tinfo->dobj.dump || dataOnly)
  		return;
  
  	/* Dump out in proper style */
--- 5666,5672 ----
  dumpType(Archive *fout, TypeInfo *tinfo)
  {
  	/* Skip if not to be dumped */
! 	if (!tinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/* Dump out in proper style */
***************
*** 5646,5651 ****
--- 5695,5704 ----
  				i;
  	char	   *label;
  
+ 	/* Skip if not to be dumped */
+ 	if (!tinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
+ 		return;
+ 
  	/* Set proper schema search path so regproc references list correctly */
  	selectSourceSchema(tinfo->dobj.namespace->dobj.name);
  
***************
*** 6262,6268 ****
  	PQExpBuffer q;
  
  	/* Skip if not to be dumped */
! 	if (!stinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 6315,6321 ----
  	PQExpBuffer q;
  
  	/* Skip if not to be dumped */
! 	if (!stinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 6309,6315 ****
  	if (!include_everything)
  		return false;
  	/* And they're schema not data */
! 	if (dataOnly)
  		return false;
  	return true;
  }
--- 6362,6368 ----
  	if (!include_everything)
  		return false;
  	/* And they're schema not data */
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return false;
  	return true;
  }
***************
*** 6330,6336 ****
  	FuncInfo   *funcInfo;
  	FuncInfo   *validatorInfo = NULL;
  
! 	if (dataOnly)
  		return;
  
  	/*
--- 6383,6389 ----
  	FuncInfo   *funcInfo;
  	FuncInfo   *validatorInfo = NULL;
  
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 6590,6596 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!finfo->dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 6643,6649 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!finfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 6985,6991 ****
  	TypeInfo   *sourceInfo;
  	TypeInfo   *targetInfo;
  
! 	if (dataOnly)
  		return;
  
  	if (OidIsValid(cast->castfunc))
--- 7038,7044 ----
  	TypeInfo   *sourceInfo;
  	TypeInfo   *targetInfo;
  
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	if (OidIsValid(cast->castfunc))
***************
*** 7135,7141 ****
  	char	   *oprcanhash;
  
  	/* Skip if not to be dumped */
! 	if (!oprinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7188,7194 ----
  	char	   *oprcanhash;
  
  	/* Skip if not to be dumped */
! 	if (!oprinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 7519,7525 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opcinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7572,7578 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opcinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 7827,7833 ****
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opfinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 7880,7886 ----
  	int			i;
  
  	/* Skip if not to be dumped */
! 	if (!opfinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/*
***************
*** 8096,8102 ****
  	bool		condefault;
  
  	/* Skip if not to be dumped */
! 	if (!convinfo->dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 8149,8155 ----
  	bool		condefault;
  
  	/* Skip if not to be dumped */
! 	if (!convinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 8250,8256 ****
  	bool		convertok;
  
  	/* Skip if not to be dumped */
! 	if (!agginfo->aggfn.dobj.dump || dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 8303,8309 ----
  	bool		convertok;
  
  	/* Skip if not to be dumped */
! 	if (!agginfo->aggfn.dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 8453,8459 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!prsinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8506,8512 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!prsinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8522,8528 ****
  	char	   *tmplname;
  
  	/* Skip if not to be dumped */
! 	if (!dictinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8575,8581 ----
  	char	   *tmplname;
  
  	/* Skip if not to be dumped */
! 	if (!dictinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8607,8613 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!tmplinfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8660,8666 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!tmplinfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8673,8679 ****
  	int			i_dictname;
  
  	/* Skip if not to be dumped */
! 	if (!cfginfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 8726,8732 ----
  	int			i_dictname;
  
  	/* Skip if not to be dumped */
! 	if (!cfginfo->dobj.dump || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 8809,8815 ****
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (dataOnly || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
--- 8862,8868 ----
  	PQExpBuffer sql;
  
  	/* Do nothing if ACL dump is not enabled */
! 	if (!WANT_SCHEMA_BEFORE_DATA(dumpObjFlags) || aclsSkip)
  		return;
  
  	sql = createPQExpBuffer();
***************
*** 8846,8852 ****
  	{
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (!dataOnly)
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
--- 8899,8905 ----
  	{
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
***************
*** 9153,9159 ****
  	PQExpBuffer delq;
  
  	/* Only print it if "separate" mode is selected */
! 	if (!tbinfo->dobj.dump || !adinfo->separate || dataOnly)
  		return;
  
  	/* Don't print inherited defaults, either */
--- 9206,9212 ----
  	PQExpBuffer delq;
  
  	/* Only print it if "separate" mode is selected */
! 	if (!tbinfo->dobj.dump || !adinfo->separate || !WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  		return;
  
  	/* Don't print inherited defaults, either */
***************
*** 9238,9244 ****
  	PQExpBuffer q;
  	PQExpBuffer delq;
  
! 	if (dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 9291,9297 ----
  	PQExpBuffer q;
  	PQExpBuffer delq;
  
! 	if (!WANT_SCHEMA_AFTER_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 9307,9313 ****
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!coninfo->dobj.dump || dataOnly)
  		return;
  
  	q = createPQExpBuffer();
--- 9360,9366 ----
  	PQExpBuffer delq;
  
  	/* Skip if not to be dumped */
! 	if (!coninfo->dobj.dump || !WANT_SCHEMA_AFTER_DATA(dumpObjFlags))
  		return;
  
  	q = createPQExpBuffer();
***************
*** 9700,9706 ****
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (!dataOnly)
  	{
  		resetPQExpBuffer(delqry);
  
--- 9753,9759 ----
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a "data" dump.
  	 */
! 	if (WANT_SCHEMA_BEFORE_DATA(dumpObjFlags))
  	{
  		resetPQExpBuffer(delqry);
  
***************
*** 9803,9809 ****
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (!schemaOnly)
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9856,9862 ----
  					tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (WANT_DATA(dumpObjFlags))
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
***************
*** 9836,9842 ****
  	const char *p;
  	int			findx;
  
! 	if (dataOnly)
  		return;
  
  	query = createPQExpBuffer();
--- 9889,9895 ----
  	const char *p;
  	int			findx;
  
! 	if (!WANT_SCHEMA_AFTER_DATA(dumpObjFlags))
  		return;
  
  	query = createPQExpBuffer();
***************
*** 10044,10050 ****
  	PGresult   *res;
  
  	/* Skip if not to be dumped */
! 	if (!rinfo->dobj.dump || dataOnly)
  		return;
  
  	/*
--- 10097,10103 ----
  	PGresult   *res;
  
  	/* Skip if not to be dumped */
! 	if (!rinfo->dobj.dump || !WANT_SCHEMA_AFTER_DATA(dumpObjFlags))
  		return;
  
  	/*
Index: src/bin/pg_dump/pg_restore.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
retrieving revision 1.88
diff -c -r1.88 pg_restore.c
*** src/bin/pg_dump/pg_restore.c	13 Apr 2008 03:49:22 -0000	1.88
--- src/bin/pg_dump/pg_restore.c	24 Jul 2008 07:30:19 -0000
***************
*** 78,83 ****
--- 78,90 ----
  	static int	no_data_for_failed_tables = 0;
  	static int  outputNoTablespaces = 0;
  	static int	use_setsessauth = 0;
+  	bool		dataOnly = false;
+  	bool		schemaOnly = false;
+  
+  	static int	schemaBeforeData;
+  	static int	schemaAfterData;
+  
+  	int			dumpObjFlags;
  
  	struct option cmdopts[] = {
  		{"clean", 0, NULL, 'c'},
***************
*** 114,119 ****
--- 121,128 ----
  		{"disable-triggers", no_argument, &disable_triggers, 1},
  		{"no-data-for-failed-tables", no_argument, &no_data_for_failed_tables, 1},
  		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
+  		{"schema-before-data", no_argument, &schemaBeforeData, 1},
+  		{"schema-after-data", no_argument, &schemaAfterData, 1},
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  
  		{NULL, 0, NULL, 0}
***************
*** 145,151 ****
  		switch (c)
  		{
  			case 'a':			/* Dump data only */
! 				opts->dataOnly = 1;
  				break;
  			case 'c':			/* clean (i.e., drop) schema prior to create */
  				opts->dropSchema = 1;
--- 154,160 ----
  		switch (c)
  		{
  			case 'a':			/* Dump data only */
! 				dataOnly = true;
  				break;
  			case 'c':			/* clean (i.e., drop) schema prior to create */
  				opts->dropSchema = 1;
***************
*** 213,219 ****
  				opts->triggerNames = strdup(optarg);
  				break;
  			case 's':			/* dump schema only */
! 				opts->schemaOnly = 1;
  				break;
  			case 'S':			/* Superuser username */
  				if (strlen(optarg) != 0)
--- 222,228 ----
  				opts->triggerNames = strdup(optarg);
  				break;
  			case 's':			/* dump schema only */
! 				schemaOnly = true;
  				break;
  			case 'S':			/* Superuser username */
  				if (strlen(optarg) != 0)
***************
*** 249,254 ****
--- 258,267 ----
  					no_data_for_failed_tables = 1;
  				else if (strcmp(optarg, "no-tablespaces") == 0)
  					outputNoTablespaces = 1;
+ 				else if (strcmp(optarg, "schema-before-data") == 0)
+ 					schemaBeforeData = 1;
+ 				else if (strcmp(optarg, "schema-after-data") == 0)
+ 					schemaAfterData = 1;
  				else if (strcmp(optarg, "use-set-session-authorization") == 0)
  					use_setsessauth = 1;
  				else
***************
*** 295,300 ****
--- 308,354 ----
  		opts->useDB = 1;
  	}
  
+ 	/*
+ 	 * Look for conflicting options relating to object groupings
+ 	 */
+ 	if (schemaOnly && dataOnly)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"-s/--schema-only", "-a/--data-only");
+ 		exit(1);
+ 	}
+ 	else if ((schemaOnly || dataOnly) && 
+ 				(schemaBeforeData || schemaAfterData))
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				schemaOnly ? "-s/--schema-only" : "-a/--data-only",
+ 				schemaBeforeData ? "--schema-before-data" : "--schema-after-data");
+ 		exit(1);
+ 	}
+ 	else if (schemaBeforeData && schemaAfterData)
+ 	{
+ 		write_msg(NULL, "options %s and %s cannot be used together\n",
+ 				"--schema-before-data", "--schema-after-data");
+ 		exit(1);
+ 	}
+ 
+ 	/*
+ 	 * Decide which of the object groups we will dump
+ 	 */
+ 	dumpObjFlags = REQ_ALL;
+ 
+ 	if (dataOnly)
+ 		dumpObjFlags = REQ_DATA;
+ 
+ 	if (schemaBeforeData)
+ 		dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+ 
+ 	if (schemaAfterData)
+ 		dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+ 
+ 	if (schemaOnly)
+ 		dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
+ 
  	opts->disable_triggers = disable_triggers;
  	opts->noDataForFailedTables = no_data_for_failed_tables;
  	opts->noTablespace = outputNoTablespaces;
***************
*** 405,410 ****
--- 459,466 ----
  			 "                           do not restore data of tables that could not be\n"
  			 "                           created\n"));
  	printf(_("  --no-tablespaces         do not dump tablespace assignments\n"));
+ 	printf(_("  --schema-before-data     dump only the part of schema before table data\n"));
+ 	printf(_("  --schema-after-data      dump only the part of schema after table data\n"));
  	printf(_("  --use-set-session-authorization\n"
  			 "                           use SESSION AUTHORIZATION commands instead of\n"
  			 "                           OWNER TO commands\n"));

#29

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Simon Riggs (#28)

Re: pg_dump additional options for performance

Simon Riggs <simon@2ndquadrant.com> writes:

[ pg_dump_beforeafter.v6.patch ]

I looked over this patch a bit. I have a proposal for a slightly
different way of defining the new switches:

* --schema-before-data, --data-only, and --schema-after-data can be
specified in any combination to obtain any subset of the full dump.
If none are specified (which would in itself be a useless combination)
then the default is to dump all three sections, just as if all three
were specified.

* --schema-only is defined as equivalent to specifying both
--schema-before-data and --schema-after-data.

The patch as submitted enforces what seem largely arbitrary restrictions
on combining these switches. It made some sense before to treat
specifying both --schema-only and --data-only as an error, but it's not
clear to me why you shouldn't be able to write both --schema-before-data
and --schema-after-data, especially when there's a switch right beside
them that appears to be equivalent to that combination. So let's just
allow all the combinations.

The attached updated patch implements and documents this behavior,
and gets rid of the special linkage between --disable-triggers and
--data-only as previously discussed.

Unfortunately there's still a lot of work to do, and I don't feel like
doing it so I'm bouncing this patch back for further work.

The key problem is that pg_restore is broken: it emits nearly the same
output for --schema-before-data and --schema-after-data, because it
doesn't have any way to distinguish which objects in a full dump file
belong where. This is because the filtering logic was put in the wrong
place, namely in the ArchiveEntry creation routines in pg_dump.c, when
where it really needs to happen is while scanning the TocEntry list in
RestoreArchive(). (Note: it is perhaps worth keeping the pg_dump.c
filters so as to avoid doing extra server queries for objects that we
aren't going to dump anyway, but the core logic has to be in
RestoreArchive.)

Another issue is that the rules for deciding which objects are "before
data" and which are "after data" are wrong. In particular ACLs are after
data not before data, which is relatively easy to fix. Not so easy to fix
is that COMMENTs might be either before or after data depending on what
kind of object they are attached to.

(BTW, what about BLOB COMMENTS? They definitely can't be "before data".
ISTM you could make a case for them being "after data", if you think that
comments are always schema. But there is also a case for considering
them as data, because the objects they are attached to are data. I kind
of like the latter approach because it would create an invariant that
comments appear in the same dump section as the object commented on.
Thoughts?)

Implementing the filtering by examining the type of a TocEntry in
RestoreArchive is a bit of a PITA, but it's probably possible. The
main bad thing about that is the need to make an explicit list of every
type of TocEntry that exists now or ever has been emitted by any past
version of pg_dump. The design concept was that the type tags are
mainly documentation, and while we've had to bend that in places (mostly
for backward-compatibility reasons) this would be the first place we'd
have to throw it overboard completely.

And there's yet another issue here, which is that it's not entirely clear
that the type of an object uniquely determines whether it's before or
after data. This might be an emergent property of the object sorting
rules, but there is certainly not anything positively guaranteeing that
the dependency-driven topological sort will produce such a result, and
especially not that that will always be true in the future. So the
approach seems a bit fragile.

We could perhaps get rid of that problem, as well as the need to implement
object-type-determination logic, if we were to make RestoreArchive define
the groupings according to position in the TocEntry list: everything
before the first TABLE DATA or BLOB (and BLOB COMMENT?) entry is "before"
data, everything after the last one is "after" data, everything in between
is data. Then we only need to identify object types that are considered
"data", which we already have a rule for (whether hadDumper is true).
This is pretty attractive until you stop to consider the possibility
that there aren't any data entries in an archive (ie, it was made with
--schema-only): then there's no way to identify the boundary points.

We could solve that problem by inserting a "dummy data" TOC entry where
the data would have appeared, but this will only work in new archive
files. With an implementation like this, pg_restore with
--schema-before-data or --schema-after-data won't behave very nicely on a
pre-8.4 --schema-only archive file. (Presumably it would act as though
all the objects were "before" data.) Is that a small enough corner case
to live with in order to gain implementation simplicity and robustness?

BTW, another incomplete item is that pg_dumpall should probably be taught
to accept and pass down --schema-before-data and --schema-after-data
switches.

regards, tom lane

#30

Greg Sabino Mullane

greg@turnstep.com

over 17 years ago

In reply to: Tom Lane (#29)

Re: pg_dump additional options for performance

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Tom Lane wrote:

* --schema-before-data, --data-only, and --schema-after-data can be

I thought you were arguing for some better names at one point? Those seem
very confusing to me, especially "--schema-after-data". I know it means
"the parts of the schema that come after the data" but it could
also be read as other ways, including "put the schema after the data" - which
makes no sense, but the name is not exactly intuitive either. "Pre" and "Post"
at least are slightly better, IMO. How about --pre-data-schema
and --post-data-schema? Or --pre-data-section and --post-data-section?
Or (my favorites) --pre-data-commands and --post-data-commands? As the
existing docs say, "commands" are what we are generating.

them as data, because the objects they are attached to are data. I kind
of like the latter approach because it would create an invariant that
comments appear in the same dump section as the object commented on.
Thoughts?)

+1 on putting them next to the object commented on.

And there's yet another issue here, which is that it's not entirely clear
that the type of an object uniquely determines whether it's before or
after data.

Wouldn't that be a problem with current dumps as well then?

We could solve that problem by inserting a "dummy data" TOC entry where
the data would have appeared, but this will only work in new archive
files. With an implementation like this, pg_restore with
--schema-before-data or --schema-after-data won't behave very nicely on a
pre-8.4 --schema-only archive file. (Presumably it would act as though
all the objects were "before" data.) Is that a small enough corner case
to live with in order to gain implementation simplicity and robustness?

I'm not comfortable with corner cases for pg_restore backwards compatibility.
What exactly would happen (worse case) in that scenario?

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200807252235
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkiKjgMACgkQvJuQZxSWSsiRMACg7c/VDo9hTTjukkFFvLYI31mL
BqkAn3FfepllvVnIwX+efA5cLPlVbDd0
=V/Sv
-----END PGP SIGNATURE-----

#31

simon@2ndquadrant.com

over 17 years ago

In reply to: Tom Lane (#29)

Re: pg_dump additional options for performance

On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

[ pg_dump_beforeafter.v6.patch ]

Unfortunately there's still a lot of work to do, and I don't feel like
doing it so I'm bouncing this patch back for further work.

Fair enough. Thanks for the review.

The key problem is that pg_restore is broken: it emits nearly the same
output for --schema-before-data and --schema-after-data, because it
doesn't have any way to distinguish which objects in a full dump file
belong where. This is because the filtering logic was put in the wrong
place, namely in the ArchiveEntry creation routines in pg_dump.c, when
where it really needs to happen is while scanning the TocEntry list in
RestoreArchive(). (Note: it is perhaps worth keeping the pg_dump.c
filters so as to avoid doing extra server queries for objects that we
aren't going to dump anyway, but the core logic has to be in
RestoreArchive.)

My feeling is that this would take the patch off-track.

The key capability here is being able to split the dump into multiple
pieces. The equivalent capability on restore is *not* required, because
once the dump has been split the restore never needs to be. It might
seem that the patch should be symmetrical with respect to pg_dump and
pg_restore, but I see no use case for the pg_restore case.

The title of this email confirms that as original intention.

I looked over this patch a bit. I have a proposal for a slightly
different way of defining the new switches:

* --schema-before-data, --data-only, and --schema-after-data can be
specified in any combination to obtain any subset of the full dump.
If none are specified (which would in itself be a useless combination)
then the default is to dump all three sections, just as if all three
were specified.

* --schema-only is defined as equivalent to specifying both
--schema-before-data and --schema-after-data.

The patch as submitted enforces what seem largely arbitrary restrictions
on combining these switches. It made some sense before to treat
specifying both --schema-only and --data-only as an error, but it's not
clear to me why you shouldn't be able to write both --schema-before-data
and --schema-after-data, especially when there's a switch right beside
them that appears to be equivalent to that combination. So let's just
allow all the combinations.

I had it both ways at various points in development. I'm happy with what
you propose.

The attached updated patch implements and documents this behavior,
and gets rid of the special linkage between --disable-triggers and
--data-only as previously discussed.

Another issue is that the rules for deciding which objects are "before
data" and which are "after data" are wrong. In particular ACLs are after
data not before data, which is relatively easy to fix.

Not so easy to fix
is that COMMENTs might be either before or after data depending on what
kind of object they are attached to.

Is there anything to fix? Comments are added by calls to dumpComment,
which are always made in conjunction with the dump of an object. So if
you dump the object you dump the comment. As long as objects are
correctly split out then comments will be also.

(BTW, what about BLOB COMMENTS? They definitely can't be "before data".
ISTM you could make a case for them being "after data", if you think that
comments are always schema. But there is also a case for considering
them as data, because the objects they are attached to are data. I kind
of like the latter approach because it would create an invariant that
comments appear in the same dump section as the object commented on.
Thoughts?)

Yes, data. I'll look at this.

Implementing the filtering by examining the type of a TocEntry in
RestoreArchive is a bit of a PITA, but it's probably possible. The
main bad thing about that is the need to make an explicit list of every
type of TocEntry that exists now or ever has been emitted by any past
version of pg_dump. The design concept was that the type tags are
mainly documentation, and while we've had to bend that in places (mostly
for backward-compatibility reasons) this would be the first place we'd
have to throw it overboard completely.

And there's yet another issue here, which is that it's not entirely clear
that the type of an object uniquely determines whether it's before or
after data. This might be an emergent property of the object sorting
rules, but there is certainly not anything positively guaranteeing that
the dependency-driven topological sort will produce such a result, and
especially not that that will always be true in the future. So the
approach seems a bit fragile.

Don't understand that. Objects are sorted in well-defined order,
specified in pg_dump_sort.c. Essentially we are saying that (according
to current numbering)

--schema-before-data priority 1-8
--data-only priority 9-11
--schema-after-data priority 12+

So the sort is explicitly defined, not implicit. I can add comments to
ensure that people changing the priority of objects across those
boundaries would be causing problems.

We could perhaps get rid of that problem, as well as the need to implement
object-type-determination logic, if we were to make RestoreArchive define
the groupings according to position in the TocEntry list: everything
before the first TABLE DATA or BLOB (and BLOB COMMENT?) entry is "before"
data, everything after the last one is "after" data, everything in between
is data. Then we only need to identify object types that are considered
"data", which we already have a rule for (whether hadDumper is true).
This is pretty attractive until you stop to consider the possibility
that there aren't any data entries in an archive (ie, it was made with
--schema-only): then there's no way to identify the boundary points.

We could solve that problem by inserting a "dummy data" TOC entry where
the data would have appeared, but this will only work in new archive
files. With an implementation like this, pg_restore with
--schema-before-data or --schema-after-data won't behave very nicely on a
pre-8.4 --schema-only archive file. (Presumably it would act as though
all the objects were "before" data.) Is that a small enough corner case
to live with in order to gain implementation simplicity and robustness?

All of the above makes me certain I want to remove these options from
pg_restore.

BTW, another incomplete item is that pg_dumpall should probably be taught
to accept and pass down --schema-before-data and --schema-after-data
switches.

I'm conscious that the major work proposed will take weeks to complete
and we don't know what other problems it will cause (but I'm pretty
certain it will cause some). With regard to the use case, I see little
or no benefit from either of us doing that and regret I won't be able to
complete that.

Can we prune down to the base use case to avoid this overhead? i.e. have
these options on pg_dump only?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#32

sfrost@snowman.net

over 17 years ago

In reply to: Simon Riggs (#31)

Re: pg_dump additional options for performance

* Simon Riggs (simon@2ndquadrant.com) wrote:

The key capability here is being able to split the dump into multiple
pieces. The equivalent capability on restore is *not* required, because
once the dump has been split the restore never needs to be. It might
seem that the patch should be symmetrical with respect to pg_dump and
pg_restore, but I see no use case for the pg_restore case.

I'm inclined to agree with this. It might have been nice to provide a
way to split out already-created dumps, but I suspect that people who
need that probably have already figured out a way to do it (I know I
have..). We should probably ensure that pg_restore doesn't *break* when
fed a partial dump.

The patch as submitted enforces what seem largely arbitrary restrictions
on combining these switches.

I had it both ways at various points in development. I'm happy with what
you propose.

I agree with removing the restrictions. I don't see much in the way of
use cases, but it reduces code and doesn't cause problems.

Another issue is that the rules for deciding which objects are "before
data" and which are "after data" are wrong. In particular ACLs are after
data not before data, which is relatively easy to fix.

OK

This was partially why I was complaining about having documentation, and
a policy for that matter, which goes into more detail about why X is before
or after the data. I agree that they're after today, but I don't see
any particular reason why they should be one or the other. If we
adopted a policy like I proposed (--schema-post-data is essentially that
which uses the data and is faster done in bulk) then ACLs would be
before, and I tend to feel like it makes more sense that way.

Not so easy to fix
is that COMMENTs might be either before or after data depending on what
kind of object they are attached to.

Is there anything to fix? Comments are added by calls to dumpComment,
which are always made in conjunction with the dump of an object. So if
you dump the object you dump the comment. As long as objects are
correctly split out then comments will be also.

I agree with this, and it follows for BLOB comments- in any case, they
go with the object being dumped at the time of that object getting
dumped. Comments make sense as an extention of the object, not as a
seperate set of objects to be explicitly placed before or after the
data.

All of the above makes me certain I want to remove these options from
pg_restore.

I'm in agreement with this.

BTW, another incomplete item is that pg_dumpall should probably be taught
to accept and pass down --schema-before-data and --schema-after-data
switches.

OK

I could go either way on this.

Can we prune down to the base use case to avoid this overhead? i.e. have
these options on pg_dump only?

Makes sense to me.

Thanks,

Stephen

#33

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Simon Riggs (#31)

Re: pg_dump additional options for performance

Simon Riggs <simon@2ndquadrant.com> writes:

On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:

The key problem is that pg_restore is broken:

The key capability here is being able to split the dump into multiple
pieces. The equivalent capability on restore is *not* required, because
once the dump has been split the restore never needs to be.

This argument is nonsense. The typical usage of this capability, IMHO,
will be

pg_dump -Fc >whole.dump
pg_restore --schema-before-data whole.dump >before.sql
pg_restore --data-only whole.dump >data.sql
pg_restore --schema-after-data whole.dump >after.sql

followed by editing the schema pieces and then loading. One reason
is that this gives you a consistent dump, whereas three successive
pg_dump runs could never guarantee any such thing. Another reason
is that you may well not know when you prepare the dump that you
will need split output, because the requirement to edit the dump
is likely to be realized only when you go to load it.

In any case, why did you put the switches into pg_restore.c if you
thought it wasn't useful for pg_restore to handle them?

Not so easy to fix
is that COMMENTs might be either before or after data depending on what
kind of object they are attached to.

Is there anything to fix?

Well, yeah. If you attach a comment to an after-data object and test
--schema-after-data, you'll notice the comment is lost.

And there's yet another issue here, which is that it's not entirely clear
that the type of an object uniquely determines whether it's before or
after data.

Don't understand that. Objects are sorted in well-defined order,
specified in pg_dump_sort.

After which we do a topological sort that enforces dependency ordering.
The question to worry about is whether there can ever be a dependency
from a normally-"before" object to a normally-"after" object, which
would cause the dependency sort to move the latter in front of the
former (in one way or another). I'm not saying that any such case can
occur today, but I don't think it's an impossibility for it to arise in
future. I don't want this relatively minor feature to be putting limits
on what kinds of dependencies the system can have.

I'm conscious that the major work proposed will take weeks to complete

I don't think that what I am proposing is that complicated; I would
anticipate it requiring somewhere on the order of two dozen lines of
code. I was thinking of doing a preliminary loop through the TocEntry
list to identify the ordinal numbers of the first and last data items,
and then the main loop could compare a counter to those numbers to
decide which of the three sections it was in. Plus you'd need another
ArchiveEntry call someplace to prepare the "dummy data" item if one was
needed.

regards, tom lane

#34

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Stephen Frost (#32)

Re: pg_dump additional options for performance

Stephen Frost <sfrost@snowman.net> writes:

Another issue is that the rules for deciding which objects are "before
data" and which are "after data" are wrong. In particular ACLs are after
data not before data, which is relatively easy to fix.

OK

This was partially why I was complaining about having documentation, and
a policy for that matter, which goes into more detail about why X is before
or after the data. I agree that they're after today, but I don't see
any particular reason why they should be one or the other.

If a table's ACL revokes owner insert privilege, and was placed before
the data load steps, those steps would fail. We are relying on the
default table privileges until we are done doing everything we need to
do to the tables (and perhaps other objects, I'm not sure if there are
any other comparable problems).

regards, tom lane

#35

sfrost@snowman.net

over 17 years ago

In reply to: Tom Lane (#33)

Re: pg_dump additional options for performance

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:

The key problem is that pg_restore is broken:

The key capability here is being able to split the dump into multiple
pieces. The equivalent capability on restore is *not* required, because
once the dump has been split the restore never needs to be.

This argument is nonsense. The typical usage of this capability, IMHO,
will be

pg_dump -Fc >whole.dump
pg_restore --schema-before-data whole.dump >before.sql
pg_restore --data-only whole.dump >data.sql
pg_restore --schema-after-data whole.dump >after.sql

followed by editing the schema pieces and then loading.

I dislike, and doubt that I'd use, this approach. At the end of the
day, it ends up processing the same (very large amount of data) multiple
times. We have >60G dump files sometimes, and there's no way I'm going
to dump that into a single file first if I can avoid it. What we end up
doing today is --schema-only followed by vi'ing it and splitting it up
by hand, etc, then doing a seperate --data-only dump.

One reason
is that this gives you a consistent dump, whereas three successive
pg_dump runs could never guarantee any such thing.

While this is technically true, in most cases people have control over
the schema bits and would likely be able to ensure that the schema
doesn't change during the time. At that point it's only the data, which
is still done in a transactional way.

Another reason is that you may well not know when you prepare the
dump that you will need split output, because the requirement to edit
the dump is likely to be realized only when you go to load it.

This is a good point. My gut reaction is that, at least in my usage, it
would be more about "if it's larger than a gig, I might as well split it
out, just in case I need to touch something". Honestly, it's rare that
I don't have to make *some* change. Often that's the point of dumping
it out.

Thanks,

Stephen

#36

simon@2ndquadrant.com

over 17 years ago

In reply to: Tom Lane (#33)

Re: pg_dump additional options for performance

On Sat, 2008-07-26 at 12:20 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:

The key problem is that pg_restore is broken:

The key capability here is being able to split the dump into multiple
pieces. The equivalent capability on restore is *not* required, because
once the dump has been split the restore never needs to be.

This argument is nonsense.
The typical usage of this capability, IMHO, will be

Arghh! That's not my stated use case!?#*!

I want to dump tables separately for performance reasons. There are
documented tests showing 100% gains using this method. There is no gain
adding this to pg_restore. There is a gain to be had - parallelising
index creation, but this patch doesn't provide parallelisation.

Anyway, clearly time for me to stop and have a break.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#37

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Stephen Frost (#35)

Re: pg_dump additional options for performance

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

This argument is nonsense. The typical usage of this capability, IMHO,
will be

pg_dump -Fc >whole.dump
pg_restore --schema-before-data whole.dump >before.sql
pg_restore --data-only whole.dump >data.sql
pg_restore --schema-after-data whole.dump >after.sql

followed by editing the schema pieces and then loading.

I dislike, and doubt that I'd use, this approach. At the end of the
day, it ends up processing the same (very large amount of data) multiple
times.

Well, that's easily avoided: just replace the third step by restoring
directly to the target database.

pg_restore --schema-before-data whole.dump >before.sql
edit before.sql
pg_restore --schema-after-data whole.dump >after.sql
edit after.sql
psql -f before.sql target_db
pg_restore --data-only -d target_db whole.dump
psql -f after.sql target_db

regards, tom lane

#38

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Simon Riggs (#36)

Re: pg_dump additional options for performance

Simon Riggs <simon@2ndquadrant.com> writes:

I want to dump tables separately for performance reasons. There are
documented tests showing 100% gains using this method. There is no gain
adding this to pg_restore. There is a gain to be had - parallelising
index creation, but this patch doesn't provide parallelisation.

Right, but the parallelization is going to happen sometime, and it is
going to happen in the context of pg_restore. So I think it's pretty
silly to argue that no one will ever want this feature to work in
pg_restore.

To extend the example I just gave to Stephen, I think a fairly probable
scenario is where you only need to tweak some "before" object
definitions, and then you could do

pg_restore --schema-before-data whole.dump >before.sql
edit before.sql
psql -f before.sql target_db
pg_restore --data-only --schema-after-data -d target_db whole.dump

which (given a parallelizing pg_restore) would do all the time-consuming
steps in a fully parallelized fashion.

regards, tom lane

#39

jd@commandprompt.com

over 17 years ago

In reply to: Tom Lane (#37)

Re: pg_dump additional options for performance

On Sat, 2008-07-26 at 13:43 -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

I dislike, and doubt that I'd use, this approach. At the end of the
day, it ends up processing the same (very large amount of data) multiple
times.

Well, that's easily avoided: just replace the third step by restoring
directly to the target database.

pg_restore --schema-before-data whole.dump >before.sql
edit before.sql
pg_restore --schema-after-data whole.dump >after.sql
edit after.sql
psql -f before.sql target_db
pg_restore --data-only -d target_db whole.dump
psql -f after.sql target_db

It seems to me we continue to hack a solution without a clear idea of
the problems involved. There are a number of what I would consider
significant issues with the backup / restore facilities as a whole with
PostgreSQL.

1. We use text based backups, even with custom format. We need a fast
binary representation as well.

2. We have no concurrency which means, anyone with any database over 50G
has unacceptable restore times.

3. We have to continue develop hacks to define custom utilization. Why
am I passing pre-data anything? It should be automatic. For example:

pg_backup (not dump, we aren't dumping. Dumping is usually associated
with some sort of crash or fould human behavoir. We are backing up).
pg_backup -U <user> -D database -F -f mybackup.sqlc

If I were to extract <mybackup.sqlc> I would get:

mybackup.datatypes
mybackup.tables
mybackup.data
mybackup.primary_keys
mybackup.indexes
mybackup.constraints
mybackup.grants

All would be the SQL representation.

Further I could do this:

pg_restore -U <user> -D <database> --data-types -f mybackup.sqlc

Which would restore just the SQL representation of the data types.

Or:

pg_restore -U <user> -D <database> --tables -f mybackup.sqlc

Which would restore *only* the tables. Yes it would error if I didn't
also specify --data-types.

Further we need to have concurrency capability. Once we have restored
datatypes and tables, there is zero reason not to launch connections on
data (and then primary keys and indexes) so:

pg_restore -U <user> -D <database> -C 4 --full -f mybackup.sqlc

Which would launch four connections to the database, and perform a full
restore per mybackup.sqlc.

Oh and pg_dumpall? It should have been removed right around the release
of 7.2, pg_dump -A please.

Anyway, I leave other peeps to flame me into oblivion.

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

#40

daveg

daveg@sonic.net

over 17 years ago

In reply to: Tom Lane (#38)

Re: pg_dump additional options for performance

On Sat, Jul 26, 2008 at 01:56:14PM -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

I want to dump tables separately for performance reasons. There are
documented tests showing 100% gains using this method. There is no gain
adding this to pg_restore. There is a gain to be had - parallelising
index creation, but this patch doesn't provide parallelisation.

Right, but the parallelization is going to happen sometime, and it is
going to happen in the context of pg_restore. So I think it's pretty
silly to argue that no one will ever want this feature to work in
pg_restore.

To extend the example I just gave to Stephen, I think a fairly probable
scenario is where you only need to tweak some "before" object
definitions, and then you could do

pg_restore --schema-before-data whole.dump >before.sql
edit before.sql
psql -f before.sql target_db
pg_restore --data-only --schema-after-data -d target_db whole.dump

which (given a parallelizing pg_restore) would do all the time-consuming
steps in a fully parallelized fashion.

A few thoughts about pg_restore performance:

To take advantage of non-logged copy, the create and load should be in
the same transaction.

To take advantage of file and buffer cache, it would be be good to do
indexes immediately after table data. Many tables will be small enough
to fit in cache and this will avoid re-reading them for index builds. This
effect becomes stronger with more indexes on one table. There may also be
some filesytem placement benefit to building the indexes for a table
immediately after loading the data.

The buffer fan file cache advantage also applies to constraint creation,
but this is complicated by the need for indexes and data in the referenced
tables.

It seems that a high performance restore will want to proced in a different
order than the current sort order or that proposed by the before/data/after
patch.

- The simplest unit of work for parallelism may be the table and its
"decorations", eg indexes and relational constraints.

- Sort tables by foreign key dependency so that referenced tables are
loaded before referencing tables.

- Do table creation and data load together in one transaction to use
non-logged copy. Index builds, and constraint creation should follow
immediately, either as part of the same transaction, or possibly
parallelized themselves.

Table creation, data load, index builds, and constraint creation could
be packaged up as the unit of work to be done in a subprocess which either
completes or fails as a unit. The worker process would be called with
connection info, a file pointer to the data, and the DDL for the table.
pg_restore would keep a work queue of tables to be restored in FK dependency
order and also do the other schema operations such as functions and types.

-dg

--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

#41

simon@2ndquadrant.com

over 17 years ago

In reply to: Tom Lane (#38)

Re: pg_dump additional options for performance

On Sat, 2008-07-26 at 13:56 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

I want to dump tables separately for performance reasons. There are
documented tests showing 100% gains using this method. There is no gain
adding this to pg_restore. There is a gain to be had - parallelising
index creation, but this patch doesn't provide parallelisation.

Right, but the parallelization is going to happen sometime, and it is
going to happen in the context of pg_restore.

I honestly think there is less benefit that way than if we consider
things more as a whole:

To do data dump quickly we need to dump different tables to different
disks simultaneously. By its very nature, that cannot end with just a
single file. So the starting point for any restore must be potentially
more than one file.

There are two ways of dumping: either multi-thread pg_dump, or allow
multiple pg_dumps to work together. Second option much less work, same
result. (Either way we also need a way for multiple concurrent sessions
to share a snapshot.)

When restoring, we can then just use multiple pg_restore sessions to
restore the individual data files. Or again we can write a
multi-threaded pg_restore to do the same thing - why would I bother
doing that when I already can? It gains us nothing.

Parallelising the index creation seems best done using concurrent psql.
We've agreed some mods to psql to put multi-sessions in there. If we do
that right, then we can make pg_restore generate a psql script with
multi-session commands scattered appropriately throughout.

Parallel pg_restore is a lot of work for a narrow use case. Concurrent
psql provides a much wider set of use cases.

So fully parallelising dump/restore can be achieved by

* splitting dump into pieces (this patch)
* allowing sessions to share a common snapshot
* concurrent psql
* changes to pg_restore/psql/pg_dump to allow commands to be inserted
which will use concurrent psql features

If we do things this way then we have some useful tools that can be used
in a range of use cases, not just restore.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#42

simon@2ndquadrant.com

over 17 years ago

In reply to: Joshua D. Drake (#39)

Re: pg_dump additional options for performance

On Sat, 2008-07-26 at 11:03 -0700, Joshua D. Drake wrote:

2. We have no concurrency which means, anyone with any database over 50G
has unacceptable restore times.

Agreed.

Also the core reason for wanting -w

3. We have to continue develop hacks to define custom utilization. Why
am I passing pre-data anything? It should be automatic. For example:

pg_backup (not dump, we aren't dumping. Dumping is usually associated
with some sort of crash or fould human behavoir. We are backing up).
pg_backup -U <user> -D database -F -f mybackup.sqlc

If I were to extract <mybackup.sqlc> I would get:

mybackup.datatypes
mybackup.tables
mybackup.data
mybackup.primary_keys
mybackup.indexes
mybackup.constraints
mybackup.grants

Sounds good.

Doesn't help with the main element of dump time: one table at a time to
one output file. We need a way to dump multiple tables concurrently,
ending in multiple files/filesystems.

Oh and pg_dumpall? It should have been removed right around the release
of 7.2, pg_dump -A please.

Good idea

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#43

jd@commandprompt.com

over 17 years ago

In reply to: Simon Riggs (#42)

Re: pg_dump additional options for performance

Simon Riggs wrote:

On Sat, 2008-07-26 at 11:03 -0700, Joshua D. Drake wrote:

2. We have no concurrency which means, anyone with any database over 50G
has unacceptable restore times.

Agreed.

Sounds good.

Doesn't help with the main element of dump time: one table at a time to
one output file. We need a way to dump multiple tables concurrently,
ending in multiple files/filesystems.

Agreed but that is a problem I understand with a solution I don't. I am
all eyes on a way to fix that. One thought I had and please, be gentle
in response was some sort of async transaction capability. I know that
libpq has the ability to send async queries. Is it possible to do this:

send async(copy table to foo)
send async(copy table to bar)
send async(copy table to baz)

Where all three copies are happening in the background?

Sincerely,

Joshua D. Drake

#44

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Joshua D. Drake (#43)

Re: pg_dump additional options for performance

Joshua D. Drake wrote:

Agreed but that is a problem I understand with a solution I don't. I
am all eyes on a way to fix that. One thought I had and please, be
gentle in response was some sort of async transaction capability. I
know that libpq has the ability to send async queries. Is it possible
to do this:

send async(copy table to foo)
send async(copy table to bar)
send async(copy table to baz)

Where all three copies are happening in the background?

IIRC, libpq doesn't let you have more than one async query active at one
time.

cheers

andrew

#45

jd@commandprompt.com

over 17 years ago

In reply to: Andrew Dunstan (#44)

Re: pg_dump additional options for performance

Andrew Dunstan wrote:

Joshua D. Drake wrote:

Agreed but that is a problem I understand with a solution I don't. I
am all eyes on a way to fix that. One thought I had and please, be
gentle in response was some sort of async transaction capability. I
know that libpq has the ability to send async queries. Is it possible
to do this:

send async(copy table to foo)
send async(copy table to bar)
send async(copy table to baz)

Where all three copies are happening in the background?

IIRC, libpq doesn't let you have more than one async query active at one
time.

Now that I think on it harder, this isn't even a libpq problem (although
its involved), we need the postmaster do be able to do a background
async query. Which is (I am guessing) why libpq can only do one at a time.

Sincerely,

Joshua D. Drake

#46

sfrost@snowman.net

over 17 years ago

In reply to: Tom Lane (#37)

Re: pg_dump additional options for performance

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

I dislike, and doubt that I'd use, this approach. At the end of the
day, it ends up processing the same (very large amount of data) multiple
times.

Well, that's easily avoided: just replace the third step by restoring
directly to the target database.

pg_restore --schema-before-data whole.dump >before.sql
edit before.sql
pg_restore --schema-after-data whole.dump >after.sql
edit after.sql
psql -f before.sql target_db
pg_restore --data-only -d target_db whole.dump
psql -f after.sql target_db

This would depend on the dump being in the custom format, though I
suppose that ends up being true for any usage of these options. I've
never really been a fan of the custom format, in large part because it
doesn't really buy you all that much and makes changing things more
difficult (by having to extract out what you want to change, and then
omit it from the restore).

I can see some advantage to having the entire dump contained in a single
file and still being able to pull out pieces based on before/after.
Should we get a binary format which is much faster, I could see myself
being more likely to use pg_restore. Same for parallelization or, in my
fantasies, the ability to copy schema, tables, indexes, etc, in 'raw' PG
format between servers. Worse than having to vi an insanely large file,
or split it up to be able to modify the pieces you want, is having to
rebuild indexes, especially GIST ones. That's another topic though.

Thanks,

Stephen

#47

sfrost@snowman.net

over 17 years ago

In reply to: Tom Lane (#38)

Re: pg_dump additional options for performance

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Right, but the parallelization is going to happen sometime, and it is
going to happen in the context of pg_restore. So I think it's pretty
silly to argue that no one will ever want this feature to work in
pg_restore.

I think you've about convinced me on this, and it annoys me. ;) Worse
is that it sounds like this might cause the options to not make it in
for 8.4, which would be quite frustrating.

To extend the example I just gave to Stephen, I think a fairly probable
scenario is where you only need to tweak some "before" object
definitions, and then you could do

pg_restore --schema-before-data whole.dump >before.sql
edit before.sql
psql -f before.sql target_db
pg_restore --data-only --schema-after-data -d target_db whole.dump

which (given a parallelizing pg_restore) would do all the time-consuming
steps in a fully parallelized fashion.

Alright, this has been mulling around in the back of my head a bit and
has now finally surfaced- I like having the whole dump contained in a
single file, but I hate having what ends up being "out-dated" or "wrong"
or "not what was loaded" in the dump file. Doesn't seem likely to be
possible, but it'd be neat to be able to modify objects in the dump
file.

Also, something which often happens to me is that I need to change the
search_path or the role at the top of a .sql from pg_dump before
restoring it. Seems like using the custom format would make that
difficult without some pipe/cat/sed magic. Parallelization would make
using that kind of magic more difficult too, I would guess. Might be
something to think about.

Thanks,

Stephen

#48

daveg

daveg@sonic.net

over 17 years ago

In reply to: Simon Riggs (#42)

Re: [PATCHES] pg_dump additional options for performance

On Sun, Jul 27, 2008 at 10:37:34AM +0100, Simon Riggs wrote:

On Sat, 2008-07-26 at 11:03 -0700, Joshua D. Drake wrote:

2. We have no concurrency which means, anyone with any database over 50G
has unacceptable restore times.

Agreed.

Also the core reason for wanting -w

3. We have to continue develop hacks to define custom utilization. Why
am I passing pre-data anything? It should be automatic. For example:

[adding hackers for discussion]

On Sat, Jul 26, 2008 at 01:56:14PM -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

I want to dump tables separately for performance reasons. There are
documented tests showing 100% gains using this method. There is no gain
adding this to pg_restore. There is a gain to be had - parallelising
index creation, but this patch doesn't provide parallelisation.

Right, but the parallelization is going to happen sometime, and it is
going to happen in the context of pg_restore. So I think it's pretty
silly to argue that no one will ever want this feature to work in
pg_restore.

To extend the example I just gave to Stephen, I think a fairly probable
scenario is where you only need to tweak some "before" object
definitions, and then you could do

pg_restore --schema-before-data whole.dump >before.sql
edit before.sql
psql -f before.sql target_db
pg_restore --data-only --schema-after-data -d target_db whole.dump

which (given a parallelizing pg_restore) would do all the time-consuming
steps in a fully parallelized fashion.

A few thoughts about pg_restore performance:

To take advantage of non-logged copy, the table create and data load should
be in the same transaction.

To take advantage of file and buffer cache, it would be better to create
indexes immediately after loading table data. Many tables will be small
enough to fit in cache on and this will avoid re-reading them for index
builds. This is more advantagious with more indexes on one table. There
may also be some filesytem placement benefits to building the indexes for
a table immediately after loading the data.

Creating constraints immediately after loading data also would benefit from
warm buffer and file caches. Doing this this is complicated by the need
for indexes and data in the referenced tables to exist first.

It seems that a high performance restore will want to procede in a different
order than the current sort order or that proposed by the before/data/after
patch.

- The simplest unit of work for parallelism may be the table and its
"decorations", eg indexes and relational constraints.

- Sort tables by foreign key dependency so that referenced tables are
loaded before referencing tables.

-dg

--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

#49

jd@commandprompt.com

over 17 years ago

In reply to: Stephen Frost (#46)

Re: pg_dump additional options for performance

Stephen Frost wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

I dislike, and doubt that I'd use, this approach. At the end of the
day, it ends up processing the same (very large amount of data) multiple
times.

This would depend on the dump being in the custom format, though I
suppose that ends up being true for any usage of these options. I've
never really been a fan of the custom format, in large part because it
doesn't really buy you all that much and makes changing things more
difficult (by having to extract out what you want to change, and then
omit it from the restore).

Custom format rocks for partial set restores from a whole dump. See the
TOC option :)

Joshua D. Drake

#50

sfrost@snowman.net

over 17 years ago

In reply to: Joshua D. Drake (#49)

Re: pg_dump additional options for performance

* Joshua D. Drake (jd@commandprompt.com) wrote:

Custom format rocks for partial set restores from a whole dump. See the
TOC option :)

I imagine it does, but that's very rarely what I need. Most of the time
we're dumping out a schema to load it into a seperate schema (usually on
another host). Sometimes that can be done by simply vi'ing the file to
change the search_path and whatnot, though more often we end up pipe'ing
the whole thing through sed. Since we don't allow regular users to do
much, and you have to 'set role postgres;' to do anything as superuser,
we also often end up adding 'set role postgres;' to the top of the .sql
files.

Thanks,

Stephen

#51

chris

cbbrowne@ca.afilias.info

over 17 years ago

In reply to: Stephen Frost (#15)

Re: pg_dump additional options for performance

tgl@sss.pgh.pa.us (Tom Lane) writes:

Simon Riggs <simon@2ndquadrant.com> writes:

I want to dump tables separately for performance reasons. There are
documented tests showing 100% gains using this method. There is no gain
adding this to pg_restore. There is a gain to be had - parallelising
index creation, but this patch doesn't provide parallelisation.

Right, but the parallelization is going to happen sometime, and it is
going to happen in the context of pg_restore. So I think it's pretty
silly to argue that no one will ever want this feature to work in
pg_restore.

"Never" is a long time, agreed.

To extend the example I just gave to Stephen, I think a fairly probable
scenario is where you only need to tweak some "before" object
definitions, and then you could do

pg_restore --schema-before-data whole.dump >before.sql
edit before.sql
psql -f before.sql target_db
pg_restore --data-only --schema-after-data -d target_db whole.dump

which (given a parallelizing pg_restore) would do all the time-consuming
steps in a fully parallelized fashion.

Do we need to wait until a fully-parallelizing pg_restore is
implemented before adding this functionality to pg_dump?

The particular extension I'm interested in from pg_dump, here, is the
ability to dump multiple tables concurrently. I've got disk arrays
with enough I/O bandwidth that this form of parallelization does
provide a performance benefit.

The result of that will be that *many* files are generated, and I
don't imagine we want to change pg_restore to try to make it read from
multiple files concurrently.

Further, it's actually not obvious that we *necessarily* care about
parallelizing loading data. The thing that happens every day is
backups. I care rather a lot about optimizing that; we do backups
each and every day, and optimizations to that process will accrue
benefits each and every day.

In contrast, restoring databases does not take place every day. When
it happens, yes, there's considerable value to making *that* go as
quickly as possible, but I'm quite willing to consider optimizing that
to be separate from optimizing backups.

I daresay I haven't used pg_restore any time recently, either. Any
time we have thought about using it, we've concluded that the
perceivable benefits were actually more of a mirage.
--
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Rules of the Evil Overlord #145. "My dungeon cell decor will not
feature exposed pipes. While they add to the gloomy atmosphere, they
are good conductors of vibrations and a lot of prisoners know Morse
code." <http://www.eviloverlord.com/>

#52