Suggestion for --truncate-tables to pg_restore

Started by Karl O. Pincover 13 years ago18 messages
#1Karl O. Pinc
kop@meme.com
1 attachment(s)

Hi,

I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it). --clean does not work
well because of dependent objects in other schemas.

Patch to the docs attached (before I go and do any
real coding.)

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

pg_restore.sgml.difftext/x-patch; charset=us-ascii; name=pg_restore.sgml.diffDownload
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index b276da6..94d359d 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -539,6 +539,26 @@
      </varlistentry>
 
      <varlistentry>
+      <term><option>-u</></term>
+      <term><option>--truncate-tables</></term>
+      <listitem>
+       <para>
+        This option is only relevant when performing a data-only
+        restore.  It instructs <application>pg_restore</application>
+        to execute commands to truncate the target tables while the
+        data is reloaded.  Use this when restoring tables or schemas
+        and <option>--clean</clean> cannot be used because dependent
+        objects would be destroyed.
+       </para>
+
+       <para>
+         The <option>--disable-triggers</option> will almost always
+         always need to be used in conjunction with this option to
+         disable check constraints on foreign keys.
+       </para>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--use-set-session-authorization</option></term>
       <listitem>
        <para>

#2Karl O. Pinc
kop@meme.com
In reply to: Karl O. Pinc (#1)
Re: Suggestion for --truncate-tables to pg_restore

On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it). --clean does not work
well because of dependent objects in other schemas.

Before doing any more work I want to report on the
discussions that took place at the code sprint at
Postgres Open in Chicago. Because I'm going to add
in additional thoughts I've had and to avoid mis-representing
anybody's opinion I'll not mention who said what.
Feel free to step forward and claim Ingenious Ideas
as your own. Likewise I apologize if lack of attribution
makes it more difficult to discern (my) uninformed drivel
from intelligent insight.

----

First, the problem:

Begin with the following structure:

CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT);

CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;

Then, by accident, somebody does:

UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;

So, you want to restore the data into schemaA.foo.
But schemaA.foo has (bad) data in it that must first
be removed. It would seem that using

pg_restore --clean -n schemaA -t foo my_pg_dump_backup

would solve the problem, it would drop schemaA.foo,
recreate it, and then restore the data. But this does
not work. schemaA.foo does not drop because it's
got a dependent database object, schemaB.bar.

Of course there are manual work-arounds. One of these
is truncating schemaA.foo and then doing a pg_restore
with --data-only. The manual work-arounds become
increasingly burdensome as you need to restore more
tables. The case that motivated me was an attempt
to restore the data in an entire schema, one which
contained a significant number of tables.

So, the idea here is to be able to do a data-only
restore, first truncating the data in the tables
being restored to remove the existing corrupted data.

The proposal is to add a --truncate-tables option
to pg_restore.

----

There were some comments on syntax.

I proposed to use -u as a short option. This was
thought confusing, given it's use in other
Unix command line programs (mysql). Since there's
no obvious short option, forget it. Just have
a long option.

Another choice is to avoid introducing yet another
option and instead overload --clean so that when
doing a --data-only restore --clean truncates tables
and otherwise --clean retains the existing behavior of
dropping and re-creating the restored objects.

(I tested pg_restore with 9.1 and when --data-only is
used --clean is ignored, it does not even produce a warning.
This is arguably a bug.)

----

More serious objections were raised regarding semantics.

What if, instead, the initial structure looked like:

CREATE TABLE schemaA.foo
(id PRIMARY KEY, data INT);

CREATE TABLE schemaB.bar
(id INT CONSTRAINT "bar_on_foo" REFERENCES foo
, moredata INT);

With a case like this, in most real-world situations, you'd
have to use pg_restore with --disable-triggers if you wanted
to use --data-only and --truncate-tables. The possibility of
foreign key referential integrity corruption is obvious.

Aside: Unless you're restoring databases in their entirety
the pg_restore --disable-triggers option makes it easy to
introduce foreign key referential integrity corruption.
In fact, since pg_restore does not wrap it's operations
in one big transaction, it's easy to attempt restoration
of a portion of a database, have part of the process
succeed and part of it fail (due to either schema
or data dependencies), and be left off worse
than before you started. The pg_restore docs might
benefit from a big fat warning regarding
attempts to restore less than an entire database.

So, the discussion went, pg_restore is just another
application and introducing more options
which could lead to corruption of referential integrity is
a bad idea.

But pg_restore should not be thought of as just another
front-end. It should be thought of as a data recovery
tool. Recovering some data and being left with referential
integrity problems is better than having no data. This
is true even if, due to different users owning different
schemas and so forth, nobody knows exactly what
might be broken.

Yes, but we can do better. (The unstated sub-text being that
we don't want to introduce an inferior feature which
will then need to be supported forever.)

How could we do better:

Here I will record only the ideas related to restore,
although there was some mention of dump as well.

There has apparently been some discussion of writing
a foreign data wrapper which would operate on a database
dump. This might (in ways that are not immediately
obvious to me) address this issue.

The restore process could, based on what table data needs
restoration, look at foreign key dependencies and produce a
list of the tables which all must be restored into order to
ensure foreign key referential integrity. In the case of
restoration into a empty database the foreign key
dependences must be calculated from the dump. (An
"easy" way to do this would be to create
all the database objects in some temporary place and query
the system catalogs to produce the dependency graph.)
In the case of restoration into an
existing database the foreign key dependences should
come from the database into which the data is to be restored.
(This is necessary to capture dependences which may have
been introduced after the dump was taken.)

The above applies to data-only restoration. When restoring the
database schema meta-information (object definition) a similar
graph of database object dependences must be produced and used
to determine what needs to be restored.

But when doing a partial data-only restore there is more
to data integrity than just foreign key referential integrity.
Other constraints and triggers ensure other sorts of
data integrity rules. It is not enough to leave
triggers turned on when restoring data. Data not
restored may validate against restored data in triggers
fired only on manipulation of the un-restored table content.
The only solution I can see is to also include in the
computed set of tables which require restoration those
tables having triggers which reference any of the restored
data.

Just how far should pg_restore go in attempting to
preserve data integrity?

----

Two things are clear:

The current table and schema oriented options for backing
up and restoring portions of databases are flawed with
respect to data integrity.

Life is complicated.

Where should I go from here? I am not now in a position to
pursue anything more complicated than completing the code to
add a --truncate-tables option to pg_restore. Should I
finish this and send in a patch?

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#3Karl O. Pinc
kop@meme.com
In reply to: Karl O. Pinc (#2)
1 attachment(s)
Re: Suggestion for --truncate-tables to pg_restore

On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:

On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it). --clean does not work
well because of dependent objects in other schemas.

Since there wasn't much more to do I've gone ahead
and written the patch. Works for me.

Against git master.
Passes regression tests, but there's no regression
tests for pg_restore so this does not say much.
Since there's no regression tests I've not written one.

Since this is a real patch for application I've given
it a new name (it's not a v2).

Truncate done right before COPY, since that's what
the parallel restores do.

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

pg_restore_truncate.patchtext/x-patch; charset=us-ascii; name=pg_restore_truncate.patchDownload
#4Karl O. Pinc
kop@meme.com
In reply to: Karl O. Pinc (#3)
1 attachment(s)
Re: Suggestion for --truncate-tables to pg_restore

Whoops. Do over. Sent the wrong file.

On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:

On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:

On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it). --clean does not work
well because of dependent objects in other schemas.

Since there wasn't much more to do I've gone ahead
and written the patch. Works for me.

Against git master.
Passes regression tests, but there's no regression
tests for pg_restore so this does not say much.
Since there's no regression tests I've not written one.

Since this is a real patch for application I've given
it a new name (it's not a v2).

Truncate done right before COPY, since that's what
the parallel restores do.

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

pg_restore_truncate.patchtext/x-patch; charset=us-ascii; name=pg_restore_truncate.patchDownload
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index b276da6..11cba8e 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -539,6 +539,25 @@
      </varlistentry>
 
      <varlistentry>
+      <term><option>--truncate-tables</></term>
+      <listitem>
+       <para>
+        This option is only relevant when performing a data-only
+        restore.  It instructs <application>pg_restore</application>
+        to execute commands to truncate the target tables while the
+        data is reloaded.  Use this when restoring tables or schemas
+        and <option>--clean</clean> cannot be used because dependent
+        objects would be destroyed.
+       </para>
+
+       <para>
+         The <option>--disable-triggers</option> will almost always
+         always need to be used in conjunction with this option to
+         disable check constraints on foreign keys.
+       </para>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--use-set-session-authorization</option></term>
       <listitem>
        <para>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 3b49395..0aaf1d3 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -101,6 +101,8 @@ typedef struct _restoreOptions
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;		/* disable triggers during data-only
 										 * restore */
+	int			truncate_tables;		/* truncate tables during data-only
+										 * restore */
 	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
 								 * instead of OWNER TO */
 	int			no_security_labels;		/* Skip security label entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 722b3e9..43b5806 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -311,6 +311,11 @@ RestoreArchive(Archive *AHX)
 	if (ropt->createDB && ropt->dropSchema)
 		exit_horribly(modulename, "-C and -c are incompatible options\n");
 
+	/* When the schema is dropped and re-created then no point
+	 * truncating tables. */
+	if (ropt->dropSchema && ropt->truncate_tables)
+		exit_horribly(modulename, "-c and --truncate-tables are incompatible options\n");
+
 	/*
 	 * -C is not compatible with -1, because we can't create a database inside
 	 * a transaction block.
@@ -412,6 +417,10 @@ RestoreArchive(Archive *AHX)
 		}
 	}
 
+	/* Truncate tables only when restoring data. */
+	if (!ropt->dataOnly && ropt->truncate_tables)
+		exit_horribly(modulename, "--truncate-tables requires the --data-only option\n");
+
 	/*
 	 * Setup the output file if necessary.
 	 */
@@ -553,6 +562,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	int			retval = 0;
 	teReqs		reqs;
 	bool		defnDumped;
+	bool		truncate;
 
 	AH->currentTE = te;
 
@@ -687,15 +697,22 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 						 * server, so no need to see if we should issue BEGIN.
 						 */
 						StartTransaction(AH);
+						truncate = 1;
+					} else
+						/* Truncate the table when asked to. */
+						truncate = ropt->truncate_tables;
 
+					if (truncate) {
 						/*
 						 * If the server version is >= 8.4, make sure we issue
 						 * TRUNCATE with ONLY so that child tables are not
-						 * wiped.
+						 * wiped.  If we don't know the server version
+						 * then err on the side of safety.
 						 */
 						ahprintf(AH, "TRUNCATE TABLE %s%s;\n\n",
-								 (PQserverVersion(AH->connection) >= 80400 ?
-								  "ONLY " : ""),
+								 (!AH->connection
+								  || PQserverVersion(AH->connection)
+									 >= 80400 ? "ONLY " : ""),
 								 fmtId(te->tag));
 					}
 
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index f6c835b..c0b0bfc 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -77,6 +77,7 @@ main(int argc, char **argv)
 	static int	disable_triggers = 0;
 	static int	no_data_for_failed_tables = 0;
 	static int	outputNoTablespaces = 0;
+	static int	truncate_tables = 0;
 	static int	use_setsessauth = 0;
 	static int	no_security_labels = 0;
 
@@ -119,6 +120,7 @@ main(int argc, char **argv)
 		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
 		{"role", required_argument, NULL, 2},
 		{"section", required_argument, NULL, 3},
+		{"truncate-tables", no_argument, &truncate_tables, 1},
 		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
 		{"no-security-labels", no_argument, &no_security_labels, 1},
 
@@ -324,6 +326,7 @@ main(int argc, char **argv)
 	opts->disable_triggers = disable_triggers;
 	opts->noDataForFailedTables = no_data_for_failed_tables;
 	opts->noTablespace = outputNoTablespaces;
+	opts->truncate_tables = truncate_tables;
 	opts->use_setsessauth = use_setsessauth;
 	opts->no_security_labels = no_security_labels;
 
@@ -434,6 +437,7 @@ usage(const char *progname)
 	printf(_("  --no-security-labels         do not restore security labels\n"));
 	printf(_("  --no-tablespaces             do not restore tablespace assignments\n"));
 	printf(_("  --section=SECTION            restore named section (pre-data, data, or post-data)\n"));
+	printf(_("  --truncate-tables            truncate tables before restore\n"));
 	printf(_("  --use-set-session-authorization\n"
 			 "                               use SET SESSION AUTHORIZATION commands instead of\n"
 			 "                               ALTER OWNER commands to set ownership\n"));

#5Karl O. Pinc
kop@meme.com
In reply to: Karl O. Pinc (#4)
1 attachment(s)
Re: Suggestion for --truncate-tables to pg_restore

Attached is version 2. The sgml did not build.

On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote:

Whoops. Do over. Sent the wrong file.

On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:

On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:

On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it). --clean does not work
well because of dependent objects in other schemas.

Since there wasn't much more to do I've gone ahead
and written the patch. Works for me.

Against git master.
Passes regression tests, but there's no regression
tests for pg_restore so this does not say much.
Since there's no regression tests I've not written one.

Since this is a real patch for application I've given
it a new name (it's not a v2).

Truncate done right before COPY, since that's what
the parallel restores do.

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

------quoted attachment------

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

pg_restore_truncate_v2.patchtext/x-patch; charset=us-ascii; name=pg_restore_truncate_v2.patchDownload
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index b276da6..488d8dc 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -539,6 +539,26 @@
      </varlistentry>
 
      <varlistentry>
+      <term><option>--truncate-tables</></term>
+      <listitem>
+       <para>
+        This option is only relevant when performing a data-only
+        restore.  It instructs <application>pg_restore</application>
+        to execute commands to truncate the target tables while the
+        data is reloaded.  Use this when restoring tables or schemas
+        and <option>--clean</option> cannot be used because dependent
+        objects would be destroyed.
+       </para>
+
+       <para>
+         The <option>--disable-triggers</option> will almost always
+         always need to be used in conjunction with this option to
+         disable check constraints on foreign keys.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--use-set-session-authorization</option></term>
       <listitem>
        <para>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 3b49395..0aaf1d3 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -101,6 +101,8 @@ typedef struct _restoreOptions
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;		/* disable triggers during data-only
 										 * restore */
+	int			truncate_tables;		/* truncate tables during data-only
+										 * restore */
 	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
 								 * instead of OWNER TO */
 	int			no_security_labels;		/* Skip security label entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 722b3e9..43b5806 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -311,6 +311,11 @@ RestoreArchive(Archive *AHX)
 	if (ropt->createDB && ropt->dropSchema)
 		exit_horribly(modulename, "-C and -c are incompatible options\n");
 
+	/* When the schema is dropped and re-created then no point
+	 * truncating tables. */
+	if (ropt->dropSchema && ropt->truncate_tables)
+		exit_horribly(modulename, "-c and --truncate-tables are incompatible options\n");
+
 	/*
 	 * -C is not compatible with -1, because we can't create a database inside
 	 * a transaction block.
@@ -412,6 +417,10 @@ RestoreArchive(Archive *AHX)
 		}
 	}
 
+	/* Truncate tables only when restoring data. */
+	if (!ropt->dataOnly && ropt->truncate_tables)
+		exit_horribly(modulename, "--truncate-tables requires the --data-only option\n");
+
 	/*
 	 * Setup the output file if necessary.
 	 */
@@ -553,6 +562,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	int			retval = 0;
 	teReqs		reqs;
 	bool		defnDumped;
+	bool		truncate;
 
 	AH->currentTE = te;
 
@@ -687,15 +697,22 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 						 * server, so no need to see if we should issue BEGIN.
 						 */
 						StartTransaction(AH);
+						truncate = 1;
+					} else
+						/* Truncate the table when asked to. */
+						truncate = ropt->truncate_tables;
 
+					if (truncate) {
 						/*
 						 * If the server version is >= 8.4, make sure we issue
 						 * TRUNCATE with ONLY so that child tables are not
-						 * wiped.
+						 * wiped.  If we don't know the server version
+						 * then err on the side of safety.
 						 */
 						ahprintf(AH, "TRUNCATE TABLE %s%s;\n\n",
-								 (PQserverVersion(AH->connection) >= 80400 ?
-								  "ONLY " : ""),
+								 (!AH->connection
+								  || PQserverVersion(AH->connection)
+									 >= 80400 ? "ONLY " : ""),
 								 fmtId(te->tag));
 					}
 
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index f6c835b..c0b0bfc 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -77,6 +77,7 @@ main(int argc, char **argv)
 	static int	disable_triggers = 0;
 	static int	no_data_for_failed_tables = 0;
 	static int	outputNoTablespaces = 0;
+	static int	truncate_tables = 0;
 	static int	use_setsessauth = 0;
 	static int	no_security_labels = 0;
 
@@ -119,6 +120,7 @@ main(int argc, char **argv)
 		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
 		{"role", required_argument, NULL, 2},
 		{"section", required_argument, NULL, 3},
+		{"truncate-tables", no_argument, &truncate_tables, 1},
 		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
 		{"no-security-labels", no_argument, &no_security_labels, 1},
 
@@ -324,6 +326,7 @@ main(int argc, char **argv)
 	opts->disable_triggers = disable_triggers;
 	opts->noDataForFailedTables = no_data_for_failed_tables;
 	opts->noTablespace = outputNoTablespaces;
+	opts->truncate_tables = truncate_tables;
 	opts->use_setsessauth = use_setsessauth;
 	opts->no_security_labels = no_security_labels;
 
@@ -434,6 +437,7 @@ usage(const char *progname)
 	printf(_("  --no-security-labels         do not restore security labels\n"));
 	printf(_("  --no-tablespaces             do not restore tablespace assignments\n"));
 	printf(_("  --section=SECTION            restore named section (pre-data, data, or post-data)\n"));
+	printf(_("  --truncate-tables            truncate tables before restore\n"));
 	printf(_("  --use-set-session-authorization\n"
 			 "                               use SET SESSION AUTHORIZATION commands instead of\n"
 			 "                               ALTER OWNER commands to set ownership\n"));

#6Karl O. Pinc
kop@meme.com
In reply to: Karl O. Pinc (#5)
1 attachment(s)
Re: Suggestion for --truncate-tables to pg_restore

Hi,

Attached is version 3.

The convention seems to be to leave the operator at the
end of the line when breaking long lines, so do that.
Add extra () -- make operator precedence explicit and
have indentation reflect operator precedence.

On 09/23/2012 08:52:07 PM, Karl O. Pinc wrote:

On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote:

On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:

On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:

On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it). --clean does not work
well because of dependent objects in other schemas.

Since there wasn't much more to do I've gone ahead
and written the patch. Works for me.

Against git master.
Passes regression tests, but there's no regression
tests for pg_restore so this does not say much.
Since there's no regression tests I've not written one.

Since this is a real patch for application I've given
it a new name (it's not a v2).

Truncate done right before COPY, since that's what
the parallel restores do.

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

pg_restore_truncate_v3.patchtext/x-patch; charset=us-ascii; name=pg_restore_truncate_v3.patchDownload
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index b276da6..488d8dc 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -539,6 +539,26 @@
      </varlistentry>
 
      <varlistentry>
+      <term><option>--truncate-tables</></term>
+      <listitem>
+       <para>
+        This option is only relevant when performing a data-only
+        restore.  It instructs <application>pg_restore</application>
+        to execute commands to truncate the target tables while the
+        data is reloaded.  Use this when restoring tables or schemas
+        and <option>--clean</option> cannot be used because dependent
+        objects would be destroyed.
+       </para>
+
+       <para>
+         The <option>--disable-triggers</option> will almost always
+         always need to be used in conjunction with this option to
+         disable check constraints on foreign keys.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--use-set-session-authorization</option></term>
       <listitem>
        <para>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 3b49395..0aaf1d3 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -101,6 +101,8 @@ typedef struct _restoreOptions
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;		/* disable triggers during data-only
 										 * restore */
+	int			truncate_tables;		/* truncate tables during data-only
+										 * restore */
 	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
 								 * instead of OWNER TO */
 	int			no_security_labels;		/* Skip security label entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 722b3e9..43b5806 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -311,6 +311,11 @@ RestoreArchive(Archive *AHX)
 	if (ropt->createDB && ropt->dropSchema)
 		exit_horribly(modulename, "-C and -c are incompatible options\n");
 
+	/* When the schema is dropped and re-created then no point
+	 * truncating tables. */
+	if (ropt->dropSchema && ropt->truncate_tables)
+		exit_horribly(modulename, "-c and --truncate-tables are incompatible options\n");
+
 	/*
 	 * -C is not compatible with -1, because we can't create a database inside
 	 * a transaction block.
@@ -412,6 +417,10 @@ RestoreArchive(Archive *AHX)
 		}
 	}
 
+	/* Truncate tables only when restoring data. */
+	if (!ropt->dataOnly && ropt->truncate_tables)
+		exit_horribly(modulename, "--truncate-tables requires the --data-only option\n");
+
 	/*
 	 * Setup the output file if necessary.
 	 */
@@ -553,6 +562,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	int			retval = 0;
 	teReqs		reqs;
 	bool		defnDumped;
+	bool		truncate;
 
 	AH->currentTE = te;
 
@@ -687,15 +697,22 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 						 * server, so no need to see if we should issue BEGIN.
 						 */
 						StartTransaction(AH);
+						truncate = 1;
+					} else
+						/* Truncate the table when asked to. */
+						truncate = ropt->truncate_tables;
 
+					if (truncate) {
 						/*
 						 * If the server version is >= 8.4, make sure we issue
 						 * TRUNCATE with ONLY so that child tables are not
-						 * wiped.
+						 * wiped.  If we don't know the server version
+						 * then err on the side of safety.
 						 */
 						ahprintf(AH, "TRUNCATE TABLE %s%s;\n\n",
-								 (PQserverVersion(AH->connection) >= 80400 ?
-								  "ONLY " : ""),
+								 ((!AH->connection ||
+								   PQserverVersion(AH->connection) >= 80400) ?
+								  "ONLY " : ""),
 								 fmtId(te->tag));
 					}
 
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index f6c835b..c0b0bfc 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -77,6 +77,7 @@ main(int argc, char **argv)
 	static int	disable_triggers = 0;
 	static int	no_data_for_failed_tables = 0;
 	static int	outputNoTablespaces = 0;
+	static int	truncate_tables = 0;
 	static int	use_setsessauth = 0;
 	static int	no_security_labels = 0;
 
@@ -119,6 +120,7 @@ main(int argc, char **argv)
 		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
 		{"role", required_argument, NULL, 2},
 		{"section", required_argument, NULL, 3},
+		{"truncate-tables", no_argument, &truncate_tables, 1},
 		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
 		{"no-security-labels", no_argument, &no_security_labels, 1},
 
@@ -324,6 +326,7 @@ main(int argc, char **argv)
 	opts->disable_triggers = disable_triggers;
 	opts->noDataForFailedTables = no_data_for_failed_tables;
 	opts->noTablespace = outputNoTablespaces;
+	opts->truncate_tables = truncate_tables;
 	opts->use_setsessauth = use_setsessauth;
 	opts->no_security_labels = no_security_labels;
 
@@ -434,6 +437,7 @@ usage(const char *progname)
 	printf(_("  --no-security-labels         do not restore security labels\n"));
 	printf(_("  --no-tablespaces             do not restore tablespace assignments\n"));
 	printf(_("  --section=SECTION            restore named section (pre-data, data, or post-data)\n"));
+	printf(_("  --truncate-tables            truncate tables before restore\n"));
 	printf(_("  --use-set-session-authorization\n"
 			 "                               use SET SESSION AUTHORIZATION commands instead of\n"
 			 "                               ALTER OWNER commands to set ownership\n"));


#7Karl O. Pinc
kop@meme.com
In reply to: Karl O. Pinc (#6)
1 attachment(s)
Re: Suggestion for --truncate-tables to pg_restore

Hi,

Attached is version 4. Version 3 no longer
built against head.

On 10/16/2012 09:48:06 PM, Karl O. Pinc wrote:

On 09/23/2012 08:52:07 PM, Karl O. Pinc wrote:

On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote:

On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:

On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:

On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it). --clean does not work
well because of dependent objects in other schemas.

Since there wasn't much more to do I've gone ahead
and written the patch. Works for me.

Against git master.
Passes regression tests, but there's no regression
tests for pg_restore so this does not say much.
Since there's no regression tests I've not written one.

Since this is a real patch for application I've given
it a new name (it's not a v2).

Truncate done right before COPY, since that's what
the parallel restores do.

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

------quoted attachment------

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

pg_restore_truncate_v4.patchtext/x-patch; charset=us-ascii; name=pg_restore_truncate_v4.patchDownload
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 1b9db6b..23b0d16 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -546,6 +546,26 @@
      </varlistentry>
 
      <varlistentry>
+      <term><option>--truncate-tables</></term>
+      <listitem>
+       <para>
+        This option is only relevant when performing a data-only
+        restore.  It instructs <application>pg_restore</application>
+        to execute commands to truncate the target tables while the
+        data is reloaded.  Use this when restoring tables or schemas
+        and <option>--clean</option> cannot be used because dependent
+        objects would be destroyed.
+       </para>
+
+       <para>
+         The <option>--disable-triggers</option> will almost always
+         always need to be used in conjunction with this option to
+         disable check constraints on foreign keys.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--use-set-session-authorization</option></term>
       <listitem>
        <para>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 3b49395..0aaf1d3 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -101,6 +101,8 @@ typedef struct _restoreOptions
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;		/* disable triggers during data-only
 										 * restore */
+	int			truncate_tables;		/* truncate tables during data-only
+										 * restore */
 	int			use_setsessauth;/* Use SET SESSION AUTHORIZATION commands
 								 * instead of OWNER TO */
 	int			no_security_labels;		/* Skip security label entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1fead28..c7fdc79 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -309,6 +309,11 @@ RestoreArchive(Archive *AHX)
 	if (ropt->createDB && ropt->single_txn)
 		exit_horribly(modulename, "-C and -1 are incompatible options\n");
 
+	/* When the schema is dropped and re-created then no point
+	 * truncating tables. */
+	if (ropt->dropSchema && ropt->truncate_tables)
+		exit_horribly(modulename, "-c and --truncate-tables are incompatible options\n");
+
 	/*
 	 * If we're going to do parallel restore, there are some restrictions.
 	 */
@@ -403,6 +408,10 @@ RestoreArchive(Archive *AHX)
 		}
 	}
 
+	/* Truncate tables only when restoring data. */
+	if (!ropt->dataOnly && ropt->truncate_tables)
+		exit_horribly(modulename, "--truncate-tables requires the --data-only option\n");
+
 	/*
 	 * Setup the output file if necessary.
 	 */
@@ -562,6 +571,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	int			retval = 0;
 	teReqs		reqs;
 	bool		defnDumped;
+	bool		truncate;
 
 	AH->currentTE = te;
 
@@ -696,14 +706,21 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 						 * server, so no need to see if we should issue BEGIN.
 						 */
 						StartTransaction(AH);
+						truncate = 1;
+					} else
+						/* Truncate the table when asked to. */
+						truncate = ropt->truncate_tables;
 
+					if (truncate) {
 						/*
 						 * If the server version is >= 8.4, make sure we issue
 						 * TRUNCATE with ONLY so that child tables are not
-						 * wiped.
+						 * wiped.  If we don't know the server version
+						 * then err on the side of safety.
 						 */
 						ahprintf(AH, "TRUNCATE TABLE %s%s;\n\n",
-								 (PQserverVersion(AH->connection) >= 80400 ?
+								 ((!AH->connection ||
+								   PQserverVersion(AH->connection) >= 80400) ?
 								  "ONLY " : ""),
 								 fmtId(te->tag));
 					}
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 49d799b..316fe63 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -77,6 +77,7 @@ main(int argc, char **argv)
 	static int	disable_triggers = 0;
 	static int	no_data_for_failed_tables = 0;
 	static int	outputNoTablespaces = 0;
+	static int	truncate_tables = 0;
 	static int	use_setsessauth = 0;
 	static int	no_security_labels = 0;
 
@@ -119,6 +120,7 @@ main(int argc, char **argv)
 		{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
 		{"role", required_argument, NULL, 2},
 		{"section", required_argument, NULL, 3},
+		{"truncate-tables", no_argument, &truncate_tables, 1},
 		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
 		{"no-security-labels", no_argument, &no_security_labels, 1},
 
@@ -324,6 +326,7 @@ main(int argc, char **argv)
 	opts->disable_triggers = disable_triggers;
 	opts->noDataForFailedTables = no_data_for_failed_tables;
 	opts->noTablespace = outputNoTablespaces;
+	opts->truncate_tables = truncate_tables;
 	opts->use_setsessauth = use_setsessauth;
 	opts->no_security_labels = no_security_labels;
 
@@ -434,6 +437,7 @@ usage(const char *progname)
 	printf(_("  --no-security-labels         do not restore security labels\n"));
 	printf(_("  --no-tablespaces             do not restore tablespace assignments\n"));
 	printf(_("  --section=SECTION            restore named section (pre-data, data, or post-data)\n"));
+	printf(_("  --truncate-tables            truncate tables before restore\n"));
 	printf(_("  --use-set-session-authorization\n"
 			 "                               use SET SESSION AUTHORIZATION commands instead of\n"
 			 "                               ALTER OWNER commands to set ownership\n"));

#8Josh Kupershmidt
schmiddy@gmail.com
In reply to: Karl O. Pinc (#2)
Re: Suggestion for --truncate-tables to pg_restore

Hi Karl,
I signed on to review this patch for the current CF. Most of the
background for the patch seems to be in the message below, so I'm
going to respond to this one first.

On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop@meme.com> wrote:

On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it). --clean does not work
well because of dependent objects in other schemas.

OK.

----

First, the problem:

Begin with the following structure:

CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT);

CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;

Then, by accident, somebody does:

UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;

So, you want to restore the data into schemaA.foo.
But schemaA.foo has (bad) data in it that must first
be removed. It would seem that using

pg_restore --clean -n schemaA -t foo my_pg_dump_backup

would solve the problem, it would drop schemaA.foo,
recreate it, and then restore the data. But this does
not work. schemaA.foo does not drop because it's
got a dependent database object, schemaB.bar.

Right.

Of course there are manual work-arounds. One of these
is truncating schemaA.foo and then doing a pg_restore
with --data-only.

Simply doing TRUNCATE manually was the first thought that occurred to
me when I saw your example.

The manual work-arounds become
increasingly burdensome as you need to restore more
tables. The case that motivated me was an attempt
to restore the data in an entire schema, one which
contained a significant number of tables.

TBH, I didn't find the example above particularly compelling for
demonstrating the need for this feature. If you've just got one table
with dependent views which needs to be restored, it's pretty easy to
manually TRUNCATE and have pg_restore --data-only reload the table.
(And easy enough to combine the truncate and restore into a single
transaction in case anything goes wrong, if need be.)

But I'm willing to grant that this proposed feature is potentially as
useful as existing restore-jiggering options like --disable-triggers.
And I guess I could see that if you're really stuck having to perform
a --data-only restore of many tables, this feature could come in
handy.

So, the idea here is to be able to do a data-only
restore, first truncating the data in the tables
being restored to remove the existing corrupted data.

The proposal is to add a --truncate-tables option
to pg_restore.

----

There were some comments on syntax.

I proposed to use -u as a short option. This was
thought confusing, given it's use in other
Unix command line programs (mysql). Since there's
no obvious short option, forget it. Just have
a long option.

Agreed.

Another choice is to avoid introducing yet another
option and instead overload --clean so that when
doing a --data-only restore --clean truncates tables
and otherwise --clean retains the existing behavior of
dropping and re-creating the restored objects.

I like the --truncate-tables flag idea better than overloading
--clean, since it makes the purpose immediately apparent.

(I tested pg_restore with 9.1 and when --data-only is
used --clean is ignored, it does not even produce a warning.
This is arguably a bug.)

+1 for having pg_restore bail out with an error if the user specifies
--data-only --clean. By the same token, --clean and --truncate-tables
together should also raise a "not allowed" error.

----

More serious objections were raised regarding semantics.

What if, instead, the initial structure looked like:

CREATE TABLE schemaA.foo
(id PRIMARY KEY, data INT);

CREATE TABLE schemaB.bar
(id INT CONSTRAINT "bar_on_foo" REFERENCES foo
, moredata INT);

With a case like this, in most real-world situations, you'd
have to use pg_restore with --disable-triggers if you wanted
to use --data-only and --truncate-tables. The possibility of
foreign key referential integrity corruption is obvious.

Why would --disable-triggers be used in this example? I don't think
you could use --truncate-tables to restore only table "foo", because
you would get:

ERROR: cannot truncate a table referenced in a foreign key constraint

(At least, I think so, not having tried with the actual patch.) You
could use --disable-triggers when restoring "bar". Though if you're
just enabling that option for performance purposes, and are unable to
guarantee that the restore will leave the tables in a consistent
state, well then it seems like you shouldn't use the option.

Aside: Unless you're restoring databases in their entirety
the pg_restore --disable-triggers option makes it easy to
introduce foreign key referential integrity corruption.

Yup, and I think the docs could do more to warn users about
--disable-triggers in particular. And I see you've submitted a
separate patch along those lines.

In fact, since pg_restore does not wrap it's operations
in one big transaction, it's easy to attempt restoration
of a portion of a database, have part of the process
succeed and part of it fail (due to either schema
or data dependencies), and be left off worse
than before you started.

That's what the --single-transaction option is for.

So, the discussion went, pg_restore is just another
application and introducing more options
which could lead to corruption of referential integrity is
a bad idea.

I do agree that increasing the ways for pg_restore to be a foot-gun is
a Bad Idea.

But pg_restore should not be thought of as just another
front-end. It should be thought of as a data recovery
tool.

I don't totally agree that charter for pg_restore should be a "data
recovery tool" (i.e. general purpose), but for the sake of this patch
we can leave that aside.

Recovering some data and being left with referential
integrity problems is better than having no data.

Well, this is really a judgment call, and I have a hunch that many
admins would actually choose "none of the above". And I think this
point gets to the crux of whether the --truncate-tables option will be
useful as-is.

For your first example, --truncate-tables seems to have some use, so
that the admin isn't forced to recreate various views which may be
dependent on the table. (Though it's not too difficult to work around
this case today.)

For the second example involving FKs, I'm a little bit more hesitant
about endorsing the use of --truncate-tables combined with
--disable-triggers to potentially allow bogus FKs. I know this is
possible to some extent today using the --disable-triggers option, but
it makes me nervous to be adding a mode to pg_restore if it's
primarily designed to work together with --disable-triggers as a
larger foot-gun.

This
is true even if, due to different users owning different
schemas and so forth, nobody knows exactly what
might be broken.

Yes, but we can do better. (The unstated sub-text being that
we don't want to introduce an inferior feature which
will then need to be supported forever.)

How could we do better:

Here I will record only the ideas related to restore,
although there was some mention of dump as well.

There has apparently been some discussion of writing
a foreign data wrapper which would operate on a database
dump. This might (in ways that are not immediately
obvious to me) address this issue.

That's an interesting idea. If you could SELECT directly out of a dump
file via FDW, you could handle the restore process purely in SQL. But
not directly relevant to this patch.

The restore process could, based on what table data needs
restoration, look at foreign key dependencies and produce a
list of the tables which all must be restored into order to
ensure foreign key referential integrity.

One mode of operation of pg_restore is outputting to a file or pipe
for subsequent processing, which of course wouldn't work with this
idea of having the restore be dependent on the target database
structure.

[snip more discussion of pg_restore possibly reordering objects and ensuring integrity]

For the purposes of actually completing a review of the patch in
question, I'm going to avoid further blue-sky speculation here.

Just a few initial comments about the doc portion of the patch:

+ This option is only relevant when performing a data-only

If we are going to restrict the --truncate-tables option to be used
with --data-only, "only allowed" may be more clear than "only
relevant".

+       <para>
+         The <option>--disable-triggers</option> will almost always
+         always need to be used in conjunction with this option to
+         disable check constraints on foreign keys.
+       </para>

For the first example you posted, of a view dependent on a table which
needed to be restored, this advice would not be accurate. IMO it's a
little dangerous advising users to "almost always" use a foot-gun like
--disable-triggers.

I'm out of time for today, and I haven't had a chance to actually try
out the patch, but I wanted to send off my thoughts so far. I should
have a chance for another look later this week.

Josh

#9Karl O. Pinc
kop@meme.com
In reply to: Josh Kupershmidt (#8)
2 attachment(s)
Re: Suggestion for --truncate-tables to pg_restore

Hi Josh,

On 11/20/2012 11:53:23 PM, Josh Kupershmidt wrote:

Hi Karl,
I signed on to review this patch for the current CF.

I noticed. Thanks very much.

On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop@meme.com> wrote:

On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

First, the problem:

Begin with the following structure:

CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT);

CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;

Then, by accident, somebody does:

UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;

So, you want to restore the data into schemaA.foo.
But schemaA.foo has (bad) data in it that must first
be removed. It would seem that using

pg_restore --clean -n schemaA -t foo my_pg_dump_backup

would solve the problem, it would drop schemaA.foo,
recreate it, and then restore the data. But this does
not work. schemaA.foo does not drop because it's
got a dependent database object, schemaB.bar.

TBH, I didn't find the example above particularly compelling for
demonstrating the need for this feature. If you've just got one table
with dependent views which needs to be restored, it's pretty easy to
manually TRUNCATE and have pg_restore --data-only reload the table.
(And easy enough to combine the truncate and restore into a single
transaction in case anything goes wrong, if need be.)

I was not clear in stating the problem. (But see below
regarding foreign keys.) The dependent view
was an example, but there can also be REFERENCES constraints and
trigger related constraints involving other tables in other schemas.
The easiest work-around I can think of here is destroying all the
triggers and constraints, either everything in the whole db
or doing some work to be more selective, truncating all the schema's
tables. doing a data-only restore of the
schema, and then pg_restore --data-only, and then re-creating
all the triggers and constraints.

But I'm willing to grant that this proposed feature is potentially as
useful as existing restore-jiggering options like --disable-triggers.
And I guess I could see that if you're really stuck having to perform
a --data-only restore of many tables, this feature could come in
handy.

I think so. See above.

So, the idea here is to be able to do a data-only
restore, first truncating the data in the tables
being restored to remove the existing corrupted data.

The proposal is to add a --truncate-tables option
to pg_restore.

----

(I tested pg_restore with 9.1 and when --data-only is
used --clean is ignored, it does not even produce a warning.
This is arguably a bug.)

+1 for having pg_restore bail out with an error if the user specifies
--data-only --clean. By the same token, --clean and --truncate-tables
together should also raise a "not allowed" error.

OT:
After looking at the code I found a number of "conflicting"
option combinations are not tested for or rejected. I can't
recall what they are now. The right way to fix these would be
to send in a separate patch for each "conflict" all attached
to one email/commitfest item? Or one patch that just gets
adjusted until it's right?

----

More serious objections were raised regarding semantics.

What if, instead, the initial structure looked like:

CREATE TABLE schemaA.foo
(id PRIMARY KEY, data INT);

CREATE TABLE schemaB.bar
(id INT CONSTRAINT "bar_on_foo" REFERENCES foo
, moredata INT);

With a case like this, in most real-world situations, you'd
have to use pg_restore with --disable-triggers if you wanted
to use --data-only and --truncate-tables. The possibility of
foreign key referential integrity corruption is obvious.

Why would --disable-triggers be used in this example? I don't think
you could use --truncate-tables to restore only table "foo", because
you would get:

ERROR: cannot truncate a table referenced in a foreign key
constraint

(At least, I think so, not having tried with the actual patch.) You
could use --disable-triggers when restoring "bar".

I tried it and you're quite right. (I thought I'd tried this
before, but clearly I did not -- proving the utility of the review
process. :-) My assumption was that since triggers
were turned off that constraints, being triggers, would be off as
well and therefore tables with foreign keys could be truncated.
Obviously not, since the I get the above error.

I just tried it. --disable-triggers does not disable constraints.

Recovering some data and being left with referential
integrity problems is better than having no data.

Well, this is really a judgment call, and I have a hunch that many
admins would actually choose "none of the above". And I think this
point gets to the crux of whether the --truncate-tables option will
be
useful as-is.

Well, yes. "None of the above" is best. :) In my case I had munged
the data in the one important schema and wanted to restore. The
setting is an academic one and a lot of cruft gets left laying
around in the other schemas, some of which consist entirely of cruft.
Responsibility for the non-main schema content is distributed.

In the interest of getting things working again quickly I wished to
restore the important schema quickly, and didn't want to have to sort
through the cruft ahead of time. In other words, I was entirely
willing to break things and pick up the pieces afterwords.

For your first example, --truncate-tables seems to have some use, so
that the admin isn't forced to recreate various views which may be
dependent on the table. (Though it's not too difficult to work around
this case today.)

As an aside: I never have an issue with this, having planned ahead.
I'm curious what the not-too-difficult work-around is that you have
in mind. I'm not coming up with a tidy way to, e.g, pull a lot
of views out of a dump.

For the second example involving FKs, I'm a little bit more hesitant
about endorsing the use of --truncate-tables combined with
--disable-triggers to potentially allow bogus FKs. I know this is
possible to some extent today using the --disable-triggers option,
but
it makes me nervous to be adding a mode to pg_restore if it's
primarily designed to work together with --disable-triggers as a
larger foot-gun.

This is the crux of the issue, and where I was thinking of
taking this patch. I very often am of the mindset that
foreign keys are no more or less special than other
more complex data integrity rules enforced with triggers.
(I suppose others might say the same regards to integrity
enforced at the application level.) This naturally
inclines me to think that one more way, beyond
--disable-triggers, to break integrity is no big deal.

But I quite see your point. Is it possible to get
resolution on this issue before either of us do any
more work in the direction of foreign keys?

An additional foot-gun, --disable-constraints,
seems like the natural progression in this direction.
Constraints, unlike triggers (?), can, in theory,
be fired at any time to check data content so perhaps
providing a way to test existing constraints against
db content would be a way to mitigate the foot-gun-ness
and drive an after-the-fact data cleanup process.

--disable-constraints seems like an entirely separate
patch so maybe we can stop the FK related issue right
here? (Although I would appreciate feedback regards
whether such an option might be accepted, at minimum
I'd like to get this out of my brain.)

Just a few initial comments about the doc portion of the patch:

+ This option is only relevant when performing a data-only

If we are going to restrict the --truncate-tables option to be used
with --data-only, "only allowed" may be more clear than "only
relevant".

Make sense to me. My intent here was to use the language
used elsewhere in the docs but I'm happy to go in a better direction.
(And perhaps later submit more patches to move other parts of the docs
in that direction.)

+       <para>
+         The <option>--disable-triggers</option> will almost always
+         always need to be used in conjunction with this option to
+         disable check constraints on foreign keys.
+       </para>

For the first example you posted, of a view dependent on a table
which
needed to be restored, this advice would not be accurate. IMO it's a
little dangerous advising users to "almost always" use a foot-gun
like
--disable-triggers.

Sorta brings us back to the sticking point above....
And sounds like, at the moment at least, this paragraph can
be deleted.

I'm out of time for today, and I haven't had a chance to actually try
out the patch, but I wanted to send off my thoughts so far. I should
have a chance for another look later this week.

Thanks for the work.

I'll hold off on submitting any revisions to the patch for the moment.

One thing you'll want to pay attention to is the point
in the restore process at which the truncation is done.
In the current version each table is truncated immediately
before being copied. It might or might not be better to do
all the truncation up-front, in the fashion of --clean.
I would appreciate some guidance on this.

In case it's helpful I'm attaching two files
I used to test the foreign key issue.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

pg_restore_test.sqltext/x-sql; charset=us-ascii; name=pg_restore_test.sqlDownload
pg_restore_test.shapplication/x-shellscript; name=pg_restore_test.shDownload
#10Josh Kupershmidt
schmiddy@gmail.com
In reply to: Karl O. Pinc (#9)
Re: Suggestion for --truncate-tables to pg_restore

On Wed, Nov 21, 2012 at 5:48 AM, Karl O. Pinc <kop@meme.com> wrote:

On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop@meme.com> wrote:

On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

OT:
After looking at the code I found a number of "conflicting"
option combinations are not tested for or rejected. I can't
recall what they are now. The right way to fix these would be
to send in a separate patch for each "conflict" all attached
to one email/commitfest item? Or one patch that just gets
adjusted until it's right?

Typically I think it's easiest for the reviewer+committer to consider
a bunch of such similar changes altogether in one patch, rather than
in a handful of separate patches, though that's just my own
preference.

----

More serious objections were raised regarding semantics.

What if, instead, the initial structure looked like:

CREATE TABLE schemaA.foo
(id PRIMARY KEY, data INT);

CREATE TABLE schemaB.bar
(id INT CONSTRAINT "bar_on_foo" REFERENCES foo
, moredata INT);

With a case like this, in most real-world situations, you'd
have to use pg_restore with --disable-triggers if you wanted
to use --data-only and --truncate-tables. The possibility of
foreign key referential integrity corruption is obvious.

Why would --disable-triggers be used in this example? I don't think
you could use --truncate-tables to restore only table "foo", because
you would get:

ERROR: cannot truncate a table referenced in a foreign key
constraint

(At least, I think so, not having tried with the actual patch.) You
could use --disable-triggers when restoring "bar".

I tried it and you're quite right. (I thought I'd tried this
before, but clearly I did not -- proving the utility of the review
process. :-) My assumption was that since triggers
were turned off that constraints, being triggers, would be off as
well and therefore tables with foreign keys could be truncated.
Obviously not, since the I get the above error.

I just tried it. --disable-triggers does not disable constraints.

Just to be clear, I believe the problem in this example is that
--disable-triggers does not disable any foreign key constraints of
other tables when you are restoring a single table. So with:

pg_restore -1 --truncate-tables --disable-triggers --table=foo \
--data-only my.dump ...

you would get the complaint

ERROR: cannot truncate a table referenced in a foreign key constraint

which is talking about bar's referencing foo, and there was no

ALTER TABLE bar DISABLE TRIGGER ALL

done, since "bar" isn't being restored. I don't have a quibble with
this existing behavior -- you are instructing pg_restore to only mess
with "bar", after all. See below, though, for a comparison of --clean
and --truncate-tables when restoring "foo" and "bar" together.

For your first example, --truncate-tables seems to have some use, so
that the admin isn't forced to recreate various views which may be
dependent on the table. (Though it's not too difficult to work around
this case today.)

As an aside: I never have an issue with this, having planned ahead.
I'm curious what the not-too-difficult work-around is that you have
in mind. I'm not coming up with a tidy way to, e.g, pull a lot
of views out of a dump.

Well, for the first example, there was one table and only one view
which depended on the table, so manually editing the list file like
so:

pg_restore --list --file=my.dump > output.list
# manually edit file "output.list", select only entries pertaining
# to the table and dependent view
pg_restore -1 --clean --use-list=output.list ...

isn't too arduous, though it would become so if you had more dependent
views to worry about.

I'm willing to count this use-case as a usability win for
--truncate-tables, since with that option the restore can be boiled
down to a short and sweet:

pg_restore -1 --data-only --truncate-tables --table=mytable my.dump ...

Though note this may not prove practical for large tables, since
--data-only leaves constraints and indexes intact during the restore,
and can be massively slower compared to adding the constraints only
after the data has been COPYed in, as pg_restore otherwise does.

For the second example involving FKs, I'm a little bit more hesitant
about endorsing the use of --truncate-tables combined with
--disable-triggers to potentially allow bogus FKs. I know this is
possible to some extent today using the --disable-triggers option,
but
it makes me nervous to be adding a mode to pg_restore if it's
primarily designed to work together with --disable-triggers as a
larger foot-gun.

This is the crux of the issue, and where I was thinking of
taking this patch. I very often am of the mindset that
foreign keys are no more or less special than other
more complex data integrity rules enforced with triggers.
(I suppose others might say the same regards to integrity
enforced at the application level.) This naturally
inclines me to think that one more way, beyond
--disable-triggers, to break integrity is no big deal.

But I quite see your point. Is it possible to get
resolution on this issue before either of us do any
more work in the direction of foreign keys?

I think the patch has some promise with at least one use-case (view(s)
dependent on table which needs to be reloaded, as discussed above).
With the other use-case we have been discussing, of reloading a table
referenced by other table(s)'s FKs, whether --truncate-tables is
helpful is less clear to me, at least in the patch's current state.
(See also bottom.)

An additional foot-gun, --disable-constraints,
seems like the natural progression in this direction.
Constraints, unlike triggers (?), can, in theory,
be fired at any time to check data content so perhaps
providing a way to test existing constraints against
db content would be a way to mitigate the foot-gun-ness
and drive an after-the-fact data cleanup process.

--disable-constraints seems like an entirely separate
patch so maybe we can stop the FK related issue right
here? (Although I would appreciate feedback regards
whether such an option might be accepted, at minimum
I'd like to get this out of my brain.)

I'm not sure I follow exactly how you envision --disable-constraints
would work, but it does seem a separate issue from the
--truncate-tables option at hand.

One thing you'll want to pay attention to is the point
in the restore process at which the truncation is done.
In the current version each table is truncated immediately
before being copied. It might or might not be better to do
all the truncation up-front, in the fashion of --clean.
I would appreciate some guidance on this.

IMO --truncate-tables is, at a minimum, going to need to truncate
tables up-front and in the appropriate order to avoid the annoying
"ERROR: cannot truncate a table referenced in a foreign key
constraint", at least when avoiding that error is possible.

Let's go back to your example of two tables, with "bar" referencing
"foo". This existing --clean functionality works fine:

# edit list to only include lines mentioning "foo" and "bar"
pg_restore -1 --clean --use-list=output.list ...

But this won't work, since pg_restore attempts to truncate and restore
foo separately from bar:

# edit list to only include lines mentioning "foo" and "bar"
pg_restore --truncate-tables --data-only -1 --use-list=output.list ...

i.e. will run into "ERROR: cannot truncate a table referenced in a
foreign key constraint".

In case it's helpful I'm attaching two files
I used to test the foreign key issue.

Thanks, I do appreciate seeing testcases scripts like this, since it
can neatly demonstrate the intended use-case of the feature, and help
bring to light anything that's missing. I tried your testcase, and
got:

pg_restore: [archiver (db)] could not execute query: ERROR: cannot
truncate a table referenced in a foreign key constraint
DETAIL: Table "bar" references "foo".

as discussed above. If you'd like to advertise this feature as being
handy for reloading a table referenced by other FKs, I'd be interested
to see a testcase demonstrating this use, along with any changes to
the patch (e.g. moving TRUNCATEs to the start) which might be needed.

Josh

#11Robert Haas
robertmhaas@gmail.com
In reply to: Josh Kupershmidt (#8)
Re: Suggestion for --truncate-tables to pg_restore

On Wed, Nov 21, 2012 at 12:53 AM, Josh Kupershmidt <schmiddy@gmail.com> wrote:

TBH, I didn't find the example above particularly compelling for
demonstrating the need for this feature. If you've just got one table
with dependent views which needs to be restored, it's pretty easy to
manually TRUNCATE and have pg_restore --data-only reload the table.
(And easy enough to combine the truncate and restore into a single
transaction in case anything goes wrong, if need be.)

But I'm willing to grant that this proposed feature is potentially as
useful as existing restore-jiggering options like --disable-triggers.
And I guess I could see that if you're really stuck having to perform
a --data-only restore of many tables, this feature could come in
handy.

I think I would come down on the other side of this. We've never
really been able to get --clean work properly in all scenarios, and it
seems likely that a similar fate will befall this option.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Karl O. Pinc
kop@meme.com
In reply to: Robert Haas (#11)
Re: Suggestion for --truncate-tables to pg_restore

On 11/26/2012 12:06:56 PM, Robert Haas wrote:

On Wed, Nov 21, 2012 at 12:53 AM, Josh Kupershmidt
<schmiddy@gmail.com> wrote:

TBH, I didn't find the example above particularly compelling for
demonstrating the need for this feature. If you've just got one

table

with dependent views which needs to be restored, it's pretty easy

to

manually TRUNCATE and have pg_restore --data-only reload the table.
(And easy enough to combine the truncate and restore into a single
transaction in case anything goes wrong, if need be.)

But I'm willing to grant that this proposed feature is potentially

as

useful as existing restore-jiggering options like

--disable-triggers.

And I guess I could see that if you're really stuck having to

perform

a --data-only restore of many tables, this feature could come in
handy.

I think I would come down on the other side of this. We've never
really been able to get --clean work properly in all scenarios, and
it
seems likely that a similar fate will befall this option.

Where I would like to go with this is to first introduce,
as a new patch, an ALTER TABLE option to disable a
constraint. Something like

ALTER TABLE foo UNVALIDATE CONSTRAINT "constraintname";

This would mark the constraint NOT VALID, as if the
constraint were created with the NOT VALID option.
After a constraint is UNVALIDATEd the existing

ALTER TABLE foo VALIDATE CONSTRAINT "constraintname";

feature would turn the constraint on and check the data.

With UNVALIDATE CONSTRAINT, pg_restore could first turn
off all the constraints concerning tables to be restored,
truncate the tables, restore the data, turn the
constraints back on and re-validate the constraints.
No need to worry about ordering based on a FK referential
dependency graph or loops in such a graph (due to
DEFERRABLE INITIALLY DEFERRED).

This approach would allow the content of a table or
tables to be restored regardless of dependent objects
or FK references and preserve FK referential integrity.
Right? I need some guidance here from someone who
knows more than I do.

There would likely need to be a pg_restore option like
--disable-constraints to invoke this functionality,
but that can be worked out later.
Likewise, I see an update and a delete trigger in
pg_triggers associated with the referenced table
in REFERENCES constraints, but no trigger for
truncate. So making a constraint NOT VALID may
not be as easy as it seems.

I don't know what the problems are with --clean
but I would like to know if this appears
a promising approach. If so I can pursue it,
although I make no promises. (I sent in
the --disable-triggers patch because it seemed
easy and I'm not sure where a larger project fits
into my life.)

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

P.S. An outstanding question regards --truncate-tables
is whether it should drop indexes before truncate
and re-create them after restore. Sounds like it should.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Robert Haas
robertmhaas@gmail.com
In reply to: Karl O. Pinc (#12)
Re: Suggestion for --truncate-tables to pg_restore

On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop@meme.com> wrote:

Where I would like to go with this is to first introduce,
as a new patch, an ALTER TABLE option to disable a
constraint. Something like

ALTER TABLE foo UNVALIDATE CONSTRAINT "constraintname";

This doesn't really make sense, because constraints that are not
validated are still enforced for new rows. This thus wouldn't save
anything performance-wise. We should perhaps have two more states:
not enforced but blindly assumed true, and not enforced and not
assumed true either. But currently, we don't.

I don't know what the problems are with --clean
but I would like to know if this appears
a promising approach. If so I can pursue it,
although I make no promises. (I sent in
the --disable-triggers patch because it seemed
easy and I'm not sure where a larger project fits
into my life.)

I'm not really sure what the issues were any more; but I think they
may have had to do with dependencies between different objects messing
things up, which I think is likely to be a problem for this proposal
as well.

P.S. An outstanding question regards --truncate-tables
is whether it should drop indexes before truncate
and re-create them after restore. Sounds like it should.

Well, that would improve performance, but it also makes the behavior
of object significantly different from what one might expect from the
name. One of the problems here is that there seem to be a number of
slightly-different things that one might want to do, and it's not
exactly clear what all of them are, or whether a reasonable number of
options can cater to all of them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Josh Kupershmidt
schmiddy@gmail.com
In reply to: Robert Haas (#13)
Re: Suggestion for --truncate-tables to pg_restore

On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop@meme.com> wrote:

P.S. An outstanding question regards --truncate-tables
is whether it should drop indexes before truncate
and re-create them after restore. Sounds like it should.

Well, that would improve performance, but it also makes the behavior
of object significantly different from what one might expect from the
name. One of the problems here is that there seem to be a number of
slightly-different things that one might want to do, and it's not
exactly clear what all of them are, or whether a reasonable number of
options can cater to all of them.

Another problem: attempting to drop a unique constraint or primary key
(if we're counting these as indexes to be dropped and recreated, which
they should be if the goal is reasonable restore performance) which is
referenced by another table's foreign key will cause:
ERROR: cannot drop constraint xxx on table yyy
because other objects depend on it

and as discussed upthread, it would be impolite for pg_restore to
presume it should monkey with dropping+recreating other tables'
constraints to work around this problem, not to mention impossible
when pg_restore is not connected to the target database.

It is a common administrative task to selectively restore some
existing tables' contents from a backup, and IIRC was the impetus for
this patch. Instead of adding a bunch of options to pg_restore,
perhaps a separate tool specific to this task would be the way to go.
It could handle the minutiae of truncating, dropping and recreating
constraints and indexes of the target tables, and dealing with FKs
sensibly, without worrying about conflicts with existing pg_restore
options and behavior.

Josh

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Karl O. Pinc
kop@meme.com
In reply to: Josh Kupershmidt (#14)
Re: Suggestion for --truncate-tables to pg_restore

On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote:

On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas <robertmhaas@gmail.com>
wrote:

On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop@meme.com> wrote:

P.S. An outstanding question regards --truncate-tables
is whether it should drop indexes before truncate
and re-create them after restore. Sounds like it should.

Well, that would improve performance, but it also makes the

behavior

of object significantly different from what one might expect from

the

name. One of the problems here is that there seem to be a number

of

slightly-different things that one might want to do, and it's not
exactly clear what all of them are, or whether a reasonable number

of

options can cater to all of them.

Another problem: attempting to drop a unique constraint or primary
key
(if we're counting these as indexes to be dropped and recreated,
which
they should be if the goal is reasonable restore performance) which
is
referenced by another table's foreign key will cause:
ERROR: cannot drop constraint xxx on table yyy
because other objects depend on it

and as discussed upthread, it would be impolite for pg_restore to
presume it should monkey with dropping+recreating other tables'
constraints to work around this problem, not to mention impossible
when pg_restore is not connected to the target database.

I'm thinking impossible because it's impossible to know
what the existing FKs are without a db connection. Impossible is
a problem. You may have another reason why it's impossible.

It is a common administrative task to selectively restore some
existing tables' contents from a backup, and IIRC was the impetus for
this patch.

Yes. (And aside from listing tables individually it'd be nice
to restore tables per schema.)

It's also a bit surprising that restoring table content
is so hard/unsupported, given a db of more than minimal
complexity.

Instead of adding a bunch of options to pg_restore,
perhaps a separate tool specific to this task would be the way to go.
It could handle the minutiae of truncating, dropping and recreating
constraints and indexes of the target tables, and dealing with FKs
sensibly, without worrying about conflicts with existing pg_restore
options and behavior.

Per above, the tool would then either require a db connection
or at least a dump which contains the system catalogs.

I'm afraid I don't have a clear picture of what such a tool
would look like, if it does not look a lot like pg_restore.
I would like to have such a tool. I'm not certain how
much I'd be able to contribute toward making one.

Meanwhile it sounds like the --truncate-tables patch
is looking less and less desirable. I'm ready for
rejection, but will soldier on in the interest of
not wasting other people work on this, if given
direction to move forward.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Karl O. Pinc
kop@meme.com
In reply to: Karl O. Pinc (#15)
Re: Suggestion for --truncate-tables to pg_restore

On 11/26/2012 09:30:48 PM, Karl O. Pinc wrote:

On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote:

It is a common administrative task to selectively restore some
existing tables' contents from a backup, and IIRC was the impetus

for

this patch.

Yes. (And aside from listing tables individually it'd be nice
to restore tables per schema.)

As long as I'm daydreaming it'd be nice to be able to
restore a table, data and schema, and have available
the various combinations of: new table name, different
owner, different schema, different db. Without having
to edit a file by hand.

Of course I've not done the brain work involved in
figuring out just what this would mean in terms
of related objects like triggers, constraints,
indexes and so forth. But then who doesn't want
a pony? :-)

Regards,

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Josh Kupershmidt
schmiddy@gmail.com
In reply to: Karl O. Pinc (#15)
Re: Suggestion for --truncate-tables to pg_restore

Sorry for the delay in following up here.

On Mon, Nov 26, 2012 at 8:30 PM, Karl O. Pinc <kop@meme.com> wrote:

On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote:

On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas <robertmhaas@gmail.com>
wrote:

On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop@meme.com> wrote:

P.S. An outstanding question regards --truncate-tables
is whether it should drop indexes before truncate
and re-create them after restore. Sounds like it should.

Well, that would improve performance, but it also makes the

behavior

of object significantly different from what one might expect from

the

name. One of the problems here is that there seem to be a number

of

slightly-different things that one might want to do, and it's not
exactly clear what all of them are, or whether a reasonable number

of

options can cater to all of them.

Another problem: attempting to drop a unique constraint or primary
key
(if we're counting these as indexes to be dropped and recreated,
which
they should be if the goal is reasonable restore performance) which
is
referenced by another table's foreign key will cause:
ERROR: cannot drop constraint xxx on table yyy
because other objects depend on it

and as discussed upthread, it would be impolite for pg_restore to
presume it should monkey with dropping+recreating other tables'
constraints to work around this problem, not to mention impossible
when pg_restore is not connected to the target database.

I'm thinking impossible because it's impossible to know
what the existing FKs are without a db connection. Impossible is
a problem. You may have another reason why it's impossible.

Yes, that's what I meant.

Meanwhile it sounds like the --truncate-tables patch
is looking less and less desirable. I'm ready for
rejection, but will soldier on in the interest of
not wasting other people work on this, if given
direction to move forward.

Well, as far as I was able to tell, the use-case where this patch
worked without trouble was limited to restoring a table, or schema
with table(s), that:
a.) has some view(s) dependent on it
b.) has no other tables with FK references to it, so that we don't run into:
ERROR: cannot truncate a table referenced in a foreign key constraint
c.) is not so large that it takes forever for data to be restored
with indexes and constraints left intact
d.) and whose admin does not want to use --clean plus a list-file
which limits pg_restore to the table and its views

I was initially hoping that the patch would be more useful for
restoring a table with FKs pointing to it, but it seems the only
reliable way to do this kind of selective restore with pg_restore is
with --clean and editing the list-file. Editing the list-file is
certainly tedious and prone to manual error, but I'm not sure this
particular patch has a wide enough use-case to alleviate that pain
significantly.

Josh

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Karl O. Pinc
kop@meme.com
In reply to: Josh Kupershmidt (#17)
Re: Suggestion for --truncate-tables to pg_restore

On 12/04/2012 09:26:47 PM, Josh Kupershmidt wrote:

Sorry for the delay in following up here.

No problem at all.

Well, as far as I was able to tell, the use-case where this patch
worked without trouble was limited to restoring a table, or schema
with table(s), that:
a.) has some view(s) dependent on it
b.) has no other tables with FK references to it, so that we don't
run into:
ERROR: cannot truncate a table referenced in a foreign key
constraint
c.) is not so large that it takes forever for data to be restored
with indexes and constraints left intact
d.) and whose admin does not want to use --clean plus a list-file
which limits pg_restore to the table and its views

I was initially hoping that the patch would be more useful for
restoring a table with FKs pointing to it, but it seems the only
reliable way to do this kind of selective restore with pg_restore is
with --clean and editing the list-file. Editing the list-file is
certainly tedious and prone to manual error, but I'm not sure this
particular patch has a wide enough use-case to alleviate that pain
significantly.

I think there must be a reliable way to restore tables with FKs
pointing to them, but getting pg_restore to do it seems perilous; at
least given your expectations for the behavior of pg_restore in the
context of the existing option set.

As with you, I am not inclined to add another option to pg_restore
unless it's really useful. (Pg_restore already has gobs of options,
to the point where it's getting sticky.) I don't think this
patch meets the utility bar. It might be able to be re-worked into
something useful, or it might need to evolve into some sort of new
restore utility per your thoughts. For now better to reject it until
the right/comprehensive way to proceed is figured out.

Thanks for all your work. I hope that this has at least
moved the discussion forward and not been a waste of everybody's
time. I would like to see a "better" way of restoring
tables that are FK reference targets.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers