Table data exclusion patch for pg_dump

Started by Vadim Trochinskyover 16 years ago9 messages
1 attachment(s)

Hello!

This is a patch that allows choosing not to dump the data for the selected
tables.

The intended usage is to make backups smaller and faster, by allowing skipping
unneeded data, while still generating a backup that can be restored and obtain
a fully working application.

I use it to avoid dumping log data, and the content of tables that keep data
that has a short lifetime (which in the event of a restore would have expired
by then anyway)

The patch adds the -d and -D arguments:
-d --data=TABLE dump data for the named table(s)
-D --exclude-data=TABLE do NOT dump data for the named table(s)

I believe the patch to be complete, though I'm very new to the postgresql
codebase and might have missed something.

The patch applies to HEAD, compiles and worked properly in my tests.

Attachments:

exclude_table_data_v1.patchtext/x-patch; charset=UTF-8; name=exclude_table_data_v1.patchDownload
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 058c834..69d9efc 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -363,6 +363,7 @@ flagInhAttrs(TableInfo *tblinfo, int numTables)
 							attrDef->dobj.namespace = tbinfo->dobj.namespace;
 
 							attrDef->dobj.dump = tbinfo->dobj.dump;
+							attrDef->dobj.dumpData = tbinfo->dobj.dumpData;
 
 							attrDef->separate = false;
 							addObjectDependency(&tbinfo->dobj,
@@ -442,6 +443,7 @@ AssignDumpId(DumpableObject *dobj)
 	dobj->name = NULL;			/* must be set later */
 	dobj->namespace = NULL;		/* may be set later */
 	dobj->dump = true;			/* default assumption */
+	dobj->dumpData = true;			/* default assumption */
 	dobj->dependencies = NULL;
 	dobj->nDeps = 0;
 	dobj->allocDeps = 0;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 55d306c..a8fd3bb 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -90,6 +90,12 @@ static SimpleOidList table_include_oids = {NULL, NULL};
 static SimpleStringList table_exclude_patterns = {NULL, NULL};
 static SimpleOidList table_exclude_oids = {NULL, NULL};
 
+static SimpleStringList table_data_include_patterns = {NULL, NULL};
+static SimpleOidList table_data_include_oids = {NULL, NULL};
+static SimpleStringList table_data_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_data_exclude_oids = {NULL, NULL};
+
+
 /* default, if no "inclusion" switches appear, is to dump everything */
 static bool include_everything = true;
 
@@ -252,6 +258,8 @@ main(int argc, char **argv)
 		{"superuser", required_argument, NULL, 'S'},
 		{"table", required_argument, NULL, 't'},
 		{"exclude-table", required_argument, NULL, 'T'},
+		{"data", required_argument, NULL, 'd'},
+		{"exclude-data", required_argument, NULL, 'D'},
 		{"no-password", no_argument, NULL, 'w'},
 		{"password", no_argument, NULL, 'W'},
 		{"username", required_argument, NULL, 'U'},
@@ -311,7 +319,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "abcCE:f:F:h:in:N:oOp:RsS:t:T:U:vwWxX:Z:",
+	while ((c = getopt_long(argc, argv, "abcCE:f:F:h:in:N:oOp:RsS:t:T:d:D:U:vwWxX:Z:",
 							long_options, &optindex)) != -1)
 	{
 		switch (c)
@@ -332,6 +340,15 @@ main(int argc, char **argv)
 				outputCreate = 1;
 				break;
 
+			case 'd':			/* Include data for tables */
+				simple_string_list_append(&table_data_include_patterns, optarg);
+				include_everything = false;
+				break;
+
+			case 'D':			/* Exclude data for tables */
+				simple_string_list_append(&table_data_exclude_patterns, optarg);
+				break;
+
 			case 'E':			/* Dump encoding */
 				dumpencoding = optarg;
 				break;
@@ -665,6 +682,21 @@ main(int argc, char **argv)
 							   &table_exclude_oids);
 	/* non-matching exclusion patterns aren't an error */
 
+	/* Expand table selection patterns into OID lists */
+	if (table_data_include_patterns.head != NULL)
+	{
+		expand_table_name_patterns(&table_data_include_patterns,
+								   &table_data_include_oids);
+		if (table_data_include_oids.head == NULL)
+		{
+			write_msg(NULL, "No matching tables to include data for were found\n");
+			exit_nicely();
+		}
+	}
+	expand_table_name_patterns(&table_data_exclude_patterns,
+							   &table_data_exclude_oids);
+	/* non-matching exclusion patterns aren't an error */
+
 	/*
 	 * Dumping blobs is now default unless we saw an inclusion switch or -s
 	 * ... but even if we did see one of these, -b turns it back on.
@@ -814,6 +846,8 @@ help(const char *progname)
 	printf(_("  -S, --superuser=NAME        superuser user name to use in plain-text format\n"));
 	printf(_("  -t, --table=TABLE           dump the named table(s) only\n"));
 	printf(_("  -T, --exclude-table=TABLE   do NOT dump the named table(s)\n"));
+	printf(_("  -d  --data=TABLE            dump data for the named table(s)\n"));
+	printf(_("  -D  --exclude-data=TABLE    do NOT dump data for the named table(s)\n"));
 	printf(_("  -x, --no-privileges         do not dump privileges (grant/revoke)\n"));
 	printf(_("  --binary-upgrade            for use by upgrade utilities only\n"));
 	printf(_("  --inserts                   dump data as INSERT commands, rather than COPY\n"));
@@ -1003,6 +1037,24 @@ selectDumpableTable(TableInfo *tbinfo)
 		simple_oid_list_member(&table_exclude_oids,
 							   tbinfo->dobj.catId.oid))
 		tbinfo->dobj.dump = false;
+
+
+	/*
+	 * If data for specific tables is being dumped, dump it for just those tables; 
+	 * else, dump according to the parent namespace's dump flag.
+	 */
+	if (table_data_include_oids.head != NULL)
+		tbinfo->dobj.dumpData = simple_oid_list_member(&table_data_include_oids,
+												   tbinfo->dobj.catId.oid);
+
+	/*
+	 * In any case, a table's data can be excluded by an exclusion switch
+	 */
+	if (tbinfo->dobj.dumpData &&
+		simple_oid_list_member(&table_data_exclude_oids,
+							   tbinfo->dobj.catId.oid))
+		tbinfo->dobj.dumpData = false;
+
 }
 
 /*
@@ -1365,6 +1417,10 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
 	DataDumperPtr dumpFn;
 	char	   *copyStmt;
 
+	/* Skip if not to be dumped */
+	if (!tdinfo->dobj.dumpData)
+		return;
+
 	if (!dump_inserts)
 	{
 		/* Dump/restore using COPY */
@@ -1430,6 +1486,7 @@ getTableData(TableInfo *tblinfo, int numTables, bool oids)
 			AssignDumpId(&tdinfo->dobj);
 			tdinfo->dobj.name = tblinfo[i].dobj.name;
 			tdinfo->dobj.namespace = tblinfo[i].dobj.namespace;
+			tdinfo->dobj.dumpData = tblinfo[i].dobj.dumpData;
 			tdinfo->tdtable = &(tblinfo[i]);
 			tdinfo->oids = oids;
 			addObjectDependency(&tdinfo->dobj, tblinfo[i].dobj.dumpId);
@@ -4886,6 +4943,7 @@ getTableAttrs(TableInfo *tblinfo, int numTables)
 				attrdefs[j].dobj.namespace = tbinfo->dobj.namespace;
 
 				attrdefs[j].dobj.dump = tbinfo->dobj.dump;
+				attrdefs[j].dobj.dumpData = tbinfo->dobj.dumpData;
 
 				/*
 				 * Defaults on a VIEW must always be dumped as separate ALTER
@@ -5037,6 +5095,7 @@ getTableAttrs(TableInfo *tblinfo, int numTables)
 				constrs[j].separate = false;
 
 				constrs[j].dobj.dump = tbinfo->dobj.dump;
+				constrs[j].dobj.dumpData = tbinfo->dobj.dumpData;
 
 				/*
 				 * Mark the constraint as needing to appear before the table
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 067f005..5a39998 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -126,6 +126,7 @@ typedef struct _dumpableObject
 	char	   *name;			/* object name (should never be NULL) */
 	struct _namespaceInfo *namespace;	/* containing namespace, or NULL */
 	bool		dump;			/* true if we want to dump this object */
+	bool		dumpData;		/* true if we want to dump this object's data (tables only) */
 	DumpId	   *dependencies;	/* dumpIds of objects this one depends on */
 	int			nDeps;			/* number of valid dependencies */
 	int			allocDeps;		/* allocated size of dependencies[] */
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vadim Trochinsky (#1)
Re: Table data exclusion patch for pg_dump

Vadim Trochinsky <me@vadim.ws> writes:

This is a patch that allows choosing not to dump the data for the selected
tables.

Why wouldn't you just use -s ?

regards, tom lane

#3Asko Oja
ascoja@gmail.com
In reply to: Tom Lane (#2)
Re: Table data exclusion patch for pg_dump

How do you use -s to exclude data for some tables from otherwise full dump?
Dump schema and data separately?

On Fri, May 1, 2009 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Vadim Trochinsky <me@vadim.ws> writes:

This is a patch that allows choosing not to dump the data for the

selected

tables.

Why wouldn't you just use -s ?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: Table data exclusion patch for pg_dump

Tom Lane wrote:

Vadim Trochinsky <me@vadim.ws> writes:

This is a patch that allows choosing not to dump the data for the selected
tables.

Why wouldn't you just use -s ?

You might want the whole schema and data for most but not all of the
tables (e.g. you might leave out a large session table for a web app).

cheers

andrew

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#4)
Re: Table data exclusion patch for pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

Why wouldn't you just use -s ?

You might want the whole schema and data for most but not all of the
tables (e.g. you might leave out a large session table for a web app).

The use-case seems pretty thin to me, and the potential for shooting
oneself in the foot rather large. We routinely get complaints, for
example, from people who do partial dumps and then find out they don't
restore because of foreign key constraints. This looks like mostly
a larger-gauge version of that.

regards, tom lane

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#5)
Re: Table data exclusion patch for pg_dump

2009/5/1 Tom Lane <tgl@sss.pgh.pa.us>:

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

Why wouldn't you just use -s ?

You might want the whole schema and data for most but not all of the
tables (e.g. you might leave out a large session table for a web app).

The use-case seems pretty thin to me, and the potential for shooting
oneself in the foot rather large.  We routinely get complaints, for
example, from people who do partial dumps and then find out they don't
restore because of foreign key constraints.  This looks like mostly
a larger-gauge version of that.

I am sorry, but this use-case is relative maybe often. When we
migrated from 8.1 to 8.3 we truncates all large audit and log tables.
Without this switch we had to use TRUNCATE or own substitution of
pg_dump. I thing so this switch should be implement like TRUNCATE
statement - you have to exclude all depended tables.

regards
Pavel Stehule

Show quoted text

                       regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#5)
Re: Table data exclusion patch for pg_dump

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

Why wouldn't you just use -s ?

You might want the whole schema and data for most but not all of the
tables (e.g. you might leave out a large session table for a web app).

The use-case seems pretty thin to me, and the potential for shooting
oneself in the foot rather large. We routinely get complaints, for
example, from people who do partial dumps and then find out they don't
restore because of foreign key constraints. This looks like mostly
a larger-gauge version of that.

Well, you can shoot yourself in the foot using pg_restore's --use-list
option too, but that doesn't mean it's not useful. And indeed it could
be used to achieve the OP's ends, except that he would have spent
useless time and space dumping the data for a table he doesn't want.

cheers

andrew

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#7)
Re: Table data exclusion patch for pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

Well, you can shoot yourself in the foot using pg_restore's --use-list
option too, but that doesn't mean it's not useful. And indeed it could
be used to achieve the OP's ends, except that he would have spent
useless time and space dumping the data for a table he doesn't want.

The critical difference, of course, being that when he discovers that it
doesn't work, he *has got* the data he needs to have to perform the
restore. Excluding data at dump time is considerably more dangerous
because of the likelihood that you won't have any recourse when you
need it.

(In which connection, it is an astonishingly bad idea to repurpose -d
and -D for this behavior, so soon after we deprecated them.)

regards, tom lane

#9Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Andrew Dunstan (#4)
Re: Table data exclusion patch for pg_dump

On Fri, May 1, 2009 at 2:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

Tom Lane wrote:

Vadim Trochinsky <me@vadim.ws> writes:

This is a patch that allows choosing not to dump the data for the
selected tables.

Why wouldn't you just use -s ?

You might want the whole schema and data for most but not all of the tables
(e.g. you might leave out a large session table for a web app).

Actually you can use pg_dump -T to exclude some tables and then
pg_dump -s -t to dump the schema of those tables

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157