db partial dumping with pg_dump

Started by Oleg Bartunov over 23 years ago; 16 messages
#1 Oleg Bartunov
oleg@sai.msu.su

We have a patch for pg_dump which adds the possibility to dump
part of a table, for example:

dump the Top.Science.Astronomy hierarchy from the dmoz catalog

pg_dump -d -t dmoz -w "select * from dmoz where path <@ 'Top.Science.Astronomy'" dmoz

We found it very useful. We'd like to extend it to work with COPY
as well, but COPY has no support for a select (of course, it would work
only for the special case where ALL columns are retrieved).

The question is: is it worth submitting the patch for pg_dump and
looking into the COPY code?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#2 Rod Taylor
rbt@zort.ca
In reply to: Oleg Bartunov (#1)
Re: db partial dumping with pg_dump

On Tue, 2002-08-13 at 07:49, Oleg Bartunov wrote:

We have a patch for pg_dump which adds the possibility to dump
part of a table, for example:

dump the Top.Science.Astronomy hierarchy from the dmoz catalog

pg_dump -d -t dmoz -w "select * from dmoz where path <@ 'Top.Science.Astronomy'" dmoz

We found it very useful. We'd like to extend it to work with COPY
as well, but COPY has no support for a select (of course, it would work
only for the special case where ALL columns are retrieved).

The question is: is it worth submitting the patch for pg_dump and
looking into the COPY code?

I've been asked by co-workers for information on how to do this type of
thing. They do partial table dumps to a development system for the
purpose of finding / eliminating bugs.

That said, it may be smarter to build the 'select * from <table>' part
yourself and let the user supply just the WHERE clause.
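
Something like this, say, where -w takes just the condition and pg_dump
builds the rest of the query itself (a sketch of the suggested interface,
not what the patch currently does):

pg_dump -d -t dmoz -w "path <@ 'Top.Science.Astronomy'" dmoz

pg_dump would then expand that internally into
select * from only dmoz where path <@ 'Top.Science.Astronomy'.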

#3 Oleg Bartunov
oleg@sai.msu.su
In reply to: Rod Taylor (#2)
1 attachment
Re: db partial dumping with pg_dump

On 13 Aug 2002, Rod Taylor wrote:

On Tue, 2002-08-13 at 07:49, Oleg Bartunov wrote:

We have a patch for pg_dump which adds the possibility to dump
part of a table, for example:

dump the Top.Science.Astronomy hierarchy from the dmoz catalog

pg_dump -d -t dmoz -w "select * from dmoz where path <@ 'Top.Science.Astronomy'" dmoz

We found it very useful. We'd like to extend it to work with COPY
as well, but COPY has no support for a select (of course, it would work
only for the special case where ALL columns are retrieved).

The question is: is it worth submitting the patch for pg_dump and
looking into the COPY code?

I've been asked by co-workers for information on how to do this type of
thing. They do partial table dumps to a development system for the
purpose of finding / eliminating bugs.

That said, it may be smarter to build the 'select * from <table>' part
yourself and let the user supply just the WHERE clause.

Find the patch (against 7.2) in the attachment. Note that it works only with the -d (insert mode) option.
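
With -d, the rows come out as INSERT statements, so for the example above
the data section looks roughly like this (the column values here are
made up):

INSERT INTO "dmoz" VALUES (42,'Top.Science.Astronomy','Astronomy');
INSERT INTO "dmoz" VALUES (43,'Top.Science.Astronomy.Amateur','Amateur');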

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Attachments:

pg_dump.patch (text/plain; charset=US-ASCII)
--- pg_dump.c.old	Fri Aug  9 19:41:08 2002
+++ pg_dump.c	Fri Aug  9 19:28:34 2002
@@ -128,6 +128,7 @@
 	TableInfo  *tblinfo;
 	int			tblidx;
 	bool		oids;
+	const char *select_command;
 } DumpContext;

 static void
@@ -390,6 +391,7 @@
 {
 	const DumpContext *dctx = (DumpContext *) dctxv;
 	const char *classname = dctx->tblinfo[dctx->tblidx].relname;
+	const char *select_command = dctx->select_command;

 	PGresult   *res;
 	PQExpBuffer q = createPQExpBuffer();
@@ -397,9 +399,23 @@
 	int			field;

 	if (fout->remoteVersion >= 70100)
-		appendPQExpBuffer(q, "DECLARE _pg_dump_cursor CURSOR FOR SELECT * FROM ONLY %s", fmtId(classname, force_quotes));
+		if (select_command)
+		{
+			appendPQExpBuffer(q, "DECLARE _pg_dump_cursor CURSOR FOR %s", select_command);
+		}
+		else
+		{
+			appendPQExpBuffer(q, "DECLARE _pg_dump_cursor CURSOR FOR SELECT * FROM ONLY %s", fmtId(classname, force_quotes));
+		}
 	else
-		appendPQExpBuffer(q, "DECLARE _pg_dump_cursor CURSOR FOR SELECT * FROM %s", fmtId(classname, force_quotes));
+		if (select_command)
+		{
+			appendPQExpBuffer(q, "DECLARE _pg_dump_cursor CURSOR FOR %s", select_command);
+		}
+		else
+		{
+			appendPQExpBuffer(q, "DECLARE _pg_dump_cursor CURSOR FOR SELECT * FROM %s", fmtId(classname, force_quotes));
+		}

 	res = PQexec(g_conn, q->data);
 	if (!res ||
@@ -547,7 +563,7 @@
  */
 static void
 dumpClasses(const TableInfo *tblinfo, const int numTables, Archive *fout,
-		 const char *onlytable, const bool oids, const bool force_quotes)
+		 const char *onlytable, const bool oids, const bool force_quotes, const char *select_command)
 {
 	int			i;
 	DataDumperPtr dumpFn;
@@ -587,6 +603,7 @@
 			dumpCtx->tblinfo = (TableInfo *) tblinfo;
 			dumpCtx->tblidx = i;
 			dumpCtx->oids = oids;
+			dumpCtx->select_command = NULL;

 			if (!dumpData)
 			{
@@ -602,6 +619,10 @@
 				/* Restore using INSERT */
 				dumpFn = dumpClasses_dumpData;
 				copyStmt = NULL;
+				if (onlytable && (strcmp(classname, onlytable) == 0))
+				{
+					dumpCtx->select_command = select_command;
+				}
 			}

 			ArchiveEntry(fout, tblinfo[i].oid, tblinfo[i].relname,
@@ -648,6 +669,7 @@
 	const char *pghost = NULL;
 	const char *pgport = NULL;
 	const char *username = NULL;
+	const char *select_command = NULL;
 	char	   *tablename = NULL;
 	bool		oids = false;
 	TableInfo  *tblinfo;
@@ -694,6 +716,7 @@
 		{"no-privileges", no_argument, NULL, 'x'},
 		{"no-acl", no_argument, NULL, 'x'},
 		{"compress", required_argument, NULL, 'Z'},
+		{"where-clause", required_argument, NULL, 'w'},
 		{"help", no_argument, NULL, '?'},
 		{"version", no_argument, NULL, 'V'},

@@ -748,9 +771,9 @@
 	}

 #ifdef HAVE_GETOPT_LONG
-	while ((c = getopt_long(argc, argv, "abcCdDf:F:h:inNoOp:RsS:t:uU:vWxX:zZ:V?", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "abcCdDf:F:h:inNoOp:RsS:t:uU:vWxX:zZ:w:V?", long_options, &optindex)) != -1)
 #else
-	while ((c = getopt(argc, argv, "abcCdDf:F:h:inNoOp:RsS:t:uU:vWxX:zZ:V?-")) != -1)
+	while ((c = getopt(argc, argv, "abcCdDf:F:h:inNoOp:RsS:t:uU:vWxX:zZ:w:V?-")) != -1)
 #endif

 	{
@@ -881,6 +904,9 @@
 			case 'v':			/* verbose */
 				g_verbose = true;
 				break;
+			case 'w':			/* select query */
+				select_command = optarg;
+				break;

 			case 'W':
 				force_password = true;
@@ -1063,7 +1089,7 @@
 	tblinfo = dumpSchema(g_fout, &numTables, tablename, aclsSkip, oids, schemaOnly, dataOnly);

 	if (!schemaOnly)
-		dumpClasses(tblinfo, numTables, g_fout, tablename, oids, force_quotes);
+		dumpClasses(tblinfo, numTables, g_fout, tablename, oids, force_quotes, select_command);

 	if (outputBlobs)
 		ArchiveEntry(g_fout, "0", "BLOBS", "BLOBS", NULL, "", "", "", "", dumpBlobs, 0);
#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#3)
Re: db partial dumping with pg_dump

This patch seems extremely messy to me. Unless I'm missing something,
-w just plain fails except when you are dumping a specific table (ie,
-t must be given as well). And heaven help you if you specify a
different table in -t than the one -w is selecting from. This isn't
well thought out.

I'm not at all convinced that such a thing belongs in pg_dump anyway.
It'd be more useful as a manually-invokable feature, I think. You
can almost do this in psql with
select * from table where something
\g outfile
but I don't think you can get psql to emit the data in a form that can
be reloaded reliably (it won't quote data characters that look the same
as column delimiters, for instance).
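
For instance, even with tuples-only, unaligned output (a sketch; the
table and column names are made up):

\t
\a
\f ','
select id, title from dmoz where path <@ 'Top.Science.Astronomy'
\g /tmp/astro.out

a title containing a comma produces a line like

42,Stars, galaxies and beyond

and nothing marks which commas are delimiters, so you can't reload it
reliably.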

What would seem to make sense is adding a WHERE-clause option to
COPY TO, and then you could go
COPY table TO 'myfile' WHERE ...
We already have column-list support in COPY, so we can already slice the
table vertically --- WHERE would let you slice it horizontally, which
seems a natural extension. (BTW, has anyone taught psql's \copy about
column lists? AFAIR the original patch was only against the backend.)
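
Under that proposal, slicing both ways at once would look something like
this (the WHERE clause being the hypothetical extension, the column list
the existing feature):

COPY dmoz (id, title) TO '/tmp/astro.copy' WHERE path <@ 'Top.Science.Astronomy';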

I'm finding it hard to visualize situations where I'd want the extra
baggage of pg_dump for something like this. If I want the schema at
all, I'll probably want it separate from the data so that I can hack
the schema conveniently --- so I'd want to do a "pg_dump -s -t table"
and then do the selective copying separately.
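
Concretely, assuming the COPY ... WHERE extension existed, the two-step
version would be something like:

pg_dump -s -t dmoz dmoz > dmoz_schema.sql
psql -d dmoz -c "COPY dmoz TO stdout WHERE path <@ 'Top.Science.Astronomy'" > dmoz_data.copy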

regards, tom lane

#5 Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#4)
Re: db partial dumping with pg_dump

On Tue, 2002-08-13 at 20:24, Tom Lane wrote:

What would seem to make sense is adding a WHERE-clause option to
COPY TO, and then you could go
COPY table TO 'myfile' WHERE ...

What about:

COPY table TO 'myfile' [ WHERE cond ] [ AS INSERT [ WITH COLUMN ] ];

to get the data as INSERT INTO statements (pg_dump -d), with an optional
column list (pg_dump -D)?
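
To mirror the pg_dump example upthread, that might look like this (all
hypothetical syntax, with made-up column names):

COPY dmoz TO '/tmp/astro.sql' WHERE path <@ 'Top.Science.Astronomy' AS INSERT WITH COLUMN;

emitting lines such as

INSERT INTO dmoz (id, title) VALUES (42, 'Astronomy');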

----------------
Hannu

#6 Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#4)
Re: db partial dumping with pg_dump

On Tue, 13 Aug 2002, Tom Lane wrote:

This patch seems extremely messy to me. Unless I'm missing something,
-w just plain fails except when you are dumping a specific table (ie,
-t must be given as well). And heaven help you if you specify a
different table in -t than the one -w is selecting from. This isn't
well thought out.

You're right. The patch I sent was in no way ready for submission to the
sources! I just asked about the feature and attached it to the message for
Rod Taylor.

I'm not at all convinced that such a thing belongs in pg_dump anyway.
It'd be more useful as a manually-invokable feature, I think. You
can almost do this in psql with
select * from table where something
\g outfile
but I don't think you can get psql to emit the data in a form that can
be reloaded reliably (it won't quote data characters that look the same
as column delimiters, for instance).

That was the reason we don't use psql for dumping.

What would seem to make sense is adding a WHERE-clause option to
COPY TO, and then you could go
COPY table TO 'myfile' WHERE ...
We already have column-list support in COPY, so we can already slice the
table vertically --- WHERE would let you slice it horizontally, which
seems a natural extension. (BTW, has anyone taught psql's \copy about
column lists? AFAIR the original patch was only against the backend.)

I'm finding it hard to visualize situations where I'd want the extra
baggage of pg_dump for something like this. If I want the schema at
all, I'll probably want it separate from the data so that I can hack
the schema conveniently --- so I'd want to do a "pg_dump -s -t table"
and then do the selective copying separately.

That'd be nice! I often need such a feature, because the DBs at work are
often too large to play with at home :-)

regards, tom lane

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#7 Hannu Krosing
hannu@tm.ee
In reply to: Oleg Bartunov (#6)
Re: db partial dumping with pg_dump

On Wed, 2002-08-14 at 00:03, Bruce Momjian wrote:

Actually, loading all this stuff into COPY is not the way to go, I
think.

Informix had:

UNLOAD TO 'filename'
SELECT ...

I have to admit, this is a superior way to do things compared to what we
have. Is it possible for us to get:

COPY TO 'filename'
SELECT ...

It allows any arbitrary table, GROUP BY, even ORDER BY combination.

It would be more in line with the rest of the system to just allow a
subselect as the 'table':

COPY (select id, name from mystuff where id > 10) over10stuff TO stdout;

-----------------
Hannu

#8 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Hannu Krosing (#5)
Re: db partial dumping with pg_dump

Actually, loading all this stuff into COPY is not the way to go, I
think.

Informix had:

UNLOAD TO 'filename'
SELECT ...

I have to admit, this is a superior way to do things compared to what we
have. Is it possible for us to get:

COPY TO 'filename'
SELECT ...

It allows any arbitrary table, GROUP BY, even ORDER BY combination.
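
For example (hypothetical syntax, modeled on the Informix form):

COPY TO '/tmp/astro.unl'
SELECT path, title FROM dmoz
WHERE path <@ 'Top.Science.Astronomy'
ORDER BY path;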

---------------------------------------------------------------------------

Hannu Krosing wrote:

On Tue, 2002-08-13 at 20:24, Tom Lane wrote:

What would seem to make sense is adding a WHERE-clause option to
COPY TO, and then you could go
COPY table TO 'myfile' WHERE ...

What about :

COPY table TO 'myfile' [ WHERE cond ] [ AS INSERT [ WITH COLUMN ] ];

to get the data as INSERT INTO statements (pg_dump -d), with optional
column list (pg_dump -D)

----------------
Hannu

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#9 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Hannu Krosing (#7)
Re: db partial dumping with pg_dump

Hannu Krosing wrote:

On Wed, 2002-08-14 at 00:03, Bruce Momjian wrote:

Actually, loading all this stuff into COPY is not the way to go, I
think.

Informix had:

UNLOAD TO 'filename'
SELECT ...

I have to admit, this is a superior way to do things compared to what we
have. Is it possible for us to get:

COPY TO 'filename'
SELECT ...

It allows any arbitrary table, GROUP BY, even ORDER BY combination.

It would be more in line with the rest of the system to just allow a
subselect as the 'table':

COPY (select id, name from mystuff where id > 10) over10stuff TO stdout;

Yep, that would work too. Clearly, we should shoot for something that
leverages the existing SELECT code rather than hang more clauses off of
COPY.
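
For instance, a join or an aggregate would then come along for free
(hypothetical subselect syntax, made-up tables):

COPY (SELECT c.name, count(*) FROM category c JOIN site s ON s.catid = c.id GROUP BY c.name) TO stdout;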

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: db partial dumping with pg_dump

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Clearly, we should shoot for something that leverages the existing
SELECT code rather than hang more clauses off of COPY.

Yeah, that's a good point. COPY IN is still a special case, I think,
but it seems like COPY OUT could be reimplemented as a special tuple
destination for the regular executor machinery.

Q: how much of a performance hit would we be taking? If this slows down
pg_dump a lot, the existing special-case code is worth keeping.
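
A crude way to measure that, once a subselect form exists, would be to
time the two paths over the same table (the second command is the
hypothetical syntax):

time psql -d dmoz -c "COPY dmoz TO stdout" > /dev/null
time psql -d dmoz -c "COPY (SELECT * FROM dmoz) TO stdout" > /dev/null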

regards, tom lane

#11 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#4)
Re: db partial dumping with pg_dump

I'm finding it hard to visualize situations where I'd want the extra
baggage of pg_dump for something like this. If I want the schema at
all, I'll probably want it separate from the data so that I can hack
the schema conveniently --- so I'd want to do a "pg_dump -s -t table"
and then do the selective copying separately.

Tell you what I'd kill for:

pg_dump -t users_\* db > users.sql

Sort of thing.

Chris

#12 Scott Shattuck
ss@technicalpursuit.com
In reply to: Christopher Kings-Lynne (#11)
Re: db partial dumping with pg_dump

On Tue, 2002-08-13 at 19:54, Christopher Kings-Lynne wrote:

I'm finding it hard to visualize situations where I'd want the extra
baggage of pg_dump for something like this. If I want the schema at
all, I'll probably want it separate from the data so that I can hack
the schema conveniently --- so I'd want to do a "pg_dump -s -t table"
and then do the selective copying separately.

Tell you what I'd kill for:

pg_dump -t users_\* db > users.sql

Sort of thing.

Chris

I'd kill for pg_restore --ignore-missing-objects --clean .... so I could
run refreshes against databases that are changing schemas rather than
having to constantly edit the output of -l.

I'd also kill for pg_restore --ignore-existing-objects .... so I could
run the darn thing against a database that's already got pl/pgsql
installed in template1 and the dump file wants to install it again etc.

ss

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#11)
Re: db partial dumping with pg_dump

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:

Tell you what I'd kill for:
pg_dump -t users_\* db > users.sql

Yeah, I was just about to start working on pg_dump's -t switch.
It absolutely *must* be made schema-aware for 7.3 (-t schema.table
doesn't work today) and it seems like we might as well try to allow
wildcard support. Another thought I had was to accept multiple -t
switches, eg "pg_dump -t foo -t bar -t schema.baz*" to dump all tables
matching any of the patterns.

regards, tom lane

#14 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#13)
Re: db partial dumping with pg_dump

Yeah, I was just about to start working on pg_dump's -t switch.
It absolutely *must* be made schema-aware for 7.3 (-t schema.table
doesn't work today) and it seems like we might as well try to allow
wildcard support. Another thought I had was to accept multiple -t
switches, eg "pg_dump -t foo -t bar -t schema.baz*" to dump all tables
matching any of the patterns.

What about a switch to set the schema search path as well? You could also
just have the tables comma-separated in -t?

Chris

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Shattuck (#12)
Re: db partial dumping with pg_dump

Scott Shattuck <ss@technicalpursuit.com> writes:

I'd also kill for pg_restore --ignore-existing-objects .... so I could
run the darn thing against a database that's already got pl/pgsql
installed in template1 and the dump file wants to install it again etc.

In general, I think it's a mistake for pg_restore to bail out on errors.
The underlying pg_dump scripts have been built and optimized on the
assumption that psql would keep plugging after seeing an error. For
example, scripts containing "\connect - foo" still work if there's no
"foo" user ... but only because psql doesn't go belly-up. pg_restore
is way less forgiving.

I think the ideal behavior for pg_restore would be to abandon work on
the current dump item upon seeing a SQL error, but to pick up with the
next one. (Of course we could have an --anal-retentive switch to bail
on first error, but I doubt it'd be used much.) Errors associated with
ownership switches shouldn't cause failure in any case.

Anybody care to submit patches to make this happen?

regards, tom lane

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#14)
Re: db partial dumping with pg_dump

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:

What about a switch to set schema search path as well?

I'm leaning toward the view that this is a bad idea, and that pg_dump
should act in a way that's search-path-independent. But I need to
think more.

You could also just have the tables comma separated in the -t ??

Don't like it --- that makes comma a special character in pg_dump
tablename patterns, which is an exception we don't need to make.
The multiple-dash-t approach needs no such assumption.

regards, tom lane