Adding WHERE clause to pg_dump
Attached patch implements WHERE clauses for pg_dump.
This is useful for producing data samples of a database
e.g. pg_dump -w "ctid < '(1000,1)' or random() < 0.1"
and can also be used for taking incremental backups, if data columns
exist to make a partial dump sensible.
e.g. pg_dump -w "last_update_timestamp > ...."
Columns such as these are very common because of optimistic locking
techniques in many databases.
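A complete incremental invocation might look like this (sketch only; the
column name and cutoff value are illustrative, not part of the patch):

pg_dump -w "last_update_timestamp > '2008-07-01 00:00'" mydb > incr.sql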
This is designed to be used in conjunction with the TOM utility, and the
forthcoming patch to implement stats hooks. Taken together these
features will allow the ability to take a cut-down database environment
for testing, yet with statistics matching the main production database.
It was easier to write it and then discuss, since I needed to check the
feasibility of the idea before presenting it.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Attachments:
pg_dump_where.v1.patch (text/x-patch; charset=UTF-8)
Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.103
diff -c -r1.103 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml 20 Jul 2008 18:43:30 -0000 1.103
--- doc/src/sgml/ref/pg_dump.sgml 25 Jul 2008 08:29:25 -0000
***************
*** 674,679 ****
--- 674,696 ----
</varlistentry>
<varlistentry>
+ <term><option>-w <replaceable>SQL where clause</replaceable></option></term>
+ <term><option>--where=<replaceable class="parameter">SQL where clause</replaceable></option></term>
+ <listitem>
+ <para>
+ Dump data only for rows matching the specified clause. When this
+ parameter is not specified, all rows are dumped. The <literal>WHERE</>
+ clause is applied to all tables dumped, so any columns named must be
+ present in all tables being dumped, or <application>pg_dump</application>
+ will return an error. The keyword <quote>where</quote> itself need not
+ be given, since it is added automatically. This option is ignored if
+ no data is dumped. <option>-w</> cannot currently be used at the same
+ time as <option>-o/--oids</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-W</option></term>
<term><option>--password</option></term>
<listitem>
***************
*** 875,880 ****
--- 892,908 ----
</para>
<para>
+ You can specify a data sample using <option>-w</>. An example would be to
+ dump all tables smaller than 8MB in full, while dumping only a random 10% of
+ rows from any table 8MB or larger. Note that this may not dump all foreign
+ key data correctly, so choose your extract carefully for your own database.
+
+ <screen>
+ <prompt>$</prompt> <userinput>pg_dump -w "ctid < '(1000,1)' or random() < 0.1" mydb > db.sql</userinput>
+ </screen>
+ </para>
+
+ <para>
To dump all database objects except for tables whose names begin with
<literal>ts_</literal>:
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c 20 Jul 2008 18:43:30 -0000 1.497
--- src/bin/pg_dump/pg_dump.c 25 Jul 2008 08:34:05 -0000
***************
*** 95,100 ****
--- 95,102 ----
static SimpleStringList table_exclude_patterns = {NULL, NULL};
static SimpleOidList table_exclude_oids = {NULL, NULL};
+ static const char *where_clause = NULL;
+
/* default, if no "inclusion" switches appear, is to dump everything */
static bool include_everything = true;
***************
*** 188,194 ****
static void dumpEncoding(Archive *AH);
static void dumpStdStrings(Archive *AH);
static const char *getAttrName(int attrnum, TableInfo *tblInfo);
! static const char *fmtCopyColumnList(const TableInfo *ti);
static void do_sql_command(PGconn *conn, const char *query);
static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
ExecStatusType expected);
--- 190,196 ----
static void dumpEncoding(Archive *AH);
static void dumpStdStrings(Archive *AH);
static const char *getAttrName(int attrnum, TableInfo *tblInfo);
! static const char *fmtCopyColumnList(const TableInfo *ti, bool with_brackets);
static void do_sql_command(PGconn *conn, const char *query);
static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
ExecStatusType expected);
***************
*** 250,255 ****
--- 252,258 ----
{"superuser", required_argument, NULL, 'S'},
{"table", required_argument, NULL, 't'},
{"exclude-table", required_argument, NULL, 'T'},
+ {"where", required_argument, NULL, 'w'},
{"password", no_argument, NULL, 'W'},
{"username", required_argument, NULL, 'U'},
{"verbose", no_argument, NULL, 'v'},
***************
*** 303,309 ****
}
}
! while ((c = getopt_long(argc, argv, "abcCdDE:f:F:h:in:N:oOp:RsS:t:T:U:vWxX:Z:",
long_options, &optindex)) != -1)
{
switch (c)
--- 306,312 ----
}
}
! while ((c = getopt_long(argc, argv, "abcCdDE:f:F:h:in:N:oOp:RsS:t:T:U:vw:WxX:Z:",
long_options, &optindex)) != -1)
{
switch (c)
***************
*** 404,409 ****
--- 407,416 ----
g_verbose = true;
break;
+ case 'w':
+ where_clause = optarg;
+ break;
+
case 'W':
force_password = true;
break;
***************
*** 483,488 ****
--- 490,501 ----
exit(1);
}
+ if (where_clause && oids)
+ {
+ write_msg(NULL, "options -w/--where and -o/--oids cannot be used together\n");
+ exit(1);
+ }
+
/* open the output file */
if (pg_strcasecmp(format, "a") == 0 || pg_strcasecmp(format, "append") == 0)
{
***************
*** 788,793 ****
--- 801,807 ----
" plain text format\n"));
printf(_(" -t, --table=TABLE dump the named table(s) only\n"));
printf(_(" -T, --exclude-table=TABLE do NOT dump the named table(s)\n"));
+ printf(_(" -w, --where=SQL dump data only for rows matching where clause\n"));
printf(_(" -x, --no-privileges do not dump privileges (grant/revoke)\n"));
printf(_(" --disable-dollar-quoting disable dollar quoting, use SQL standard quoting\n"));
printf(_(" --disable-triggers disable triggers during data-only restore\n"));
***************
*** 1060,1082 ****
* cases involving ADD COLUMN and inheritance.)
*/
if (g_fout->remoteVersion >= 70300)
! column_list = fmtCopyColumnList(tbinfo);
else
column_list = ""; /* can't select columns in COPY */
! if (oids && hasoids)
{
! appendPQExpBuffer(q, "COPY %s %s WITH OIDS TO stdout;",
fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
classname),
! column_list);
}
else
{
! appendPQExpBuffer(q, "COPY %s %s TO stdout;",
fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
classname),
! column_list);
}
res = PQexec(g_conn, q->data);
check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
--- 1074,1099 ----
* cases involving ADD COLUMN and inheritance.)
*/
if (g_fout->remoteVersion >= 70300)
! column_list = fmtCopyColumnList(tbinfo, (where_clause == NULL));
else
column_list = ""; /* can't select columns in COPY */
! if (where_clause)
{
! appendPQExpBuffer(q, "COPY (SELECT %s FROM %s WHERE %s) TO stdout %s;",
! column_list,
fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
classname),
! where_clause,
! ((oids && hasoids) ? "WITH OIDS" : ""));
}
else
{
! appendPQExpBuffer(q, "COPY %s %s TO stdout %s;",
fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
classname),
! column_list,
! ((oids && hasoids) ? "WITH OIDS" : ""));
}
res = PQexec(g_conn, q->data);
check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
***************
*** 1331,1337 ****
appendPQExpBuffer(copyBuf, "COPY %s ",
fmtId(tbinfo->dobj.name));
appendPQExpBuffer(copyBuf, "%s %sFROM stdin;\n",
! fmtCopyColumnList(tbinfo),
(tdinfo->oids && tbinfo->hasoids) ? "WITH OIDS " : "");
copyStmt = copyBuf->data;
}
--- 1348,1354 ----
appendPQExpBuffer(copyBuf, "COPY %s ",
fmtId(tbinfo->dobj.name));
appendPQExpBuffer(copyBuf, "%s %sFROM stdin;\n",
! fmtCopyColumnList(tbinfo, true),
(tdinfo->oids && tbinfo->hasoids) ? "WITH OIDS " : "");
copyStmt = copyBuf->data;
}
***************
*** 10492,10498 ****
* "", not an invalid "()" column list.
*/
static const char *
! fmtCopyColumnList(const TableInfo *ti)
{
static PQExpBuffer q = NULL;
int numatts = ti->numatts;
--- 10509,10515 ----
* "", not an invalid "()" column list.
*/
static const char *
! fmtCopyColumnList(const TableInfo *ti, bool with_brackets)
{
static PQExpBuffer q = NULL;
int numatts = ti->numatts;
***************
*** 10506,10512 ****
else
q = createPQExpBuffer();
! appendPQExpBuffer(q, "(");
needComma = false;
for (i = 0; i < numatts; i++)
{
--- 10523,10530 ----
else
q = createPQExpBuffer();
! if (with_brackets)
! appendPQExpBuffer(q, "(");
needComma = false;
for (i = 0; i < numatts; i++)
{
***************
*** 10521,10527 ****
if (!needComma)
return ""; /* no undropped columns */
! appendPQExpBuffer(q, ")");
return q->data;
}
--- 10539,10546 ----
if (!needComma)
return ""; /* no undropped columns */
! if (with_brackets)
! appendPQExpBuffer(q, ")");
return q->data;
}
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
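For a single table the manual equivalent is roughly this (sketch only;
table name and predicate invented for illustration):

psql mydb -c "COPY (SELECT * FROM orders WHERE order_date > '2008-01-01') TO STDOUT" > orders.dat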
regards, tom lane
On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
So you can dump a coherent sample database in one command, not 207.
Every user of PostgreSQL wants a dev/test database. If the database is
large it isn't practical to take a complete copy. Nor is it practical to
hand-write a data sampling extraction program, and if you do, it's usually
imperfect in many ways.
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
So you can dump a coherent sample database in one command, not 207.
Every user of PostgreSQL wants a dev/test database. If the database is
large it isn't practical to take a complete copy. Nor is it practical to
hand-write a data sampling extraction program, and if you do, it's usually
imperfect in many ways.
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
Not sure I buy this argument. I am all for usability and I would be the
first to shout about the general ridiculousness of pg_dump/all/restore
but in this case I think Tom is right. This feature could easily be done
in a script without harassing pg_dump.
Sincerely,
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Fri, 2008-07-25 at 11:46 -0700, Joshua D. Drake wrote:
On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
So you can dump a coherent sample database in one command, not 207.
Every user of PostgreSQL wants a dev/test database. If the database is
large it isn't practical to take a complete copy. Nor is it practical to
hand-write a data sampling extraction program, and if you do, it's usually
imperfect in many ways.
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
Not sure I buy this argument. I am all for usability and I would be the
first to shout about the general ridiculousness of pg_dump/all/restore
but in this case I think Tom is right. This feature could easily be done
in a script without harassing pg_dump.
You can do it, yes. But it takes a lot longer. If the time to implement
was similar, then I would immediately agree "feature available already".
pg_dump is not "harassed" by this. What is lost by adding this feature?
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Fri, 2008-07-25 at 20:26 +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 11:46 -0700, Joshua D. Drake wrote:
On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote:
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
Not sure I buy this argument. I am all for usability and I would be the
first to shout about the general ridiculousness of pg_dump/all/restore
but in this case I think Tom is right. This feature could easily be done
in a script without harassing pg_dump.
You can do it, yes. But it takes a lot longer. If the time to implement
was similar, then I would immediately agree "feature available already".
pg_dump is not "harassed" by this. What is lost by adding this feature?
Gained. Code complexity. Right now pg_dump does a plain COPY. You are
introducing a whole other level of complexity by adding WHERE clause
capability. Secondly, I don't think it would actually add anything but
complexity for the user.
How do we deal with this?
pg_dump -w "last_update_timestamp < ..." -t 'table*'
What I see is a recipe for inconsistent, un-restorable backups without a
user realizing what they have done. The only way to deal with the above
is:
1. Wildcards aren't allowed if you have -w
2. You dump everything, if the WHERE clause isn't relevant you just dump
the whole table
I don't like either.
I do see utility if you know what you are doing but I think it makes
more sense to have it outside of pg_dump.
Sincerely,
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Fri, Jul 25, 2008 at 08:26:35PM +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 11:46 -0700, Joshua D. Drake wrote:
On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
So you can dump a coherent sample database in one command, not 207.
Every user of PostgreSQL wants a dev/test database. If the database is
large it isn't practical to take a complete copy. Nor is it practical to
hand-write a data sampling extraction program, and if you do, it's usually
imperfect in many ways.
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
Not sure I buy this argument. I am all for usability and I would be the
first to shout about the general ridiculousness of pg_dump/all/restore
but in this case I think Tom is right. This feature could easily be done
in a script without harassing pg_dump.
You can do it, yes. But it takes a lot longer. If the time to implement
was similar, then I would immediately agree "feature available already".
pg_dump is not "harassed" by this. What is lost by adding this feature?
This was discussed at the beginning of June on -patches: Davy Durham submitted
a patch to add WHERE clauses via a -w option, and then, in response to feedback,
to attach one to each table given with -t. See discussion here:
http://archives.postgresql.org/pgsql-patches/2008-06/msg00001.php
and final patch here:
http://archives.postgresql.org/pgsql-patches/2008-06/msg00026.php.
We now have two patches on this topic from different submitters with
different use cases supplied as justification. I have yet another use case
not mentioned by either of the submitters, and will probably hand-patch
pg_dump locally to support it.
I don't think at this point we should wave this off under the impression
that no one really wants or needs it, as obviously some people want it
enough to code it. The other objections seem to be based on the themes:
- code complexity.
Davy's patch is quite simple. I haven't looked at Simon's yet.
- we need an ETL tool so this should be preempted by that.
- pg_dump should be made into a library so this can be done separately.
We don't generally allow imaginary futures to prevent us from adding
useful functionality on other topics.
- This can be done with a script.
Not really. The script would pretty much have to contain most of
pg_dump. That's more than a script.
- users could make partial dumps and be confused and lose data.
Yes, but they can already do that with -n, -t, and the new pre-data
and post-data switches. This is one more case where the default is
a full dump but one can specifically request less.
I think that once COPY sprouted the ability to take an arbitrary SELECT,
it became almost inevitable that pg_dump would take advantage of it. How
many patches on this topic do we want to ignore?
As you may have guessed by this point:
+1
-dg
--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.
On Fri, 2008-07-25 at 14:11 -0700, daveg wrote:
On Fri, Jul 25, 2008 at 08:26:35PM +0100, Simon Riggs wrote:
- This can be done with a script.
Not really. The script would pretty much have to contain most of
pg_dump. That's more than a script.
Yes really. :) The only thing pg_dump is buying you here is ease of
schema pull. In a situation like this you would pull a pg_dump -s, then
only restore the data that you want based on a single transaction snapshot
of the objects you are going to query.
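A minimal sketch of that approach (table names and predicates are
hypothetical; psql's \copy has accepted a query since 8.2):

# schema only
pg_dump -s mydb > schema.sql
# all data extracts inside one transaction, i.e. one consistent snapshot
psql mydb <<'EOF'
BEGIN ISOLATION LEVEL SERIALIZABLE;
\copy (SELECT * FROM customers WHERE signup_date > '2008-01-01') TO 'customers.dat'
\copy (SELECT * FROM orders WHERE order_date > '2008-01-01') TO 'orders.dat'
COMMIT;
EOF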
- users could make partial dumps and be confused and lose data.
Yes, but they can already do that with -n, -t, and the new pre-data
and post-data switches. This is one more case where the default is
a full dump but one can specifically request less.
No, they actually can't. You are guaranteed, regardless of a -n or -t
flag, that the data you receive is consistent. You can't guarantee that
with -w because you could pull different data based on an arbitrary
conditional that cannot apply to all objects.
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Fri, 2008-07-25 at 14:11 -0700, daveg wrote:
On Fri, Jul 25, 2008 at 08:26:35PM +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 11:46 -0700, Joshua D. Drake wrote:
On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
So you can dump a coherent sample database in one command, not 207.
Every user of PostgreSQL wants a dev/test database. If the database is
large it isn't practical to take a complete copy. Nor is it practical to
hand-write a data sampling extraction program, and if you do, it's usually
imperfect in many ways.
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
Not sure I buy this argument. I am all for usability and I would be the
first to shout about the general ridiculousness of pg_dump/all/restore
but in this case I think Tom is right. This feature could easily be done
in a script without harassing pg_dump.
You can do it, yes. But it takes a lot longer. If the time to implement
was similar, then I would immediately agree "feature available already".
pg_dump is not "harassed" by this. What is lost by adding this feature?
This was discussed at the beginning of June on -patches: Davy Durham submitted
a patch to add WHERE clauses via a -w option, and then, in response to feedback,
to attach one to each table given with -t. See discussion here:
http://archives.postgresql.org/pgsql-patches/2008-06/msg00001.php
and final patch here:
http://archives.postgresql.org/pgsql-patches/2008-06/msg00026.php.
We now have two patches on this topic from different submitters with
different use cases supplied as justification.
Well, that is truly bizarre.
I had no idea about the existence of the other patch. I guess I must
have been busy that week.
This was designed a while back in conjunction with other related
thoughts. I still want an easy way to create a data sample for creating
dev databases from large production systems.
I defer and apologise to the previous submitter, since he got there
first, and apologise again for the noise.
(Cheeky code review: Davy's patch fails if used with -o option, plus I
think it outputs the wrong text into the dump file, AFAICS).
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
Well, that is truly bizarre.
I had no idea about the existence of the other patch. I guess I must
have been busy that week.
This was designed a while back in conjunction with other related
thoughts. I still want an easy way to create a data sample for creating
dev databases from large production systems.
I defer and apologise to the previous submitter, since he got there
first, and apologise again for the noise.
(Cheeky code review: Davy's patch fails if used with -o option, plus I
think it outputs the wrong text into the dump file, AFAICS).
Are you using my patch at
http://archives.postgresql.org/pgsql-patches/2008-06/msg00026.php ?
I'll be glad to fix it.
On Fri, 2008-07-25 at 14:29 -0700, Joshua D. Drake wrote:
- users could make partial dumps and be confused and lose data.
Yes, but they can already do that with -n, -t, and the new pre-data
and post-data switches. This is one more case where the default is
a full dump but one can specifically request less.
No, they actually can't. You are guaranteed, regardless of a -n or -t
flag, that the data you receive is consistent. You can't guarantee that
with -w because you could pull different data based on an arbitrary
conditional that cannot apply to all objects.
But are you guaranteed that you have all tables in FK relationships? No.
(But I like that capability also - it's useful).
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
"Joshua D. Drake" <jd@commandprompt.com> writes:
How do we deal with this?
pg_dump -w "last_update_timestamp < ..." -t 'table*'
What I see is a recipe for inconsistent, un-restorable backups without a
user realizing what they have done. The only way to deal with the above
is:
1. Wildcards aren't allowed if you have -w
2. You dump everything, if the WHERE clause isn't relevant you just dump
the whole table
There's always
3. Apply the WHERE clause to all tables and if there's a table missing
columns referenced in the where clause then fail with the appropriate
error.
Which seems like the right option to me. The tricky bit would be how to deal
with cases where you want a different where clause for different tables. But
even if it doesn't handle all cases that doesn't mean a partial solution is
unreasonable.
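Illustrative of option 3: the equivalent COPY already fails cleanly when
a table lacks the column, e.g. (table and column names invented):

psql mydb -c "COPY (SELECT * FROM tags WHERE last_update > '2008-07-01') TO STDOUT"
ERROR:  column "last_update" does not exist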
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
On Fri, Jul 25, 2008 at 11:17:20PM +0100, Gregory Stark wrote:
"Joshua D. Drake" <jd@commandprompt.com> writes:
How do we deal with this?
pg_dump -w "last_update_timestamp < ..." -t 'table*'
What I see is a recipe for inconsistent, un-restorable backups without a
user realizing what they have done. The only way to deal with the above
is:
1. Wildcards aren't allowed if you have -w
2. You dump everything, if the WHERE clause isn't relevant you just dump
the whole table
There's always
3. Apply the WHERE clause to all tables and if there's a table missing
columns referenced in the where clause then fail with the appropriate
error.
Which seems like the right option to me. The tricky bit would be how to deal
with cases where you want a different where clause for different tables. But
even if it doesn't handle all cases that doesn't mean a partial solution is
unreasonable.
Actually, Davy's patch does deal with the case "where you want a different
where clause for different tables".
-dg
--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.
On Fri, 2008-07-25 at 12:38 -0700, Joshua D. Drake wrote:
Gained. Code complexity.
Hardly; the patch is very small. I would recognise that as a factor
otherwise.
What I see is a recipe for inconsistent, un-restorable backups without a
user realizing what they have done.
I agree on the backup side, but then who would extract just a portion of
their data for backup? It would be no backup at all.
If you did use this as part of an incremental backup scheme, then you
would have to test it (just like any backup method). Incremental backups
rarely have self-consistency except as part of a greater whole.
As a dev tool it makes sense.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
On Fri, 2008-07-25 at 12:38 -0700, Joshua D. Drake wrote:
Gained. Code complexity.
Hardly; the patch is very small. I would recognise that as a factor
otherwise.
What I see is a recipe for inconsistent, un-restorable backups without a
user realizing what they have done.
I agree on the backup side, but then who would extract just a portion of
their data for backup? It would be no backup at all.
If you did use this as part of an incremental backup scheme, then you
would have to test it (just like any backup method). Incremental backups
rarely have self-consistency except as part of a greater whole.
As a dev tool it makes sense.
I think we have yet another case for moving the core bits of pg_dump
into a library that can then be used by lots of clients. Until we do
that we're going to get continual pressure to add extra cases to pg_dump
unrelated to its principal functionality.
cheers
andrew
On Sat, 2008-07-26 at 07:47 -0400, Andrew Dunstan wrote:
Simon Riggs wrote:
As a dev tool it makes sense.
I think we have yet another case for moving the core bits of pg_dump
into a library that can then be used by lots of clients. Until we do
that we're going to get continual pressure to add extra cases to pg_dump
unrelated to its principal functionality.
That's a good idea and I support that.
I'm slightly surprised at the "principal functionality" bit. In a world
where PITR exists the role and importance of pg_dump has waned
considerably. What *is* its principal function? Does it have just one?
One man's dev system is another man's data warehouse, or another man's
backup. The meaning of a dump is defined by the user making the data
dump, not the tool used.
Is this one option sufficient to make us invent pg_make_dev_database?
(With all pg_dump options, plus -w). If that's what we need, fine by me.
I'm always interested in the capability not the structure/naming.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
In a world
where PITR exists the role and importance of pg_dump has waned
considerably. What *is* its principal function? Does it have just one?
I think that's probably a rather narrow perspective.
PITR doesn't work across versions or architectures or OSes. And if
you're using it for failover, then using it for standalone backups as
well means you will need a custom archive_command which can be a bit
tricky to get right. And a custom dump is almost always far smaller than
a PITR dump, even when it's compressed.
I suspect that the vast majority of our users are still using pg_dump to
make normal backups, and that it works quite happily for them. It's
really only when databases get pretty large that this becomes
unmanageable. I think using pg_dump for backups and PITR for failover is
a good combination for a great many users.
So, IMNSHO, making a full database backup is still pg_dump's principal
function.
cheers
andrew
On Sat, 2008-07-26 at 09:08 -0400, Andrew Dunstan wrote:
So, IMNSHO, making a full database backup is still pg_dump's principal
function.
Making copies for development databases is also a common use case; if
not more common than backups, it is at least not far behind. This was my
stated use case.
From my perspective, this should be fairly simple:
* do we agree the use case is a problem we care about?
* do we agree the proposal would help that use case?
* what's the best way to package that feature?
If we wish to protect pg_dump's role, then let's have another utility or
some packaging that can be used for its other hidden roles. That sounds
like something we might all agree on. pg_dev_dump? How should it look?
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
If we wish to protect pg_dump's role, then lets have another utility or
some packaging that can be used for its other hidden roles. That sounds
like we might all agree on that. pg_dev_dump? How should it look?
Actually, if we libraryise pg_dump and add some corresponding \ commands
to psql, then this would possibly be unnecessary.
cheers
andrew