Adding WHERE clause to pg_dump
Attached patch implements WHERE clauses for pg_dump.
This is useful for producing data samples of a database
e.g. pg_dump -w "ctid < '(1000,1)' or random() < 0.1"
and can also be used for taking incremental backups, if data columns
exist to make a partial dump sensible.
e.g. pg_dump -w "last_update_timestamp > ...."
Columns such as these are very common because of optimistic locking
techniques in many databases.
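A complete incremental invocation might look like this (sketch only; the
column name and cutoff value are illustrative, not part of the patch):

pg_dump -w "last_update_timestamp > '2008-07-01 00:00'" mydb > incr.sql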
This is designed to be used in conjunction with the TOM utility, and the
forthcoming patch to implement stats hooks. Taken together these
features will allow the ability to take a cut-down database environment
for testing, yet with statistics matching the main production database.
It was easier to write it and then discuss, since I needed to check the
feasibility of the idea before presenting it.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Attachments:
pg_dump_where.v1.patch (text/x-patch; charset=UTF-8)
Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.103
diff -c -r1.103 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml 20 Jul 2008 18:43:30 -0000 1.103
--- doc/src/sgml/ref/pg_dump.sgml 25 Jul 2008 08:29:25 -0000
***************
*** 674,679 ****
--- 674,696 ----
</varlistentry>
<varlistentry>
+ <term><option>-w <replaceable>SQL where clause</replaceable></option></term>
+ <term><option>--where=<replaceable class="parameter">SQL where clause</replaceable></option></term>
+ <listitem>
+ <para>
+ Dump data only for rows matching the specified clause. When this
+ parameter is not specified, all rows are dumped. The <literal>WHERE</>
+ clause is applied to all tables dumped, so any columns named must be
+ present in all tables being dumped, or <application>pg_dump</application>
+ will return an error. The keyword <quote>where</quote> itself need not
+ be given, since it is added automatically. This option is ignored if
+ no data is dumped. <option>-w</> cannot currently be used at the same
+ time as <option>-o/--oids</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-W</option></term>
<term><option>--password</option></term>
<listitem>
***************
*** 875,880 ****
--- 892,908 ----
</para>
<para>
+ You can specify a data sample using <option>-w</>. An example would be to
+ dump all tables smaller than 8MB in full, while dumping only a random 10% of
+ rows from any table 8MB or larger. Note that this may not dump all foreign
+ key data correctly, so choose your extract carefully for your own database.
+
+ <screen>
+ <prompt>$</prompt> <userinput>pg_dump -w "ctid < '(1000,1)' or random() < 0.1" mydb > db.sql</userinput>
+ </screen>
+ </para>
+
+ <para>
To dump all database objects except for tables whose names begin with
<literal>ts_</literal>:
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c 20 Jul 2008 18:43:30 -0000 1.497
--- src/bin/pg_dump/pg_dump.c 25 Jul 2008 08:34:05 -0000
***************
*** 95,100 ****
--- 95,102 ----
static SimpleStringList table_exclude_patterns = {NULL, NULL};
static SimpleOidList table_exclude_oids = {NULL, NULL};
+ static const char *where_clause = NULL;
+
/* default, if no "inclusion" switches appear, is to dump everything */
static bool include_everything = true;
***************
*** 188,194 ****
static void dumpEncoding(Archive *AH);
static void dumpStdStrings(Archive *AH);
static const char *getAttrName(int attrnum, TableInfo *tblInfo);
! static const char *fmtCopyColumnList(const TableInfo *ti);
static void do_sql_command(PGconn *conn, const char *query);
static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
ExecStatusType expected);
--- 190,196 ----
static void dumpEncoding(Archive *AH);
static void dumpStdStrings(Archive *AH);
static const char *getAttrName(int attrnum, TableInfo *tblInfo);
! static const char *fmtCopyColumnList(const TableInfo *ti, bool with_brackets);
static void do_sql_command(PGconn *conn, const char *query);
static void check_sql_result(PGresult *res, PGconn *conn, const char *query,
ExecStatusType expected);
***************
*** 250,255 ****
--- 252,258 ----
{"superuser", required_argument, NULL, 'S'},
{"table", required_argument, NULL, 't'},
{"exclude-table", required_argument, NULL, 'T'},
+ {"where", required_argument, NULL, 'w'},
{"password", no_argument, NULL, 'W'},
{"username", required_argument, NULL, 'U'},
{"verbose", no_argument, NULL, 'v'},
***************
*** 303,309 ****
}
}
! while ((c = getopt_long(argc, argv, "abcCdDE:f:F:h:in:N:oOp:RsS:t:T:U:vWxX:Z:",
long_options, &optindex)) != -1)
{
switch (c)
--- 306,312 ----
}
}
! while ((c = getopt_long(argc, argv, "abcCdDE:f:F:h:in:N:oOp:RsS:t:T:U:vw:WxX:Z:",
long_options, &optindex)) != -1)
{
switch (c)
***************
*** 404,409 ****
--- 407,416 ----
g_verbose = true;
break;
+ case 'w':
+ where_clause = optarg;
+ break;
+
case 'W':
force_password = true;
break;
***************
*** 483,488 ****
--- 490,501 ----
exit(1);
}
+ if (where_clause && oids)
+ {
+ write_msg(NULL, "options -w/--where and -o/--oids cannot be used together\n");
+ exit(1);
+ }
+
/* open the output file */
if (pg_strcasecmp(format, "a") == 0 || pg_strcasecmp(format, "append") == 0)
{
***************
*** 788,793 ****
--- 801,807 ----
" plain text format\n"));
printf(_(" -t, --table=TABLE dump the named table(s) only\n"));
printf(_(" -T, --exclude-table=TABLE do NOT dump the named table(s)\n"));
+ printf(_(" -w, --where=SQL dump data only for rows matching where clause\n"));
printf(_(" -x, --no-privileges do not dump privileges (grant/revoke)\n"));
printf(_(" --disable-dollar-quoting disable dollar quoting, use SQL standard quoting\n"));
printf(_(" --disable-triggers disable triggers during data-only restore\n"));
***************
*** 1060,1082 ****
* cases involving ADD COLUMN and inheritance.)
*/
if (g_fout->remoteVersion >= 70300)
! column_list = fmtCopyColumnList(tbinfo);
else
column_list = ""; /* can't select columns in COPY */
! if (oids && hasoids)
{
! appendPQExpBuffer(q, "COPY %s %s WITH OIDS TO stdout;",
fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
classname),
! column_list);
}
else
{
! appendPQExpBuffer(q, "COPY %s %s TO stdout;",
fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
classname),
! column_list);
}
res = PQexec(g_conn, q->data);
check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
--- 1074,1099 ----
* cases involving ADD COLUMN and inheritance.)
*/
if (g_fout->remoteVersion >= 70300)
! column_list = fmtCopyColumnList(tbinfo, (where_clause == NULL));
else
column_list = ""; /* can't select columns in COPY */
! if (where_clause)
{
! appendPQExpBuffer(q, "COPY (SELECT %s FROM %s WHERE %s) TO stdout %s;",
! column_list,
fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
classname),
! where_clause,
! ((oids && hasoids) ? "WITH OIDS" : ""));
}
else
{
! appendPQExpBuffer(q, "COPY %s %s TO stdout %s;",
fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
classname),
! column_list,
! ((oids && hasoids) ? "WITH OIDS" : ""));
}
res = PQexec(g_conn, q->data);
check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
***************
*** 1331,1337 ****
appendPQExpBuffer(copyBuf, "COPY %s ",
fmtId(tbinfo->dobj.name));
appendPQExpBuffer(copyBuf, "%s %sFROM stdin;\n",
! fmtCopyColumnList(tbinfo),
(tdinfo->oids && tbinfo->hasoids) ? "WITH OIDS " : "");
copyStmt = copyBuf->data;
}
--- 1348,1354 ----
appendPQExpBuffer(copyBuf, "COPY %s ",
fmtId(tbinfo->dobj.name));
appendPQExpBuffer(copyBuf, "%s %sFROM stdin;\n",
! fmtCopyColumnList(tbinfo, true),
(tdinfo->oids && tbinfo->hasoids) ? "WITH OIDS " : "");
copyStmt = copyBuf->data;
}
***************
*** 10492,10498 ****
* "", not an invalid "()" column list.
*/
static const char *
! fmtCopyColumnList(const TableInfo *ti)
{
static PQExpBuffer q = NULL;
int numatts = ti->numatts;
--- 10509,10515 ----
* "", not an invalid "()" column list.
*/
static const char *
! fmtCopyColumnList(const TableInfo *ti, bool with_brackets)
{
static PQExpBuffer q = NULL;
int numatts = ti->numatts;
***************
*** 10506,10512 ****
else
q = createPQExpBuffer();
! appendPQExpBuffer(q, "(");
needComma = false;
for (i = 0; i < numatts; i++)
{
--- 10523,10530 ----
else
q = createPQExpBuffer();
! if (with_brackets)
! appendPQExpBuffer(q, "(");
needComma = false;
for (i = 0; i < numatts; i++)
{
***************
*** 10521,10527 ****
if (!needComma)
return ""; /* no undropped columns */
! appendPQExpBuffer(q, ")");
return q->data;
}
--- 10539,10546 ----
if (!needComma)
return ""; /* no undropped columns */
! if (with_brackets)
! appendPQExpBuffer(q, ")");
return q->data;
}
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
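For a single table the manual equivalent is roughly this (sketch only;
table name and predicate invented for illustration):

psql mydb -c "COPY (SELECT * FROM orders WHERE order_date > '2008-01-01') TO STDOUT" > orders.dat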
regards, tom lane
On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
So you can dump a coherent sample database in one command, not 207.
Every user of PostgreSQL wants a dev/test database. If the database is
large it isn't practical to take a complete copy. Nor is it practical to
hand-write a data sampling extraction program, and if you do, it's usually
imperfect in many ways.
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
So you can dump a coherent sample database in one command, not 207.
Every user of PostgreSQL wants a dev/test database. If the database is
large it isn't practical to take a complete copy. Nor is it practical to
hand-write a data sampling extraction program, and if you do, it's usually
imperfect in many ways.
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
Not sure I buy this argument. I am all for usability and I would be the
first to shout about the general ridiculousness of pg_dump/all/restore
but in this case I think Tom is right. This feature could easily be done
in a script without harassing pg_dump.
Sincerely,
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Fri, 2008-07-25 at 11:46 -0700, Joshua D. Drake wrote:
On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
So you can dump a coherent sample database in one command, not 207.
Every user of PostgreSQL wants a dev/test database. If the database is
large it isn't practical to take a complete copy. Nor is it practical to
hand-write a data sampling extraction program, and if you do, it's usually
imperfect in many ways.
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
Not sure I buy this argument. I am all for usability and I would be the
first to shout about the general ridiculousness of pg_dump/all/restore
but in this case I think Tom is right. This feature could easily be done
in a script without harassing pg_dump.
You can do it, yes. But it takes a lot longer. If the time to implement
was similar, then I would immediately agree "feature available already".
pg_dump is not "harassed" by this. What is lost by adding this feature?
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Fri, 2008-07-25 at 20:26 +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 11:46 -0700, Joshua D. Drake wrote:
On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote:
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
Not sure I buy this argument. I am all for usability and I would be the
first to shout about the general ridiculousness of pg_dump/all/restore
but in this case I think Tom is right. This feature could easily be done
in a script without harassing pg_dump.
You can do it, yes. But it takes a lot longer. If the time to implement
was similar, then I would immediately agree "feature available already".
pg_dump is not "harassed" by this. What is lost by adding this feature?
Gained. Code complexity. Right now pg_dump does a plain COPY. You are
introducing a whole other level of complexity by adding WHERE clause
capability. Secondly, I don't think it would actually add anything but
complexity for the user.
How do we deal with this?
pg_dump -w "last_update_timestamp < ..." -t 'table*'
What I see is a recipe for inconsistent, un-restorable backups without a
user realizing what they have done. The only way to deal with the above
is:
1. Wildcards aren't allowed if you have -w
2. You dump everything, if the WHERE clause isn't relevant you just dump
the whole table
I don't like either.
I do see utility if you know what you are doing but I think it makes
more sense to have it outside of pg_dump.
Sincerely,
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Fri, Jul 25, 2008 at 08:26:35PM +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 11:46 -0700, Joshua D. Drake wrote:
On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
So you can dump a coherent sample database in one command, not 207.
Every user of PostgreSQL wants a dev/test database. If the database is
large it isn't practical to take a complete copy. Nor is it practical to
hand-write a data sampling extraction program, and if you do, it's usually
imperfect in many ways.
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
Not sure I buy this argument. I am all for usability and I would be the
first to shout about the general ridiculousness of pg_dump/all/restore
but in this case I think Tom is right. This feature could easily be done
in a script without harassing pg_dump.
You can do it, yes. But it takes a lot longer. If the time to implement
was similar, then I would immediately agree "feature available already".
pg_dump is not "harassed" by this. What is lost by adding this feature?
This was discussed at the beginning of June on -patches: Davy Durham submitted
a patch to add WHERE clauses via a -w option, and then, in response to feedback,
to attach one to each table given with -t. See discussion here:
http://archives.postgresql.org/pgsql-patches/2008-06/msg00001.php
and final patch here:
http://archives.postgresql.org/pgsql-patches/2008-06/msg00026.php.
We now have two patches on this topic from different submitters with
different use cases supplied as justification. I have yet another use case
not mentioned by either of the submitters, and will probably hand-patch
pg_dump locally to support it.
I don't think at this point we should wave this off under the impression
that no one really wants or needs it, as obviously some people want it
enough to code it. The other objections seem to be based on the themes:
- code complexity.
Davy's patch is quite simple. I haven't looked at Simon's yet.
- we need an ETL tool so this should be preempted by that.
- pg_dump should be made into a library so this can be done separately.
We don't generally allow imaginary futures to prevent us from adding
useful functionality on other topics.
- This can be done with a script.
Not really. The script would pretty much have to contain most of
pg_dump. That's more than a script.
- users could make partial dumps and be confused and lose data.
Yes, but they can already do that with -n, -t, and the new pre-data
and post-data switches. This is one more case where the default is
a full dump but one can specifically request less.
I think that once COPY sprouted the ability to take an arbitrary SELECT,
it became almost inevitable that pg_dump would take advantage of it. How
many patches on this topic do we want to ignore?
As you may have guessed by this point:
+1
-dg
--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.
On Fri, 2008-07-25 at 14:11 -0700, daveg wrote:
On Fri, Jul 25, 2008 at 08:26:35PM +0100, Simon Riggs wrote:
- This can be done with a script.
Not really. The script would pretty much have to contain most of
pg_dump. That's more than a script.
Yes really. :) The only thing pg_dump is buying you here is ease of
schema pull. In a situation like this you would pull a pg_dump -s, then
only restore the data that you want based on a single transaction snapshot
of the objects you are going to query.
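A minimal sketch of that approach (table names and predicates are
hypothetical; psql's \copy has accepted a query since 8.2):

# schema only
pg_dump -s mydb > schema.sql
# all data extracts inside one transaction, i.e. one consistent snapshot
psql mydb <<'EOF'
BEGIN ISOLATION LEVEL SERIALIZABLE;
\copy (SELECT * FROM customers WHERE signup_date > '2008-01-01') TO 'customers.dat'
\copy (SELECT * FROM orders WHERE order_date > '2008-01-01') TO 'orders.dat'
COMMIT;
EOF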
- users could make partial dumps and be confused and lose data.
Yes, but they can already do that with -n, -t, and the new pre-data
and post-data switches. This is one more case where the default is
a full dump but one can specifically request less.
No, they actually can't. You are guaranteed, regardless of a -n or -t
flag, that the data you receive is consistent. You can't guarantee that
with -w because you could pull different data based on an arbitrary
conditional that cannot apply to all objects.
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Fri, 2008-07-25 at 14:11 -0700, daveg wrote:
On Fri, Jul 25, 2008 at 08:26:35PM +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 11:46 -0700, Joshua D. Drake wrote:
On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote:
On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Attached patch implements WHERE clauses for pg_dump.
I still have serious reservations about adding such an ugly,
non-orthogonal wart to pg_dump. Why is it not appropriate to just
do a COPY (SELECT ...) TO STDOUT when you need this?
So you can dump a coherent sample database in one command, not 207.
Every user of PostgreSQL wants a dev/test database. If the database is
large it isn't practical to take a complete copy. Nor is it practical to
hand-write a data sampling extraction program, and if you do, it's usually
imperfect in many ways.
Adding this feature gives a very fast capability to create sample
databases, or incremental backups for many cases.
Not sure I buy this argument. I am all for usability and I would be the
first to shout about the general ridiculousness of pg_dump/all/restore
but in this case I think Tom is right. This feature could easily be done
in a script without harassing pg_dump.
You can do it, yes. But it takes a lot longer. If the time to implement
was similar, then I would immediately agree "feature available already".
pg_dump is not "harassed" by this. What is lost by adding this feature?
This was discussed at the beginning of June on -patches: Davy Durham submitted
a patch to add WHERE clauses via a -w option, and then, in response to feedback,
to attach one to each table given with -t. See discussion here:
http://archives.postgresql.org/pgsql-patches/2008-06/msg00001.php
and final patch here:
http://archives.postgresql.org/pgsql-patches/2008-06/msg00026.php.
We now have two patches on this topic from different submitters with
different use cases supplied as justification.
Well, that is truly bizarre.
I had no idea about the existence of the other patch. I guess I must
have been busy that week.
This was designed a while back in conjunction with other related
thoughts. I still want an easy way to create a data sample for creating
dev databases from large production systems.
I defer and apologise to the previous submitter, since he got there
first, and apologise again for the noise.
(Cheeky code review: Davy's patch fails if used with -o option, plus I
think it outputs the wrong text into the dump file, AFAICS).
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
Well, that is truly bizarre.
I had no idea about the existence of the other patch. I guess I must
have been busy that week.
This was designed a while back in conjunction with other related
thoughts. I still want an easy way to create a data sample for creating
dev databases from large production systems.
I defer and apologise to the previous submitter, since he got there
first, and apologise again for the noise.
(Cheeky code review: Davy's patch fails if used with -o option, plus I
think it outputs the wrong text into the dump file, AFAICS).
Are you using my patch at
http://archives.postgresql.org/pgsql-patches/2008-06/msg00026.php ?
I'll be glad to fix it.
On Fri, 2008-07-25 at 14:29 -0700, Joshua D. Drake wrote:
- users could make partial dumps and be confused and lose data.
Yes, but they can already do that with -n, -t, and the new pre-data
and post-data switches. This is one more case where the default is
a full dump but one can specifically request less.
No, they actually can't. You are guaranteed, regardless of a -n or -t
flag, that the data you receive is consistent. You can't guarantee that
with -w because you could pull different data based on an arbitrary
conditional that cannot apply to all objects.
But are you guaranteed that you have all tables in FK relationships? No.
(But I like that capability also - it's useful).
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
"Joshua D. Drake" <jd@commandprompt.com> writes:
How do we deal with this?
pg_dump -w "last_update_timestamp < ..." -t 'table*'
What I see is a recipe for inconsistent, un-restorable backups without a
user realizing what they have done. The only way to deal with the above
is:
1. Wildcards aren't allowed if you have -w
2. You dump everything, if the WHERE clause isn't relevant you just dump
the whole table
There's always
3. Apply the WHERE clause to all tables and if there's a table missing
columns referenced in the where clause then fail with the appropriate
error.
Which seems like the right option to me. The tricky bit would be how to deal
with cases where you want a different where clause for different tables. But
even if it doesn't handle all cases that doesn't mean a partial solution is
unreasonable.
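Illustrative of option 3: the equivalent COPY already fails cleanly when
a table lacks the column, e.g. (table and column names invented):

psql mydb -c "COPY (SELECT * FROM tags WHERE last_update > '2008-07-01') TO STDOUT"
ERROR:  column "last_update" does not exist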
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
On Fri, Jul 25, 2008 at 11:17:20PM +0100, Gregory Stark wrote:
"Joshua D. Drake" <jd@commandprompt.com> writes:
How do we deal with this?
pg_dump -w "last_update_timestamp < ..." -t 'table*'
What I see is a recipe for inconsistent, un-restorable backups without a
user realizing what they have done. The only way to deal with the above
is:
1. Wildcards aren't allowed if you have -w
2. You dump everything, if the WHERE clause isn't relevant you just dump
the whole table
There's always
3. Apply the WHERE clause to all tables and if there's a table missing
columns referenced in the where clause then fail with the appropriate
error.
Which seems like the right option to me. The tricky bit would be how to deal
with cases where you want a different where clause for different tables. But
even if it doesn't handle all cases that doesn't mean a partial solution is
unreasonable.
Actually, Davy's patch does deal with the case "where you want a different
where clause for different tables".
-dg
--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.
On Fri, 2008-07-25 at 12:38 -0700, Joshua D. Drake wrote:
Gained. Code complexity.
Hardly; the patch is very small. I would recognise that as a factor
otherwise.
What I see is a recipe for inconsistent, un-restorable backups without a
user realizing what they have done.
I agree on the backup side, but then who would extract just a portion of
their data for backup? It would be no backup at all.
If you did use this as part of an incremental backup scheme, then you
would have to test it (just like any backup method). Incremental backups
rarely have self-consistency except as part of a greater whole.
As a dev tool it makes sense.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
On Fri, 2008-07-25 at 12:38 -0700, Joshua D. Drake wrote:
Gained. Code complexity.
Hardly; the patch is very small. I would recognise that as a factor
otherwise.
What I see is a recipe for inconsistent, un-restorable backups without a
user realizing what they have done.
I agree on the backup side, but then who would extract just a portion of
their data for backup? It would be no backup at all.
If you did use this as part of an incremental backup scheme, then you
would have to test it (just like any backup method). Incremental backups
rarely have self-consistency except as part of a greater whole.
As a dev tool it makes sense.
I think we have yet another case for moving the core bits of pg_dump
into a library that can then be used by lots of clients. Until we do
that we're going to get continual pressure to add extra cases to pg_dump
unrelated to its principal functionality.
cheers
andrew
On Sat, 2008-07-26 at 07:47 -0400, Andrew Dunstan wrote:
Simon Riggs wrote:
As a dev tool it makes sense.
I think we have yet another case for moving the core bits of pg_dump
into a library that can then be used by lots of clients. Until we do
that we're going to get continual pressure to add extra cases to pg_dump
unrelated to its principal functionality.
That's a good idea and I support that.
I'm slightly surprised at the "principal functionality" bit. In a world
where PITR exists the role and importance of pg_dump has waned
considerably. What *is* its principal function? Does it have just one?
One man's dev system is another man's data warehouse, or another man's
backup. The meaning of a dump is defined by the user making the data
dump, not the tool used.
Is this one option sufficient to make us invent pg_make_dev_database?
(With all pg_dump options, plus -w). If that's what we need, fine by me.
I'm always interested in the capability not the structure/naming.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
In a world
where PITR exists the role and importance of pg_dump has waned
considerably. What *is* its principal function? Does it have just one?
I think that's probably a rather narrow perspective.
PITR doesn't work across versions or architectures or OSes. And if
you're using it for failover, then using it for standalone backups as
well means you will need a custom archive_command which can be a bit
tricky to get right. And a custom dump is almost always far smaller than
a PITR dump, even when it's compressed.
I suspect that the vast majority of our users are still using pg_dump to
make normal backups, and that it works quite happily for them. It's
really only when databases get pretty large that this becomes
unmanageable. I think using pg_dump for backups and PITR for failover is
a good combination for a great many users.
So, IMNSHO, making a full database backup is still pg_dump's principal
function.
cheers
andrew
On Sat, 2008-07-26 at 09:08 -0400, Andrew Dunstan wrote:
So, IMNSHO, making a full database backup is still pg_dump's principal
function.
Making copies for development databases is also a common use case; if
not more common than backups, it is at least not far behind. This was my
stated use case.
From my perspective, this should be fairly simple:
* do we agree the use case is a problem we care about?
* do we agree the proposal would help that use case?
* what's the best way to package that feature?
If we wish to protect pg_dump's role, then let's have another utility or
some packaging that can be used for its other hidden roles. That sounds
like something we might all agree on. pg_dev_dump? How should it look?
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
If we wish to protect pg_dump's role, then lets have another utility or
some packaging that can be used for its other hidden roles. That sounds
like we might all agree on that. pg_dev_dump? How should it look?
Actually, if we libraryise pg_dump and add some corresponding \ commands
to psql, then this would possibly be unnecessary.
cheers
andrew