pg_dump --split patch
Dear fellow hackers,
Problem: A normal diff of two slightly different schema dump files (pg_dump
-s) does not produce a user-friendly diff, because all the changes end up in
one big file.
Solution: I propose a new option to pg_dump, --split, which dumps each
object to a separate file in a user-friendly directory structure:
[-f filename] : main dump file, which imports each split part using \i
[-f filename]-split/[desc]/[tag]/[oid].sql : dump of the object with that oid
Example: If the filename (-f) is "pg.dump", the following directory
structure would be created:
$ ./pg_dump -f /crypt/pg.dump --split -F p -s glue
/crypt/pg.dump-split/VIEW/
/crypt/pg.dump-split/TYPE/
/crypt/pg.dump-split/TRIGGER/
/crypt/pg.dump-split/TABLE/
/crypt/pg.dump-split/SEQUENCE/
/crypt/pg.dump-split/SCHEMA/
/crypt/pg.dump-split/PROCEDURAL_LANGUAGE/
/crypt/pg.dump-split/INDEX/
/crypt/pg.dump-split/FUNCTION/
/crypt/pg.dump-split/FK_CONSTRAINT/
/crypt/pg.dump-split/CONSTRAINT/
/crypt/pg.dump-split/AGGREGATE/
In each such directory, one directory per object name is created.
If we had a function "foobar" with oid "12345", it would be saved to:
/crypt/pg.dump-split/FUNCTION/foobar/12345.sql
In the "pg.dump" plain text file, the files are "linked in" using the "\i"
psql command, e.g.:
\i /crypt/pg.dump-split/FUNCTION/foobar/12345.sql
Potential use-case scenarios:
*) Version control your database schema, by exporting it daily (using
--split) and committing the differences (a small shell sketch of this
workflow follows below).
*) Compare schema dumps created at different points in time. Since objects
are stored in separate files, it is easier to see which areas were modified
than when looking at the diff of two entire schemas.
*) Restore only some objects, based on type (e.g., only the functions) or
name (e.g., only functions with certain names).
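As a rough sketch of that version-control workflow (the repository path and
the use of git are assumptions for illustration; "glue" is the example
database from above):
$ pg_dump -s --split -F p -f /repo/schema/pg.dump glue
$ cd /repo/schema
$ git add -A && git commit -m "daily schema snapshot"
$ git diff --stat HEAD~1 -- pg.dump-split/    # list the object files changed since the last snapshot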
I've tested the patch against both the latest HEAD (9.1devel) and 8.4.6.
Feedback welcome.
--
Best regards,
Joel Jacobson
Glue Finance
Attachments:
pg-dump-split-plain-text-files-8.4.6.patch (application/octet-stream)
diff -c -r postgresql-8.4.6/src/bin/pg_dump/pg_backup_archiver.c postgresql-8.4.6-pg-dump-split-plain-text-files/src/bin/pg_dump/pg_backup_archiver.c
*** postgresql-8.4.6/src/bin/pg_dump/pg_backup_archiver.c 2010-12-14 03:59:19.000000000 +0100
--- postgresql-8.4.6-pg-dump-split-plain-text-files/src/bin/pg_dump/pg_backup_archiver.c 2010-12-28 15:49:39.510968492 +0100
***************
*** 27,32 ****
--- 27,33 ----
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
+ #include <sys/stat.h>
#ifdef WIN32
#include <io.h>
***************
*** 2752,2757 ****
--- 2753,2828 ----
return;
}
+ /*
+ * Split object into separate file,
+ * if the --split option is enabled and,
+ * if the object has an oid.
+ */
+ if (ropt->split_files && te->catalogId.oid)
+ {
+ char splitFilename[1024];
+ char *tag;
+ char *tagArgPos;
+ char *desc;
+ char *descSpacePos;
+ mode_t omode;
+
+ /*
+ * Strip eventual argument part from "tag" (e.g. the name of functions)
+ * Example: "foobar(_arg1 int, _arg2 int)" --> "foobar"
+ */
+ tagArgPos = strstr(te->tag,"(");
+ if (tagArgPos == NULL)
+ tag = strdup(te->tag);
+ else
+ tag = strndup(te->tag, tagArgPos - te->tag);
+
+ desc = strdup(te->desc);
+ descSpacePos = strstr(desc," ");
+ /*
+ * Replace " " with "_" in "desc"
+ * Example: "FK CONSTRAINT" --> "FK_CONSTRAINT"
+ */
+ while ((descSpacePos = strstr(desc, " ")) != NULL)
+ {
+ char *dup = strdup(desc);
+ strlcpy(desc, dup, descSpacePos - desc + 1);
+ strcat(desc, "_");
+ strcat(desc, dup + (descSpacePos - desc) + strlen("_"));
+ free(dup);
+ }
+
+ /*
+ * Build path consisting of [filename]-split/[desc]/[tag]/[oid].sql
+ * Create the directories
+ * Example: dumpfile-split/FUNCTION/foobar/12345.sql
+ */
+ omode = S_IRWXU | S_IRWXG | S_IRWXO;
+ snprintf(splitFilename, 1024, "%s-split", ropt->filename);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s", ropt->filename, desc);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s/%s", ropt->filename, desc, tag);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s/%s/%d.sql", ropt->filename, desc, tag, te->catalogId.oid);
+
+ /* Add \i <split file name> to main dump file */
+ ahprintf(AH, "\\i %s\n", splitFilename);
+
+ /*
+ * Close the normal file handle to which non-splittable
+ * objects are written.
+ *
+ * Open split file handle for splitFilename.
+ *
+ * In the end of the function,
+ * the split file handle will be closed, and
+ * the normal file handle will be reopened again.
+ */
+ fclose(AH->OF);
+ AH->OF = fopen(splitFilename, PG_BINARY_A);
+ }
+
/* Select owner, schema, and tablespace as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
***************
*** 2881,2886 ****
--- 2952,2967 ----
free(AH->currUser);
AH->currUser = NULL;
}
+
+ /*
+ * If we are using the --split option,
+ * close the split file handle, and reopen the normal file handle.
+ */
+ if (ropt->split_files && te->catalogId.oid)
+ {
+ fclose(AH->OF);
+ AH->OF = fopen(ropt->filename, PG_BINARY_A);
+ }
}
void
diff -c -r postgresql-8.4.6/src/bin/pg_dump/pg_backup.h postgresql-8.4.6-pg-dump-split-plain-text-files/src/bin/pg_dump/pg_backup.h
*** postgresql-8.4.6/src/bin/pg_dump/pg_backup.h 2010-12-14 03:59:19.000000000 +0100
--- postgresql-8.4.6-pg-dump-split-plain-text-files/src/bin/pg_dump/pg_backup.h 2010-12-28 15:49:39.510968492 +0100
***************
*** 142,147 ****
--- 142,148 ----
int number_of_jobs;
bool *idWanted; /* array showing which dump IDs to emit */
+ int split_files; /* --split option, split objects into separate files */
} RestoreOptions;
/*
diff -c -r postgresql-8.4.6/src/bin/pg_dump/pg_dump.c postgresql-8.4.6-pg-dump-split-plain-text-files/src/bin/pg_dump/pg_dump.c
*** postgresql-8.4.6/src/bin/pg_dump/pg_dump.c 2010-12-14 03:59:19.000000000 +0100
--- postgresql-8.4.6-pg-dump-split-plain-text-files/src/bin/pg_dump/pg_dump.c 2010-12-28 15:49:39.520872802 +0100
***************
*** 111,116 ****
--- 111,117 ----
static int disable_dollar_quoting = 0;
static int dump_inserts = 0;
static int column_inserts = 0;
+ static int split_files = 0;
static void help(const char *progname);
***************
*** 277,282 ****
--- 278,284 ----
{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
{"role", required_argument, NULL, 3},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
+ {"split", no_argument, &split_files, 1},
{NULL, 0, NULL, 0}
};
***************
*** 765,770 ****
--- 767,773 ----
ropt->disable_triggers = disable_triggers;
ropt->use_setsessauth = use_setsessauth;
ropt->dataOnly = dataOnly;
+ ropt->split_files = split_files;
if (compressLevel == -1)
ropt->compression = 0;
***************
*** 797,802 ****
--- 800,806 ----
printf(_(" -v, --verbose verbose mode\n"));
printf(_(" -Z, --compress=0-9 compression level for compressed formats\n"));
printf(_(" --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock\n"));
+ printf(_(" --split split objects into separate plain text files\n"));
printf(_(" --help show this help, then exit\n"));
printf(_(" --version output version information, then exit\n"));
pg-dump-split-plain-text-files-9.1devel.patch (application/octet-stream)
*** a/src/bin/pg_dump/pg_backup.h
--- b/src/bin/pg_dump/pg_backup.h
***************
*** 143,148 **** typedef struct _restoreOptions
--- 143,149 ----
int number_of_jobs;
bool *idWanted; /* array showing which dump IDs to emit */
+ int split_files; /* --split option, split objects into separate files */
} RestoreOptions;
/*
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 27,32 ****
--- 27,33 ----
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
+ #include <sys/stat.h>
#ifdef WIN32
#include <io.h>
***************
*** 2830,2835 **** _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
--- 2831,2906 ----
return;
}
+ /*
+ * Split object into separate file,
+ * if the --split option is enabled and,
+ * if the object has an oid.
+ */
+ if (ropt->split_files && te->catalogId.oid)
+ {
+ char splitFilename[1024];
+ char *tag;
+ char *tagArgPos;
+ char *desc;
+ char *descSpacePos;
+ mode_t omode;
+
+ /*
+ * Strip eventual argument part from "tag" (e.g. the name of functions)
+ * Example: "foobar(_arg1 int, _arg2 int)" --> "foobar"
+ */
+ tagArgPos = strstr(te->tag,"(");
+ if (tagArgPos == NULL)
+ tag = strdup(te->tag);
+ else
+ tag = strndup(te->tag, tagArgPos - te->tag);
+
+ desc = strdup(te->desc);
+ descSpacePos = strstr(desc," ");
+ /*
+ * Replace " " with "_" in "desc"
+ * Example: "FK CONSTRAINT" --> "FK_CONSTRAINT"
+ */
+ while ((descSpacePos = strstr(desc, " ")) != NULL)
+ {
+ char *dup = strdup(desc);
+ strlcpy(desc, dup, descSpacePos - desc + 1);
+ strcat(desc, "_");
+ strcat(desc, dup + (descSpacePos - desc) + strlen("_"));
+ free(dup);
+ }
+
+ /*
+ * Build path consisting of [filename]-split/[desc]/[tag]/[oid].sql
+ * Create the directories
+ * Example: dumpfile-split/FUNCTION/foobar/12345.sql
+ */
+ omode = S_IRWXU | S_IRWXG | S_IRWXO;
+ snprintf(splitFilename, 1024, "%s-split", ropt->filename);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s", ropt->filename, desc);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s/%s", ropt->filename, desc, tag);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s/%s/%d.sql", ropt->filename, desc, tag, te->catalogId.oid);
+
+ /* Add \i <split file name> to main dump file */
+ ahprintf(AH, "\\i %s\n", splitFilename);
+
+ /*
+ * Close the normal file handle to which non-splittable
+ * objects are written.
+ *
+ * Open split file handle for splitFilename.
+ *
+ * In the end of the function,
+ * the split file handle will be closed, and
+ * the normal file handle will be reopened again.
+ */
+ fclose(AH->OF);
+ AH->OF = fopen(splitFilename, PG_BINARY_A);
+ }
+
/* Select owner, schema, and tablespace as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
***************
*** 2960,2965 **** _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
--- 3031,3046 ----
free(AH->currUser);
AH->currUser = NULL;
}
+
+ /*
+ * If we are using the --split option,
+ * close the split file handle, and reopen the normal file handle.
+ */
+ if (ropt->split_files && te->catalogId.oid)
+ {
+ fclose(AH->OF);
+ AH->OF = fopen(ropt->filename, PG_BINARY_A);
+ }
}
void
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 134,139 **** static int disable_dollar_quoting = 0;
--- 134,140 ----
static int dump_inserts = 0;
static int column_inserts = 0;
static int no_security_label = 0;
+ static int split_files = 0;
static void help(const char *progname);
***************
*** 316,321 **** main(int argc, char **argv)
--- 317,323 ----
{"role", required_argument, NULL, 3},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
{"no-security-label", no_argument, &no_security_label, 1},
+ {"split", no_argument, &split_files, 1},
{NULL, 0, NULL, 0}
};
***************
*** 804,809 **** main(int argc, char **argv)
--- 806,812 ----
ropt->disable_triggers = disable_triggers;
ropt->use_setsessauth = use_setsessauth;
ropt->dataOnly = dataOnly;
+ ropt->split_files = split_files;
if (compressLevel == -1)
ropt->compression = 0;
***************
*** 836,841 **** help(const char *progname)
--- 839,845 ----
printf(_(" -v, --verbose verbose mode\n"));
printf(_(" -Z, --compress=0-9 compression level for compressed formats\n"));
printf(_(" --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock\n"));
+ printf(_(" --split split objects into separate plain text files\n"));
printf(_(" --help show this help, then exit\n"));
printf(_(" --version output version information, then exit\n"));
Joel Jacobson <joel@gluefinance.com> writes:
Dear fellow hackers,
Problem: A normal diff of two slightly different schema dump files (pg_dump
-s), will not produce a user-friendly diff, as you get all changes in the
same file.
Solution: I propose a new option to pg_dump, --split, which dumps each
object to a separate file in a user friendly directory structure:
Um ... how does that solve the claimed problem exactly?
[-f filename] : main dump file, imports each splitted part using \i
[-f filename]-split/[desc]/[tag]/[oid].sql : dump of the oid
This particular choice seems remarkably *un* friendly, since two dumps
from different DBs will inevitably not share the same OIDs, making it
practically impossible to compare them even if they are logically
identical. But even without the choice to use OIDs in the filenames
I'm unconvinced that file-per-object is a good idea in any way shape or
form.
regards, tom lane
2010/12/28 Tom Lane <tgl@sss.pgh.pa.us>
Joel Jacobson <joel@gluefinance.com> writes:
Dear fellow hackers,
Problem: A normal diff of two slightly different schema dump files (pg_dump
-s), will not produce a user-friendly diff, as you get all changes in the
same file.
Solution: I propose a new option to pg_dump, --split, which dumps each
object to a separate file in a user friendly directory structure:
Um ... how does that solve the claimed problem exactly?
Because then you can do,
$ diff -r <old schema dump dir> <new schema dump dir>,
instead of,
$ diff <old entire schema dump> <new entire schema dump>
which will nicely reveal each individual object modified, as opposed to a
huge global diff of everything
[-f filename] : main dump file, imports each splitted part using \i
[-f filename]-split/[desc]/[tag]/[oid].sql : dump of the oid
This particular choice seems remarkably *un* friendly, since two dumps
from different DBs will inevitably not share the same OIDs, making it
practically impossible to compare them even if they are logically
identical. But even without the choice to use OIDs in the filenames
I'm unconvinced that file-per-object is a good idea in any way shape or
form.
Good point!
To compare two different databases, perhaps it's possible to use a sequence,
1,2,...,n for each file in each directory, i.e., /[desc]/[tag]/[n], and to
sort them by something distinct which will ensure the same numbering between
different databases, such as the arguments for functions, or other
properties for other kind of objects. Any ideas?
(In my case, I didn't need to compare schemas between different databases. I
needed to compare two dumps created at different points in time of the same
database, which do share the same oids for objects existing in both
versions.)
--
Best regards,
Joel Jacobson
Glue Finance
E: jj@gluefinance.com
T: +46 70 360 38 01
Postal address:
Glue Finance AB
Box 549
114 11 Stockholm
Sweden
Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden
Joel Jacobson <joel@gluefinance.com> writes:
2010/12/28 Tom Lane <tgl@sss.pgh.pa.us>
Joel Jacobson <joel@gluefinance.com> writes:
Solution: I propose a new option to pg_dump, --split, which dumps each
object to a separate file in a user friendly directory structure:
Um ... how does that solve the claimed problem exactly?
Because then you can do,
$ diff -r <old schema dump dir> <new schema dump dir>,
instead of,
$ diff <old entire schema dump> <new entire schema dump>
which will nicely reveal each individual object modified, as opposed to a
huge global diff of everything
That has at least as many failure modes as the other representation.
regards, tom lane
2010/12/28 Tom Lane <tgl@sss.pgh.pa.us>
That has at least as many failure modes as the other representation.
I don't follow; what do you mean by "failure modes"? The oid in the
filename? I suggested using a sequence instead, but you didn't comment on
that. Are there any other failure modes which could cause a diff -r between
two different databases to break?
(This might be a bad idea for some other reason, but I noticed a few other
users requesting the same feature when I googled "pg_dump split".)
--
Best regards,
Joel Jacobson
Glue Finance
On Tue, Dec 28, 2010 at 11:59 AM, Joel Jacobson <joel@gluefinance.com> wrote:
I don't follow, what do you mean with "failure modes"? The oid in the
filename? I suggested to use a sequence instead but you didn't comment on
that. Are there any other failure modes which could cause a diff -r between
two different databases to break?
Both OID and sequence mean that you're likely to get a diff which is
nothing more than complete files removed from one side and added to the
other side with different names (i.e. OIDs don't match, or an
added/removed object changes all following sequence assignments).
If you're going to try and split, I really think the only useful
filename has to be similar to something like:
<schema>/<type>/<name>/<part>
If you want to use "diff", you pretty much have to make sure that the
*path* will be identical for similarly named objects, irrespective of
anything else in the database. And path has to be encoding aware.
And you want names that glob well, so for instance, you could exclude
*.data (or a schema) from the diff.
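For example (a sketch assuming GNU diff and a layout like the above; the
directory and schema names are made up), you could compare two split dumps
while skipping data files or a whole schema:
$ diff -r -x '*.data' old-dump-split new-dump-split              # ignore table data files
$ diff -r -x '*.data' -x 'audit' old-dump-split new-dump-split   # also skip the "audit" schema directory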
a.
--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.
On 12/28/2010 11:59 AM, Joel Jacobson wrote:
2010/12/28 Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>>
That has at least as many failure modes as the other representation.
I don't follow, what do you mean with "failure modes"? The oid in the
filename? I suggested to use a sequence instead but you didn't comment
on that. Are there any other failure modes which could cause a diff -r
between two different databases to break?
(This might be a bad idea for some other reason, but I noticed a few
other users requesting the same feature when I googled "pg_dump split".)
A better approach to the problem might be to have a tool which did a
comparison of structures rather than a textual comparison of dumps. For
extra credit, such a tool might even try to produce a sync script for
you ...
Of course, that task might involve more effort than you want to devote
to it.
cheers
andrew
Joel Jacobson <joel@gluefinance.com> writes:
2010/12/28 Tom Lane <tgl@sss.pgh.pa.us>
That has at least as many failure modes as the other representation.
I don't follow, what do you mean with "failure modes"? The oid in the
filename? I suggested to use a sequence instead but you didn't comment on
that. Are there any other failure modes which could cause a diff -r between
two different databases to break?
AFAIK the primary failure modes for diff'ing text dumps are
(1) randomly different ordering of objects from one dump to another.
Your initial proposal would avoid that problem as long as the object
OIDs didn't change, but since it falls down completely across a dump and
reload, or delete and recreate, I can't really see that it's a step
forward. Using a sequence number generated by pg_dump doesn't change
this at all --- the sequence would be just as unpredictable.
(2) randomly different ordering of rows within a table. Your patch
didn't address that, unless I misunderstood quite a bit.
I think the correct fix for (1) is to improve pg_dump's method for
sorting objects. It's not that bad now, but it does have issues with
random ordering of similarly-named objects. IIRC Peter Eisentraut
proposed something for this last winter but it seemed a mite too ugly,
and he got beaten down to just this:
commit 1acc06a1f4ae752793d2199d8d462a6708c8acc2
Author: Peter Eisentraut <peter_e@gmx.net>
Date: Mon Feb 15 19:59:47 2010 +0000
When sorting functions in pg_dump, break ties (same name) by number of arguments
Maybe you can do better, but I'd suggest going back to reread the
discussion that preceded that patch.
(This might be a bad idea for some other reason, but I noticed a few other
users requesting the same feature when I googled "pg_dump split".)
AFAIR what those folk really wanted was a selective dump with more
selectivity knobs than exist now. I don't think their lives would be
improved by having to root through a twisty little maze of numbered
files to find the object they wanted.
regards, tom lane
On Tue, Dec 28, 2010 at 11:00 AM, Joel Jacobson <joel@gluefinance.com>wrote:
Dear fellow hackers,
Problem: A normal diff of two slightly different schema dump files (pg_dump
-s), will not produce a user-friendly diff, as you get all changes in the
same file.
Solution: I propose a new option to pg_dump, --split, which dumps each
object to a separate file in a user friendly directory structure:
[-f filename] : main dump file, imports each splitted part using \i
[-f filename]-split/[desc]/[tag]/[oid].sql : dump of the oid
Example: If the filename (-f) is "pg.dump", the following directory
structure would be created:
$ ./pg_dump -f /crypt/pg.dump --split -F p -s glue
/crypt/pg.dump-split/VIEW/
/crypt/pg.dump-split/TYPE/
/crypt/pg.dump-split/TRIGGER/
/crypt/pg.dump-split/TABLE/
/crypt/pg.dump-split/SEQUENCE/
/crypt/pg.dump-split/SCHEMA/
/crypt/pg.dump-split/PROCEDURAL_LANGUAGE/
/crypt/pg.dump-split/INDEX/
/crypt/pg.dump-split/FUNCTION/
/crypt/pg.dump-split/FK_CONSTRAINT/
/crypt/pg.dump-split/CONSTRAINT/
/crypt/pg.dump-split/AGGREGATE/
In each such directory, one directory per object name is created.
If we would have a function "foobar" with oid "12345" it would be saved to:
/crypt/pg.dump-split/FUNCTION/foobar/12345.sql
In the "pg.dump" plain text file, the files are "linked in" using the "\i"
psql command, e.g.:
\i /crypt/pg.dump-split/FUNCTION/foobar/12345.sql
Potential use-case scenarios:
*) Version control your database schema, by exporting it daily (using
--split) and commiting the differences.
*) Compare differences of schema dumps created in different points in time.
Since objects are stored in separate files, it is easier to see what areas
were modified, compared to looking at the diff of two entire schemas.
*) Restore only some objects, based on type (e.g., only the functions) or
name (e.g. only fucntions of certain name/names).
I've tested the patch for both the latest HEAD (9.1devel) as well as 8.4.6.
I would suggest the directory structure as:
/crypt/pg.dump-split/schema-name-1/VIEWS/view-name-1.sql
/crypt/pg.dump-split/schema-name-1/TABLES/table-name-1.sql
...
/crypt/pg.dump-split/schema-name-2/VIEWS/view-name-1.sql
/crypt/pg.dump-split/schema-name-2/TABLES/table-name-1.sql
This might be more amenable to diff'ing the different dumps. Schemas are
logical groupings of other objects, and hence making that apparent in your
dump's hierarchy makes more sense.
Most importantly, as Tom suggested, don't use or rely on OIDs. I think
function overloading is the only case where you can have more than one
object with the same name under a schema. That can be resolved if you
included function signature in filename:
/crypt/pg.dump-split/emp/FUNCTIONS/myfunc-int-char.sql
/crypt/pg.dump-split/emp/FUNCTIONS/myfunc-int-int.sql
/crypt/pg.dump-split/emp/FUNCTIONS/myfunc-int.sql
Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com
singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet
Mail sent from my BlackLaptop device
2010/12/28 Gurjeet Singh <singh.gurjeet@gmail.com>
I would suggest the directory structure as:
/crypt/pg.dump-split/schema-name-1/VIEWS/view-name-1.sql
/crypt/pg.dump-split/schema-name-1/TABLES/table-name-1.sql
...
/crypt/pg.dump-split/schema-name-2/VIEWS/view-name-1.sql
/crypt/pg.dump-split/schema-name-2/TABLES/table-name-1.sql
This might n be more amenable to diff'ing the different dumps. Schemas are
logical grouping of other objects and hence making that apparent in your
dump's hierarchy makes more sense.
Thanks Gurjeet and Tom for good feedback!
I've made some changes and attached new patches.
Looks much better now I think!
This is what I've changed,
*) Not using oid anymore in the filename
*) New filename/path structure: [-f
filename]-split/[schema]/[desc]/[tag].sql
*) If two objects share the same tag (name) for the same [schema]/[desc], -2,
-3, etc. is appended to the name. Example:
~/pg.dump-split/public/FUNCTION/foobar.sql
~/pg.dump-split/public/FUNCTION/foobar-2.sql
~/pg.dump-split/public/FUNCTION/barfoo.sql
~/pg.dump-split/public/FUNCTION/barfoo-2.sql
~/pg.dump-split/public/FUNCTION/barfoo-3.sql
I think you are right about functions (and aggregates) being the only
desc-type where two objects can share the same name in the same schema.
This means the problem of dumping objects in different order is a very
limited problem, only affecting overloaded functions.
I didn't include the arguments in the file name, as it would lead to very
long file names unless truncated, and since the problem is very limited, I
think we shouldn't include it. It's cleaner with just the name part of the
tag in the file name.
--
Best regards,
Joel Jacobson
Glue Finance
Attachments:
pg-dump-split-plain-text-files-8.4.6.patch (application/octet-stream)
Binary files postgresql-8.4.6/src/bin/pg_dump/._pg_backup_archiver.c and postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/._pg_backup_archiver.c differ
diff -c -r postgresql-8.4.6/src/bin/pg_dump/pg_backup_archiver.c postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_backup_archiver.c
*** postgresql-8.4.6/src/bin/pg_dump/pg_backup_archiver.c 2010-12-14 03:59:19.000000000 +0100
--- postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_backup_archiver.c 2010-12-28 20:19:38.720926975 +0100
***************
*** 27,32 ****
--- 27,33 ----
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
+ #include <sys/stat.h>
#ifdef WIN32
#include <io.h>
***************
*** 2752,2757 ****
--- 2753,2845 ----
return;
}
+ /*
+ * Split object into separate file,
+ * if the --split option is enabled and,
+ * if the object has an oid.
+ */
+ if (ropt->split_files && te->catalogId.oid)
+ {
+ char splitFilename[1024];
+ char *tag;
+ char *tagArgPos;
+ char *desc;
+ char *descSpacePos;
+ struct stat stat_buf;
+ int filenameSeq;
+ mode_t omode;
+
+ /*
+ * Strip eventual argument part from "tag" (e.g. the name of functions)
+ * Example: "foobar(_arg1 int, _arg2 int)" --> "foobar"
+ */
+ tagArgPos = strstr(te->tag,"(");
+ if (tagArgPos == NULL)
+ tag = strdup(te->tag);
+ else
+ tag = strndup(te->tag, tagArgPos - te->tag);
+
+ desc = strdup(te->desc);
+ descSpacePos = strstr(desc," ");
+ /*
+ * Replace " " with "_" in "desc"
+ * Example: "FK CONSTRAINT" --> "FK_CONSTRAINT"
+ */
+ while ((descSpacePos = strstr(desc, " ")) != NULL)
+ {
+ char *dup = strdup(desc);
+ strlcpy(desc, dup, descSpacePos - desc + 1);
+ strcat(desc, "_");
+ strcat(desc, dup + (descSpacePos - desc) + strlen("_"));
+ free(dup);
+ }
+
+ /*
+ * Build path consisting of [filename]-split/[schema]/[desc]/[[tag]-n].sql
+ * Create the directories
+ *
+ * Example: dumpfile-split/public/FUNCTION/foobar.sql
+ *
+ * If there would be two functions named "foobar",
+ * with different arguments, the second one would be named:
+ * dumpfile-split/public/FUNCTION/foobar-2.sql
+ */
+ omode = S_IRWXU | S_IRWXG | S_IRWXO;
+ snprintf(splitFilename, 1024, "%s-split", ropt->filename);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s", ropt->filename, te->namespace);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s/%s", ropt->filename, te->namespace, desc);
+ mkdir(splitFilename, omode);
+
+ /* Most objects have unique names (tag) for their type (desc)... */
+ snprintf(splitFilename, 1024, "%s-split/%s/%s/%s.sql", ropt->filename, te->namespace, desc, tag);
+ /*
+ * ...should any object share the same name for the same type,
+ * append -2, -3, etc to the filename
+ */
+ for (filenameSeq = 2; stat(splitFilename, &stat_buf) == 0; filenameSeq++)
+ {
+ snprintf(splitFilename, 1024, "%s-split/%s/%s/%s-%d.sql", ropt->filename, te->namespace, desc, tag, filenameSeq);
+ }
+
+ /* Add \i <split file name> to main dump file */
+ ahprintf(AH, "\\i %s\n", splitFilename);
+
+ /*
+ * Close the normal file handle to which non-splittable
+ * objects are written.
+ *
+ * Open split file handle for splitFilename.
+ *
+ * In the end of the function,
+ * the split file handle will be closed, and
+ * the normal file handle will be reopened again.
+ */
+ fclose(AH->OF);
+ AH->OF = fopen(splitFilename, PG_BINARY_A);
+ }
+
/* Select owner, schema, and tablespace as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
***************
*** 2881,2886 ****
--- 2969,2984 ----
free(AH->currUser);
AH->currUser = NULL;
}
+
+ /*
+ * If we are using the --split option,
+ * close the split file handle, and reopen the normal file handle.
+ */
+ if (ropt->split_files && te->catalogId.oid)
+ {
+ fclose(AH->OF);
+ AH->OF = fopen(ropt->filename, PG_BINARY_A);
+ }
}
void
diff -c -r postgresql-8.4.6/src/bin/pg_dump/pg_backup.h postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_backup.h
*** postgresql-8.4.6/src/bin/pg_dump/pg_backup.h 2010-12-14 03:59:19.000000000 +0100
--- postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_backup.h 2010-12-28 20:01:30.490924249 +0100
***************
*** 142,147 ****
--- 142,148 ----
int number_of_jobs;
bool *idWanted; /* array showing which dump IDs to emit */
+ int split_files; /* --split option, split objects into separate files */
} RestoreOptions;
/*
diff -c -r postgresql-8.4.6/src/bin/pg_dump/pg_dump.c postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_dump.c
*** postgresql-8.4.6/src/bin/pg_dump/pg_dump.c 2010-12-14 03:59:19.000000000 +0100
--- postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_dump.c 2010-12-28 20:01:30.490924249 +0100
***************
*** 111,116 ****
--- 111,117 ----
static int disable_dollar_quoting = 0;
static int dump_inserts = 0;
static int column_inserts = 0;
+ static int split_files = 0;
static void help(const char *progname);
***************
*** 277,282 ****
--- 278,284 ----
{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
{"role", required_argument, NULL, 3},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
+ {"split", no_argument, &split_files, 1},
{NULL, 0, NULL, 0}
};
***************
*** 765,770 ****
--- 767,773 ----
ropt->disable_triggers = disable_triggers;
ropt->use_setsessauth = use_setsessauth;
ropt->dataOnly = dataOnly;
+ ropt->split_files = split_files;
if (compressLevel == -1)
ropt->compression = 0;
***************
*** 797,802 ****
--- 800,806 ----
printf(_(" -v, --verbose verbose mode\n"));
printf(_(" -Z, --compress=0-9 compression level for compressed formats\n"));
printf(_(" --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock\n"));
+ printf(_(" --split split objects into separate plain text files\n"));
printf(_(" --help show this help, then exit\n"));
printf(_(" --version output version information, then exit\n"));
pg-dump-split-plain-text-files-9.1devel.patch (application/octet-stream)
*** a/src/bin/pg_dump/pg_backup.h
--- b/src/bin/pg_dump/pg_backup.h
***************
*** 143,148 **** typedef struct _restoreOptions
--- 143,149 ----
int number_of_jobs;
bool *idWanted; /* array showing which dump IDs to emit */
+ int split_files; /* --split option, split objects into separate files */
} RestoreOptions;
/*
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 27,32 ****
--- 27,33 ----
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
+ #include <sys/stat.h>
#ifdef WIN32
#include <io.h>
***************
*** 2830,2835 **** _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
--- 2831,2923 ----
return;
}
+ /*
+ * Split object into separate file,
+ * if the --split option is enabled and,
+ * if the object has an oid.
+ */
+ if (ropt->split_files && te->catalogId.oid)
+ {
+ char splitFilename[1024];
+ char *tag;
+ char *tagArgPos;
+ char *desc;
+ char *descSpacePos;
+ struct stat stat_buf;
+ int filenameSeq;
+ mode_t omode;
+
+ /*
+ * Strip eventual argument part from "tag" (e.g. the name of functions)
+ * Example: "foobar(_arg1 int, _arg2 int)" --> "foobar"
+ */
+ tagArgPos = strstr(te->tag,"(");
+ if (tagArgPos == NULL)
+ tag = strdup(te->tag);
+ else
+ tag = strndup(te->tag, tagArgPos - te->tag);
+
+ desc = strdup(te->desc);
+ descSpacePos = strstr(desc," ");
+ /*
+ * Replace " " with "_" in "desc"
+ * Example: "FK CONSTRAINT" --> "FK_CONSTRAINT"
+ */
+ while ((descSpacePos = strstr(desc, " ")) != NULL)
+ {
+ char *dup = strdup(desc);
+ strlcpy(desc, dup, descSpacePos - desc + 1);
+ strcat(desc, "_");
+ strcat(desc, dup + (descSpacePos - desc) + strlen("_"));
+ free(dup);
+ }
+
+ /*
+ * Build path consisting of [filename]-split/[schema]/[desc]/[[tag]-n].sql
+ * Create the directories
+ *
+ * Example: dumpfile-split/public/FUNCTION/foobar.sql
+ *
+ * If there would be two functions named "foobar",
+ * with different arguments, the second one would be named:
+ * dumpfile-split/public/FUNCTION/foobar-2.sql
+ */
+ omode = S_IRWXU | S_IRWXG | S_IRWXO;
+ snprintf(splitFilename, 1024, "%s-split", ropt->filename);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s", ropt->filename, te->namespace);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s/%s", ropt->filename, te->namespace, desc);
+ mkdir(splitFilename, omode);
+
+ /* Most objects have unique names (tag) for their type (desc)... */
+ snprintf(splitFilename, 1024, "%s-split/%s/%s/%s.sql", ropt->filename, te->namespace, desc, tag);
+ /*
+ * ...should any object share the same name for the same type,
+ * append -2, -3, etc to the filename
+ */
+ for (filenameSeq = 2; stat(splitFilename, &stat_buf) == 0; filenameSeq++)
+ {
+ snprintf(splitFilename, 1024, "%s-split/%s/%s/%s-%d.sql", ropt->filename, te->namespace, desc, tag, filenameSeq);
+ }
+
+ /* Add \i <split file name> to main dump file */
+ ahprintf(AH, "\\i %s\n", splitFilename);
+
+ /*
+ * Close the normal file handle to which non-splittable
+ * objects are written.
+ *
+ * Open split file handle for splitFilename.
+ *
+ * In the end of the function,
+ * the split file handle will be closed, and
+ * the normal file handle will be reopened again.
+ */
+ fclose(AH->OF);
+ AH->OF = fopen(splitFilename, PG_BINARY_A);
+ }
+
/* Select owner, schema, and tablespace as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
***************
*** 2960,2965 **** _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
--- 3048,3063 ----
free(AH->currUser);
AH->currUser = NULL;
}
+
+ /*
+ * If we are using the --split option,
+ * close the split file handle, and reopen the normal file handle.
+ */
+ if (ropt->split_files && te->catalogId.oid)
+ {
+ fclose(AH->OF);
+ AH->OF = fopen(ropt->filename, PG_BINARY_A);
+ }
}
void
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 134,139 **** static int disable_dollar_quoting = 0;
--- 134,140 ----
static int dump_inserts = 0;
static int column_inserts = 0;
static int no_security_label = 0;
+ static int split_files = 0;
static void help(const char *progname);
***************
*** 316,321 **** main(int argc, char **argv)
--- 317,323 ----
{"role", required_argument, NULL, 3},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
{"no-security-label", no_argument, &no_security_label, 1},
+ {"split", no_argument, &split_files, 1},
{NULL, 0, NULL, 0}
};
***************
*** 804,809 **** main(int argc, char **argv)
--- 806,812 ----
ropt->disable_triggers = disable_triggers;
ropt->use_setsessauth = use_setsessauth;
ropt->dataOnly = dataOnly;
+ ropt->split_files = split_files;
if (compressLevel == -1)
ropt->compression = 0;
***************
*** 836,841 **** help(const char *progname)
--- 839,845 ----
printf(_(" -v, --verbose verbose mode\n"));
printf(_(" -Z, --compress=0-9 compression level for compressed formats\n"));
printf(_(" --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock\n"));
+ printf(_(" --split split objects into separate plain text files\n"));
printf(_(" --help show this help, then exit\n"));
printf(_(" --version output version information, then exit\n"));
On Tue, Dec 28, 2010 at 2:39 PM, Joel Jacobson <joel@gluefinance.com> wrote:
2010/12/28 Gurjeet Singh <singh.gurjeet@gmail.com>
I would suggest the directory structure as:
/crypt/pg.dump-split/schema-name-1/VIEWS/view-name-1.sql
/crypt/pg.dump-split/schema-name-1/TABLES/table-name-1.sql
...
/crypt/pg.dump-split/schema-name-2/VIEWS/view-name-1.sql
/crypt/pg.dump-split/schema-name-2/TABLES/table-name-1.sql
This might n be more amenable to diff'ing the different dumps. Schemas are
logical grouping of other objects and hence making that apparent in your
dump's hierarchy makes more sense.
Thanks Gurjeet and Tom for good feedback!
I've made some changes and attached new patches.
Looks much better now I think!
This is what I've changed,
*) Not using oid anymore in the filename
*) New filename/path structure: [-f
filename]-split/[schema]/[desc]/[tag].sql
*) If two objects share the same name tag for the same [schema]/[desc], -2,
-3, etc is appended to the name. Example:
~/pg.dump-split/public/FUNCTION/foobar.sql
~/pg.dump-split/public/FUNCTION/foobar-2.sql
~/pg.dump-split/public/FUNCTION/barfoo.sql
~/pg.dump-split/public/FUNCTION/barfoo-2.sql
~/pg.dump-split/public/FUNCTION/barfoo-3.sql
I think you are right about functions (and aggregates) being the only
desc-type where two objects can share the same name in the same schema.
This means the problem of dumping objects in different order is a very
limited problem, only affecting overloaded functions.
I didn't include the arguments in the file name, as it would lead to very
long file names unless truncated, and since the problem is very limited, I
think we shouldn't include it. It's cleaner with just the name part of the
tag in the file name.
I haven't seen your code yet, but we need to make sure that in case of name
collision we emit the object definitions in a sorted order so that the dump
is always deterministic: func1(char) should _always_ be dumped before
func1(int), that is, output file names are always deterministic.
The problem I see with suffixing a sequence id to the objects with name
collision is that one day the dump may name myfunc(int) as myfunc.sql and
after an overloaded version is created, say myfunc(char, int), then the same
myfunc(int) may be dumped in myfunc-2.sql, which again is non-deterministic.
Also, it is a project policy that we do not introduce new features in back
branches, so spending time on an 8.4.6 patch may not be the best use of your
time.
Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com
singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet
Mail sent from my BlackLaptop device
Sent from my iPhone
On 28 dec 2010, at 21:45, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
The problem I see with suffixing a sequence id to the objects with name
collision is that one day the dump may name myfunc(int) as myfunc.sql and
after an overloaded version is created, say myfunc(char, int), then the same
myfunc(int) may be dumped in myfunc-2.sql, which again is non-deterministic.
I agree, good point!
Perhaps abbreviations are preferable, e.g., myfunc_i, myfunc_i_c, etc., to
reduce the need to truncate filenames.
Also, it is a project policy that we do not introduce new features in back
branches, so spending time on an 8.4.6 patch may not be the best use of your
time.
My company is using 8.4 and needs this feature, so I'll have to patch it
anyway :)
On 12/28/2010 04:44 PM, Joel Jacobson wrote:
The problem I see with suffixing a sequence id to the objects with
name collision is that one day the dump may name myfunc(int) as
myfunc.sql and after an overloaded version is created, say
myfunc(char, int), then the same myfunc(int) may be dumped in
myfunc-2.sql, which again is non-deterministic.
I agree, good point!
Perhaps abbreviations are to prefer, e.g., myfunc_i, myfunc_i_c, etc
to reduce the need of truncating filenames.
I think that's just horrible. Does the i stand for integer or inet? And
it will get *really* ugly for type names with spaces in them ...
cheers
andrew
2010/12/28 Andrew Dunstan <andrew@dunslane.net>
I think that's just horrible. Does the i stand for integer or inet? And it
will get *really* ugly for type names with spaces in them ...
True, true.
But while "c" is too short, I think "character varying" is too long. Is
there some convenient lookup table to convert between the long names to the
short names?
E.g.,
character varying => varchar
timestamp with time zone => timestamptz
etc.
--
Best regards,
Joel Jacobson
Glue Finance
Andrew Dunstan <andrew@dunslane.net> writes:
On 12/28/2010 04:44 PM, Joel Jacobson wrote:
Perhaps abbreviations are to prefer, e.g., myfunc_i, myfunc_i_c, etc
to reduce the need of truncating filenames.
I think that's just horrible. Does the i stand for integer or inet? And
it will get *really* ugly for type names with spaces in them ...
You think spaces are bad, try slashes ;-)
Not to mention the need for including schemas in typenames sometimes.
I think you're going to have a real problem trying to fully describe a
function's signature in a file name of reasonable max length.
regards, tom lane
On Tue, Dec 28, 2010 at 4:57 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 12/28/2010 04:44 PM, Joel Jacobson wrote:
The problem I see with suffixing a sequence id to the objects with name
collision is that one day the dump may name myfunc(int) as myfunc.sql and
after an overloaded version is created, say myfunc(char, int), then the same
myfunc(int) may be dumped in myfunc-2.sql, which again is non-deterministic.
I agree, good point!
Perhaps abbreviations are to prefer, e.g., myfunc_i, myfunc_i_c, etc to
reduce the need of truncating filenames.
I think that's just horrible. Does the i stand for integer or inet? And it
will get *really* ugly for type names with spaces in them ...
Do you mean that using data type names in the filename is a bad idea, or
that abbreviating the type names is a bad idea?
Maybe we can compute a hash based on the type names and use that in the
file's name?
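For instance, a small shell sketch of that idea (the signature string, the
choice of md5 and the 8-character truncation are all arbitrary):
$ sig='myfunc(integer, character varying)'
$ echo -n "$sig" | md5sum | cut -c1-8    # short, stable suffix derived from the argument types
so the file could be named something like myfunc-<hash>.sql, which stays the
same across databases as long as the textual signature is the same.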
Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com
singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet
Mail sent from my BlackLaptop device
On Tue, Dec 28, 2010 at 2:39 PM, Joel Jacobson <joel@gluefinance.com> wrote:
I think you are right about functions (and aggregates) being the only
desc-type where two objects can share the same name in the same schema.
This means the problem of dumping objects in different order is a very
limited problem, only affecting overloaded functions.
I didn't include the arguments in the file name, as it would lead to very
long file names unless truncated, and since the problem is very limited, I
think we shouldn't include it. It's cleaner with just the name part of the
tag in the file name.
Why not place all overloads of a function within the same file? Then,
assuming you order them deterministically within that file, we sidestep the
file naming issue and maintain useful diff capabilities, since a diff of the
function's file will show additions or removals of various overloaded
versions.
--
- David T. Wilson
david.t.wilson@gmail.com
2010/12/29 David Wilson <david.t.wilson@gmail.com>
Why not place all overloads of a function within the same file? Then,
assuming you order them deterministically within that file, we sidestep the
file naming issue and maintain useful diff capabilities, since a diff of the
function's file will show additions or removals of various overloaded
versions.
Good suggestion. I agree, trying to put variations of the same function in
different files simply becomes too ugly and the problem it solves is not big
enough.
Then we just need to make sure pg_dump dumps objects in the same order, and
let diff take care of the rest of the challenge. :)
Brb with a new patch.
--
Best regards,
Joel Jacobson
Glue Finance
David Wilson <david.t.wilson@gmail.com> writes:
On Tue, Dec 28, 2010 at 2:39 PM, Joel Jacobson <joel@gluefinance.com> wrote:
I didn't include the arguments in the file name, as it would lead to very
long file names unless truncated, and since the problem is very limited, I
think we shouldn't include it. It's cleaner with just the name part of the
tag in the file name.
Why not place all overloads of a function within the same file? Then,
assuming you order them deterministically within that file, we sidestep the
file naming issue and maintain useful diff capabilities, since a diff of the
function's file will show additions or removals of various overloaded
versions.
If you've solved the deterministic-ordering problem, then this entire
patch is quite useless. You can just run a normal dump and diff it.
regards, tom lane
2010/12/29 Tom Lane <tgl@sss.pgh.pa.us>
If you've solved the deterministic-ordering problem, then this entire
patch is quite useless. You can just run a normal dump and diff it.
No, that's only half true.
Diff will do a good job minimizing the "size" of the diff output, yes, but
such a diff is still quite useless if you want to quickly grasp the context
of the change.
If you have hundreds of functions, just looking at the changed source code
is not enough to figure out which functions were modified, unless you have
the brain power to memorize every single line of code and are able to figure
out the function name just by looking at the old and new lines of code.
To understand a change to my database functions, I would start by looking at
the top-level, only focusing on the names of the functions
modified/added/removed.
At this stage, you want as little information as possible about each change,
such as only the names of the functions.
To do this, i.e. to get a list of changed functions, you cannot compare two
full plain-text schema dumps using diff, as that would only reveal the lines
changed, not the names of the functions, unless you are lucky enough to get
the name of the function within the (by default) 3 lines of copied context.
While you could increase the number of copied lines of context to a value
which would ensure you would see the name of the function in the diff, that
is not feasible if you want to quickly "get a picture" of the code areas
modified, since you would then need to read through even more lines of diff
output.
For a less database-centric system where you don't have hundreds of stored
procedures, I would agree it's not an issue to keep track of changes by
diffing entire schema files, but for extremely database-centric systems,
such as the one we have developed at my company, it's not possible to "get
the whole picture" of a change by analyzing diffs of entire schema dumps.
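To illustrate (a sketch with made-up object names), the split layout lets a
recursive brief diff report exactly which objects changed:
$ diff -rq pg.dump.old-split pg.dump.new-split
Files pg.dump.old-split/public/FUNCTION/process_payment.sql and pg.dump.new-split/public/FUNCTION/process_payment.sql differ
Only in pg.dump.new-split/public/FUNCTION: refund_order.sql
which gives the list of modified and added functions by name, without
reading a single line of diff output.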
The patch has been updated:
*) Only split objects whose namespace (schema) is not null
*) Append all objects with the same tag (name), type (desc) and
namespace (schema) to the same file (i.e., do not append -2, -3, like
before). (Suggested by David Wilson, thanks.)
I also tried adding "ORDER BY pronargs" and "ORDER BY pronargs
DESC" to the queries in getFuncs() in pg_dump.c, but it had no effect on the
order in which functions with the same name but different numbers of
arguments were dumped.
Perhaps functions are already sorted?
Anyway, it doesn't matter that much; keeping all functions of the same name
in the same file is a fair trade-off, I think. The main advantage is the
ability to quickly get a picture of the names of all changed functions; the
secondary one is minimizing the actual diff output.
--
Best regards,
Joel Jacobson
Glue Finance
E: jj@gluefinance.com
T: +46 70 360 38 01
Postal address:
Glue Finance AB
Box 549
114 11 Stockholm
Sweden
Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden
Attachments:
pg-dump-split-plain-text-files-9.1devel.patch (application/octet-stream)
*** a/src/bin/pg_dump/pg_backup.h
--- b/src/bin/pg_dump/pg_backup.h
***************
*** 143,148 **** typedef struct _restoreOptions
--- 143,149 ----
int number_of_jobs;
bool *idWanted; /* array showing which dump IDs to emit */
+ int split_files; /* --split option, split objects into separate files */
} RestoreOptions;
/*
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 27,32 ****
--- 27,34 ----
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
+ #include <sys/stat.h>
+ #include <string.h>
#ifdef WIN32
#include <io.h>
***************
*** 2830,2835 **** _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
--- 2832,2910 ----
return;
}
+ /*
+ * Split object into separate file,
+ * if the --split option is enabled and,
+ * if the object has an oid and,
+ * if the object has a namespace.
+ */
+ if (ropt->split_files && te->catalogId.oid && te->namespace)
+ {
+ char splitFilename[1024];
+ char *tag;
+ char *tagArgPos;
+ char *desc;
+ char *descSpacePos;
+ mode_t omode;
+
+ /*
+ * Strip eventual argument part from "tag" (e.g. the name of functions)
+ * Example: "foobar(_arg1 int, _arg2 int)" --> "foobar"
+ */
+ tagArgPos = strstr(te->tag,"(");
+ if (tagArgPos == NULL)
+ tag = strdup(te->tag);
+ else
+ tag = strndup(te->tag, tagArgPos - te->tag);
+
+ desc = strdup(te->desc);
+ descSpacePos = strstr(desc," ");
+ /*
+ * Replace " " with "_" in "desc"
+ * Example: "FK CONSTRAINT" --> "FK_CONSTRAINT"
+ */
+ while ((descSpacePos = strstr(desc, " ")) != NULL)
+ {
+ char *dup = strdup(desc);
+ strlcpy(desc, dup, descSpacePos - desc + 1);
+ strcat(desc, "_");
+ strcat(desc, dup + (descSpacePos - desc) + strlen("_"));
+ free(dup);
+ }
+
+ /*
+ * Build path consisting of [filename]-split/[schema]/[desc]/[tag].sql
+ * Create the directories
+ *
+ * Example: dumpfile-split/public/FUNCTION/foobar.sql
+ */
+ omode = S_IRWXU | S_IRWXG | S_IRWXO;
+ snprintf(splitFilename, 1024, "%s-split", ropt->filename);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s", ropt->filename, te->namespace);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s/%s", ropt->filename, te->namespace, desc);
+ mkdir(splitFilename, omode);
+
+ snprintf(splitFilename, 1024, "%s-split/%s/%s/%s.sql", ropt->filename, te->namespace, desc, tag);
+
+ /* Add \i <split file name> to main dump file */
+ ahprintf(AH, "\\i %s\n", splitFilename);
+
+ /*
+ * Close the normal file handle to which non-splittable
+ * objects are written.
+ *
+ * Open split file handle for splitFilename.
+ *
+ * In the end of the function,
+ * the split file handle will be closed, and
+ * the normal file handle will be reopened again.
+ */
+ fclose(AH->OF);
+ AH->OF = fopen(splitFilename, PG_BINARY_A);
+ }
+
/* Select owner, schema, and tablespace as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
***************
*** 2960,2965 **** _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isDat
--- 3035,3050 ----
free(AH->currUser);
AH->currUser = NULL;
}
+
+ /*
+ * If we are using the --split option,
+ * close the split file handle, and reopen the normal file handle.
+ */
+ if (ropt->split_files && te->catalogId.oid && te->namespace)
+ {
+ fclose(AH->OF);
+ AH->OF = fopen(ropt->filename, PG_BINARY_A);
+ }
}
void
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 134,139 **** static int disable_dollar_quoting = 0;
--- 134,140 ----
static int dump_inserts = 0;
static int column_inserts = 0;
static int no_security_label = 0;
+ static int split_files = 0;
static void help(const char *progname);
***************
*** 316,321 **** main(int argc, char **argv)
--- 317,323 ----
{"role", required_argument, NULL, 3},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
{"no-security-label", no_argument, &no_security_label, 1},
+ {"split", no_argument, &split_files, 1},
{NULL, 0, NULL, 0}
};
***************
*** 804,809 **** main(int argc, char **argv)
--- 806,812 ----
ropt->disable_triggers = disable_triggers;
ropt->use_setsessauth = use_setsessauth;
ropt->dataOnly = dataOnly;
+ ropt->split_files = split_files;
if (compressLevel == -1)
ropt->compression = 0;
***************
*** 836,841 **** help(const char *progname)
--- 839,845 ----
printf(_(" -v, --verbose verbose mode\n"));
printf(_(" -Z, --compress=0-9 compression level for compressed formats\n"));
printf(_(" --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock\n"));
+ printf(_(" --split split objects into separate plain text files\n"));
printf(_(" --help show this help, then exit\n"));
printf(_(" --version output version information, then exit\n"));
pg-dump-split-plain-text-files-9.1alpha2.patch (application/octet-stream)
diff -c -r postgresql-9.1alpha2/src/bin/pg_dump/pg_backup_archiver.c postgresql-9.1alpha2-pg-dump-split/src/bin/pg_dump/pg_backup_archiver.c
*** postgresql-9.1alpha2/src/bin/pg_dump/pg_backup_archiver.c 2010-10-31 16:42:51.000000000 +0100
--- postgresql-9.1alpha2-pg-dump-split/src/bin/pg_dump/pg_backup_archiver.c 2010-12-29 01:32:39.405499834 +0100
***************
*** 27,32 ****
--- 27,34 ----
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
+ #include <sys/stat.h>
+ #include <string.h>
#ifdef WIN32
#include <io.h>
***************
*** 2828,2833 ****
--- 2830,2908 ----
return;
}
+ /*
+ * Split object into separate file,
+ * if the --split option is enabled and,
+ * if the object has an oid and,
+ * if the object has a namespace.
+ */
+ if (ropt->split_files && te->catalogId.oid && te->namespace)
+ {
+ char splitFilename[1024];
+ char *tag;
+ char *tagArgPos;
+ char *desc;
+ char *descSpacePos;
+ mode_t omode;
+
+ /*
+ * Strip eventual argument part from "tag" (e.g. the name of functions)
+ * Example: "foobar(_arg1 int, _arg2 int)" --> "foobar"
+ */
+ tagArgPos = strstr(te->tag,"(");
+ if (tagArgPos == NULL)
+ tag = strdup(te->tag);
+ else
+ tag = strndup(te->tag, tagArgPos - te->tag);
+
+ desc = strdup(te->desc);
+ descSpacePos = strstr(desc," ");
+ /*
+ * Replace " " with "_" in "desc"
+ * Example: "FK CONSTRAINT" --> "FK_CONSTRAINT"
+ */
+ while ((descSpacePos = strstr(desc, " ")) != NULL)
+ {
+ char *dup = strdup(desc);
+ strlcpy(desc, dup, descSpacePos - desc + 1);
+ strcat(desc, "_");
+ strcat(desc, dup + (descSpacePos - desc) + strlen("_"));
+ free(dup);
+ }
+
+ /*
+ * Build path consisting of [filename]-split/[schema]/[desc]/[tag].sql
+ * Create the directories
+ *
+ * Example: dumpfile-split/public/FUNCTION/foobar.sql
+ */
+ omode = S_IRWXU | S_IRWXG | S_IRWXO;
+ snprintf(splitFilename, 1024, "%s-split", ropt->filename);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s", ropt->filename, te->namespace);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s/%s", ropt->filename, te->namespace, desc);
+ mkdir(splitFilename, omode);
+
+ snprintf(splitFilename, 1024, "%s-split/%s/%s/%s.sql", ropt->filename, te->namespace, desc, tag);
+
+ /* Add \i <split file name> to main dump file */
+ ahprintf(AH, "\\i %s\n", splitFilename);
+
+ /*
+ * Close the normal file handle to which non-splittable
+ * objects are written.
+ *
+ * Open split file handle for splitFilename.
+ *
+ * In the end of the function,
+ * the split file handle will be closed, and
+ * the normal file handle will be reopened again.
+ */
+ fclose(AH->OF);
+ AH->OF = fopen(splitFilename, PG_BINARY_A);
+ }
+
/* Select owner, schema, and tablespace as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
***************
*** 2958,2963 ****
--- 3033,3048 ----
free(AH->currUser);
AH->currUser = NULL;
}
+
+ /*
+ * If we are using the --split option,
+ * close the split file handle, and reopen the normal file handle.
+ */
+ if (ropt->split_files && te->catalogId.oid && te->namespace)
+ {
+ fclose(AH->OF);
+ AH->OF = fopen(ropt->filename, PG_BINARY_A);
+ }
}
void
diff -c -r postgresql-9.1alpha2/src/bin/pg_dump/pg_backup.h postgresql-9.1alpha2-pg-dump-split/src/bin/pg_dump/pg_backup.h
*** postgresql-9.1alpha2/src/bin/pg_dump/pg_backup.h 2010-10-31 16:42:51.000000000 +0100
--- postgresql-9.1alpha2-pg-dump-split/src/bin/pg_dump/pg_backup.h 2010-12-29 01:30:46.275536712 +0100
***************
*** 143,148 ****
--- 143,149 ----
int number_of_jobs;
bool *idWanted; /* array showing which dump IDs to emit */
+ int split_files; /* --split option, split objects into separate files */
} RestoreOptions;
/*
diff -c -r postgresql-9.1alpha2/src/bin/pg_dump/pg_dump.c postgresql-9.1alpha2-pg-dump-split/src/bin/pg_dump/pg_dump.c
*** postgresql-9.1alpha2/src/bin/pg_dump/pg_dump.c 2010-10-31 16:42:51.000000000 +0100
--- postgresql-9.1alpha2-pg-dump-split/src/bin/pg_dump/pg_dump.c 2010-12-29 01:30:46.275536712 +0100
***************
*** 134,139 ****
--- 134,140 ----
static int dump_inserts = 0;
static int column_inserts = 0;
static int no_security_label = 0;
+ static int split_files = 0;
static void help(const char *progname);
***************
*** 316,321 ****
--- 317,323 ----
{"role", required_argument, NULL, 3},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
{"no-security-label", no_argument, &no_security_label, 1},
+ {"split", no_argument, &split_files, 1},
{NULL, 0, NULL, 0}
};
***************
*** 804,809 ****
--- 806,812 ----
ropt->disable_triggers = disable_triggers;
ropt->use_setsessauth = use_setsessauth;
ropt->dataOnly = dataOnly;
+ ropt->split_files = split_files;
if (compressLevel == -1)
ropt->compression = 0;
***************
*** 836,841 ****
--- 839,845 ----
printf(_(" -v, --verbose verbose mode\n"));
printf(_(" -Z, --compress=0-9 compression level for compressed formats\n"));
printf(_(" --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock\n"));
+ printf(_(" --split split objects into separate plain text files\n"));
printf(_(" --help show this help, then exit\n"));
printf(_(" --version output version information, then exit\n"));
Attachment: pg-dump-split-plain-text-files-8.4.6.patch (application/octet-stream)
diff -c -r postgresql-8.4.6/src/bin/pg_dump/pg_backup_archiver.c postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_backup_archiver.c
*** postgresql-8.4.6/src/bin/pg_dump/pg_backup_archiver.c 2010-12-14 03:59:19.000000000 +0100
--- postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_backup_archiver.c 2010-12-29 01:13:06.195510786 +0100
***************
*** 27,32 ****
--- 27,34 ----
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
+ #include <sys/stat.h>
+ #include <string.h>
#ifdef WIN32
#include <io.h>
***************
*** 2752,2757 ****
--- 2754,2832 ----
return;
}
+ /*
+ * Split object into separate file,
+ * if the --split option is enabled and,
+ * if the object has an oid and,
+ * if the object has a namespace.
+ */
+ if (ropt->split_files && te->catalogId.oid && te->namespace)
+ {
+ char splitFilename[1024];
+ char *tag;
+ char *tagArgPos;
+ char *desc;
+ char *descSpacePos;
+ mode_t omode;
+
+ /*
+ * Strip eventual argument part from "tag" (e.g. the name of functions)
+ * Example: "foobar(_arg1 int, _arg2 int)" --> "foobar"
+ */
+ tagArgPos = strstr(te->tag,"(");
+ if (tagArgPos == NULL)
+ tag = strdup(te->tag);
+ else
+ tag = strndup(te->tag, tagArgPos - te->tag);
+
+ desc = strdup(te->desc);
+ descSpacePos = strstr(desc," ");
+ /*
+ * Replace " " with "_" in "desc"
+ * Example: "FK CONSTRAINT" --> "FK_CONSTRAINT"
+ */
+ while ((descSpacePos = strstr(desc, " ")) != NULL)
+ {
+ char *dup = strdup(desc);
+ strlcpy(desc, dup, descSpacePos - desc + 1);
+ strcat(desc, "_");
+ strcat(desc, dup + (descSpacePos - desc) + strlen("_"));
+ free(dup);
+ }
+
+ /*
+ * Build path consisting of [filename]-split/[schema]/[desc]/[tag].sql
+ * Create the directories
+ *
+ * Example: dumpfile-split/public/FUNCTION/foobar.sql
+ */
+ omode = S_IRWXU | S_IRWXG | S_IRWXO;
+ snprintf(splitFilename, 1024, "%s-split", ropt->filename);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s", ropt->filename, te->namespace);
+ mkdir(splitFilename, omode);
+ snprintf(splitFilename, 1024, "%s-split/%s/%s", ropt->filename, te->namespace, desc);
+ mkdir(splitFilename, omode);
+
+ snprintf(splitFilename, 1024, "%s-split/%s/%s/%s.sql", ropt->filename, te->namespace, desc, tag);
+
+ /* Add \i <split file name> to main dump file */
+ ahprintf(AH, "\\i %s\n", splitFilename);
+
+ /*
+ * Close the normal file handle to which non-splittable
+ * objects are written.
+ *
+ * Open split file handle for splitFilename.
+ *
+ * In the end of the function,
+ * the split file handle will be closed, and
+ * the normal file handle will be reopened again.
+ */
+ fclose(AH->OF);
+ AH->OF = fopen(splitFilename, PG_BINARY_A);
+ }
+
/* Select owner, schema, and tablespace as necessary */
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
***************
*** 2881,2886 ****
--- 2956,2971 ----
free(AH->currUser);
AH->currUser = NULL;
}
+
+ /*
+ * If we are using the --split option,
+ * close the split file handle, and reopen the normal file handle.
+ */
+ if (ropt->split_files && te->catalogId.oid && te->namespace)
+ {
+ fclose(AH->OF);
+ AH->OF = fopen(ropt->filename, PG_BINARY_A);
+ }
}
void
diff -c -r postgresql-8.4.6/src/bin/pg_dump/pg_backup.h postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_backup.h
*** postgresql-8.4.6/src/bin/pg_dump/pg_backup.h 2010-12-14 03:59:19.000000000 +0100
--- postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_backup.h 2010-12-29 01:08:42.255541320 +0100
***************
*** 142,147 ****
--- 142,148 ----
int number_of_jobs;
bool *idWanted; /* array showing which dump IDs to emit */
+ int split_files; /* --split option, split objects into separate files */
} RestoreOptions;
/*
diff -c -r postgresql-8.4.6/src/bin/pg_dump/pg_dump.c postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_dump.c
*** postgresql-8.4.6/src/bin/pg_dump/pg_dump.c 2010-12-14 03:59:19.000000000 +0100
--- postgresql-8.4.6-pg-dump-split/src/bin/pg_dump/pg_dump.c 2010-12-29 01:08:42.255541320 +0100
***************
*** 111,116 ****
--- 111,117 ----
static int disable_dollar_quoting = 0;
static int dump_inserts = 0;
static int column_inserts = 0;
+ static int split_files = 0;
static void help(const char *progname);
***************
*** 277,282 ****
--- 278,284 ----
{"no-tablespaces", no_argument, &outputNoTablespaces, 1},
{"role", required_argument, NULL, 3},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
+ {"split", no_argument, &split_files, 1},
{NULL, 0, NULL, 0}
};
***************
*** 765,770 ****
--- 767,773 ----
ropt->disable_triggers = disable_triggers;
ropt->use_setsessauth = use_setsessauth;
ropt->dataOnly = dataOnly;
+ ropt->split_files = split_files;
if (compressLevel == -1)
ropt->compression = 0;
***************
*** 797,802 ****
--- 800,806 ----
printf(_(" -v, --verbose verbose mode\n"));
printf(_(" -Z, --compress=0-9 compression level for compressed formats\n"));
printf(_(" --lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock\n"));
+ printf(_(" --split split objects into separate plain text files\n"));
printf(_(" --help show this help, then exit\n"));
printf(_(" --version output version information, then exit\n"));
On 12/28/2010 08:18 PM, Joel Jacobson wrote:
2010/12/29 Tom Lane <tgl@sss.pgh.pa.us>
If you've solved the deterministic-ordering problem, then this entire
patch is quite useless. You can just run a normal dump and diff it.
No, that's only half true.
Diff will do a good job minimizing the "size" of the diff output, yes,
but such a diff is still quite useless if you want to quickly grasp
the context of the change.
If you have hundreds of functions, just looking at the changed
source code is not enough to figure out which functions were modified,
unless you have the brain power to memorize every single line of code
and are able to figure out the function name just by looking at the
old and new lines of code.
try:
diff -F '^CREATE' ...
cheers
andrew
2010/12/29 Andrew Dunstan <andrew@dunslane.net>
try:
diff -F '^CREATE' ...
cheers
andrew
Embarrassing, I'm sure I've done `man diff` before; I must have missed that
one. Wish I'd known about that feature before, it would have saved me many
hours! :-) Thanks for the tip!
There are some other real-life use-cases where I think splitting would be
nice and save a lot of time:
a) if you don't have a perfect 1:1 relationship between all the SPs in your
database and your source code repository (handled by your favorite version
control system), i.e. if you suspect some SPs in the database might differ
from the source code files in your repo. In this scenario, it might
be simpler to "start over" and continue developing on a repo built from a
pg_dump --split export. You would lose all history, but it might still be
worth it if the "compare everything in the database against the source code
files in the repo" project would take a lot of man-hours.
b) quick branching - perhaps you are a consultant at a company where they
don't even have the SPs stored in separate files; they might have been
magically installed by some consultant before you without any trace. :-) To
get up to speed solving the problem you've been assigned, which in this
example involves a lot of SP coding and modifications of existing functions,
it would save a lot of time if you had all functions in separate files
before you started coding; then you could use git or any other nice version
control system to track your changes and figure out what you've done once
you get everything to work.
c) automatically saving daily snapshots of your production database schema
to your version control system. While the best version control system (git)
does not track individual files, many of the older but still very popular
ones, like svn, do. If every function in the production database schema
were saved automatically to the VCS, you would be guaranteed to have a
track of all deployed changes affecting each function, which is probably a
lot fewer changes compared to the entire history for each function, assuming
developers commit things while developing and not only when deploying.
d) while pg_dump offers some options to limit the output content, such as -s
for "schema only" and -t/-T to limit which tables to dump, it lacks options
to export "functions only" or "these functions only". It would require quite
a lot of such options to provide the same flexibility as a split dump, which
largely removes the need for them, since you can then compose your own
restore script based on the dump (see the sketch after this list).
Of course, not all of these scenarios are relevant for everybody.
--
Best regards,
Joel Jacobson
Glue Finance
On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson <joel@gluefinance.com> wrote:
<description of split stuff>
So, how different (or not) is this to the "directory" format that was
coming out of the desire of a parallel pg_dump?
a.
--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.
2010/12/29 Aidan Van Dyk <aidan@highrise.ca>
On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson <joel@gluefinance.com>
wrote:
<description of split stuff>
So, how different (or not) is this to the "directory" format that was
coming out of the desire of a parallel pg_dump?
Not sure what format you are referring to? Custom, tar or plain text?
I noticed there are two undocumented formats as well, "append" and "file".
I tried both of these undocumented formats, but it did not produce any
directory structure of the dumped objects.
Could you please explain how to use the "directory format" if such a format
already exists?
I can't find it in the documentation nor the source code of HEAD.
--
Best regards,
Joel Jacobson
Glue Finance
E: jj@gluefinance.com
T: +46 70 360 38 01
Postal address:
Glue Finance AB
Box 549
114 11 Stockholm
Sweden
Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden
On Wed, Dec 29, 2010 at 8:31 AM, Joel Jacobson <joel@gluefinance.com> wrote:
2010/12/29 Aidan Van Dyk <aidan@highrise.ca>
On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson <joel@gluefinance.com>
wrote:
<description of split stuff>
So, how different (or not) is this to the "directory" format that was
coming out of the desire of a parallel pg_dump?
Not sure what format you are referring to? Custom, tar or plain text?
I noticed there are two undocumented formats as well, "append" and "file".
I tried both of these undocumented formats, but it did not produce any
directory structure of the dumped objects.
Could you please explain how to use the "directory format" if such a format
already exists?
I can't find it in the documentation nor the source code of HEAD.
It is still being discussed as a patch to pg_dump. Google for "directory
archive format for pg_dump", specifically in archives.postgresql.org.
AFAIK, that applies to parallel dumps of data (may help in --schema-only
dumps too), and what you are trying is for schema.
Regards
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com
singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet
Mail sent from my BlackLaptop device
On Wed, Dec 29, 2010 at 9:11 AM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
On Wed, Dec 29, 2010 at 8:31 AM, Joel Jacobson <joel@gluefinance.com> wrote:
2010/12/29 Aidan Van Dyk <aidan@highrise.ca>
On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson <joel@gluefinance.com>
wrote:
<description of split stuff>
So, how different (or not) is this to the "directory" format that was
coming out of the desire of a parallel pg_dump?
Not sure what format you are referring to? Custom, tar or plain text?
I noticed there are two undocumented formats as well, "append" and "file".
I tried both of these undocumented formats, but it did not produce any
directory structure of the dumped objects.
Could you please explain how to use the "directory format" if such a
format already exists?
I can't find it in the documentation nor the source code of HEAD.
It is still being discussed as a patch to pg_dump. Google for "directory
archive format for pg_dump", specifically in archives.postgresql.org.
Specifically:
Message-ID: <AANLkTimUELTXwRSQDQNwxik_k1y3YcH1u-9NgHZqpi9e@mail.gmail.com>
AFAIK, that applies to parallel dumps of data (may help in --schema-only
dumps too), and what you are trying is for schema.
Right, but one of the things it does is break the dump in to parts,
and put them in a directory/file organization.
Both are doing it for different reasons, but doing pretty much the
same thing. But can the layout/organization of Joachim's patch be
made "human friendly" in the vein of Joel's vision?
a.
--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.
Aidan Van Dyk <aidan@highrise.ca> writes:
On Wed, Dec 29, 2010 at 9:11 AM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
AFAIK, that applies to parallel dumps of data (may help in --schema-only
dumps too), and what you are trying is for schema.
Right, but one of the things it does is break the dump in to parts,
and put them in a directory/file organization.
Both are doing it for different reasons, but doing pretty much the
same thing. But can the layout/organization of Joachim's patch be
made "human friendly" in the vein of Joel's vision?
I think they're fundamentally different things, because the previously
proposed patch is an extension of the machine-readable archive format,
and has to remain so because of the expectation that people will want
to use parallel restore with it. Joel is arguing for a split-up of
the text dump format.
regards, tom lane
2010/12/29 Tom Lane <tgl@sss.pgh.pa.us>
I think they're fundamentally different things, because the previously
proposed patch is an extension of the machine-readable archive format,
and has to remain so because of the expectation that people will want
to use parallel restore with it. Joel is arguing for a split-up of
the text dump format.
Yes, exactly.
My patch is of course also a lot smaller :-)
pg_dump-directory.diff.........................: 112 853 bytes
pg-dump-split-plain-text-files-9.1devel.patch..: 5 579 bytes
I just tried the pg_dump-directory.diff patch.
The only thing it has in common with my patch is that it writes data to different
files, and it's only the data which is split into different files; the
schema appears to go into the single file "TOC".
Example, pg_dump-directory.diff:
$ ./pg_dump -f /crypt/dirpatch -F d -s glue
$ ls -la /crypt/dirpatch/
TOC
(1 file)
$ rm -rf /crypt/dirpatch
$ ./pg_dump -f /crypt/dirpatch -F d glue
$ ls /crypt/dirpatch/
6503.dat
6504.dat
...lots of files...
6871.dat
6872.dat
6873.dat
6874.dat
TOC
Example, pg_dump --split patch:
$ pg_dump -f /crypt/splitpatch -F p --split -s glue
$ ls /crypt/splitpatch*
/crypt/splitpatch (file)
/crypt/splitpatch-split: (directory)
myschema1
myschema2
public
$ ls /crypt/splitpatch-split/public/
AGGREGATE
CONSTRAINT
FK_CONSTRAINT
FUNCTION
INDEX
SEQUENCE
TABLE
TRIGGER
TYPE
VIEW
$ ls /crypt/splitpatch-split/public/FUNCTION/
myfunc.sql
otherfunc.sql
$ cat /crypt/splitpatch
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
...etc...
\i /crypt/splitpatch-split/public/FUNCTION/myfunc.sql
\i /crypt/splitpatch-split/public/FUNCTION/otherfunc.sql
--
Best regards,
Joel Jacobson
Glue Finance
Joel Jacobson <joel@gluefinance.com> writes:
Solution: I propose a new option to pg_dump, --split, which dumps each
object to a separate file in a user friendly directory structure:
Please have a look at getddl:
https://github.com/dimitri/getddl
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
2010/12/29 Dimitri Fontaine <dimitri@2ndquadrant.fr>
Please have a look at getddl:
Nice! Looks like a nifty tool.
When I tried it, "./getddl.py -f -F /crypt/funcs -d glue", I got the error
"No such file or directory: 'sql/schemas.sql'".
While the task of splitting objects into separate files could be solved by
an external "wrapper tool" like yours around pg_dump,
I argue it makes more sense to put the (minimal required) logic into
pg_dump, for a number of reasons, most importantly because it's simpler
and less complex, and thus less error prone.
My patch is only a few lines of code and doesn't add any logic to pg_dump;
it merely reroutes the fwrite() calls based on the TOC entries.
Just the fact that you and others had to create your own tools to do the
splitting shows the feature is important, and I think it should be included
in the normal pg_dump tool.
Happy new year!
Joel Jacobson
Glue Finance
On Thu, Dec 30, 2010 at 2:13 AM, Joel Jacobson <joel@gluefinance.com> wrote:
2010/12/29 Dimitri Fontaine <dimitri@2ndquadrant.fr>
Please have a look at getddl:
Nice! Looks like a nifty tool.
When I tried it, "./getddl.py -f -F /crypt/funcs -d glue", I got the error
"No such file or directory: 'sql/schemas.sql'".While the task of splitting objects into separate files could be solved by
an external "wrapper tool" like yours around pg_dump,
I argue it makes more sense of putting the (minimal required) logics into
pg_dump, due to a number of reasons, most importantly because it's simplier
and less complex, thus less error prone.My patch is only a few lines of code and doesn't add any logics to pg_dump,
it merely reroutes the fwrite() system calls based on the toc entries.Just the fact you and others had to create own tools to do the splitting
shows the feature is important, which I think should be included in the
normal pg_dump tool.
As someone whose own version of "getddl" helped inspire Dimitri to create
his own version, I've both enjoyed reading this thread and seeing this wheel
reinvented yet again, and wholeheartedly +1 the idea of building this
directly into pg_dump. (The only thing better would be to make everything
SQL callable, but that's a problem for another day).
Robert Treat
http://www.xzilla.net
On tis, 2010-12-28 at 12:33 -0500, Tom Lane wrote:
(2) randomly different ordering of rows within a table. Your patch
didn't address that, unless I misunderstood quite a bit.
This issue here is just comparing schemas, so that part is a separate
problem for someone else.
I think the correct fix for (1) is to improve pg_dump's method for
sorting objects. It's not that bad now, but it does have issues with
random ordering of similarly-named objects. IIRC Peter Eisentraut
proposed something for this last winter but it seemed a mite too ugly,
and he got beaten down to just this:
commit 1acc06a1f4ae752793d2199d8d462a6708c8acc2
Author: Peter Eisentraut <peter_e@gmx.net>
Date: Mon Feb 15 19:59:47 2010 +0000
When sorting functions in pg_dump, break ties (same name) by
number of arguments
Yes, that was addressing the same underlying problem. Frankly, I have
been thinking about split files a lot before and since then. If the files
were appropriately named, it would remove a lot of problems compared to
diffing one big, even perfectly sorted, dump file.
On tis, 2010-12-28 at 20:51 -0500, Andrew Dunstan wrote:
try:
diff -F '^CREATE' ...
This works about 67% of the time and still doesn't actually tell at a
glance what changed. It only tells you which object the change you are
currently looking at probably belongs to.
To me, this is a wonderful feature, thanks! I think many people would be
happy if this patch would be included in the mainstream (and it is quite
short and simple).
About name ordering - I think that the problem exists for these objects:
1. Stored functions.
2. Foreign keys/triggers (objects which have owning objects).
It is wonderful that you store all functions with the same name in the same
file. To order them within this file we may simply compare the first
definition lines lexicographically (or the first line which differs between
one function definition and another).
The foreign key/trigger ordering problem is described by me at
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg114586.html
The big problem is with triggers: many triggers may have the same name, but
be bound to different tables. It would be great to include these triggers in
the table's definition or, at least, have separate files for each trigger+table
pair.
On Mon, Jan 3, 2011 at 7:11 AM, Dmitry Koterov <dmitry@koterov.ru> wrote:
To me, this is a wonderful feature, thanks! I think many people would be
happy if this patch would be included in the mainstream (and it is quite
short and simple).
About name ordering - I think that the problem exists for these objects:
1. Stored functions.
2. Foreign keys/triggers (objects which have owning objects).
It is wonderful that you store all functions with the same name in the same
file. To order them within this file we may simply compare the first
definition lines lexicographically (or the first line which differs between
one function definition and another).
The foreign key/trigger ordering problem is described by me at
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg114586.html
The big problem is with triggers: many triggers may have the same name, but
be bound to different tables. It would be great to include these triggers in
the table's definition or, at least, have separate files for each trigger+table
pair.
Hmm... this sounds awfully specific to your particular environment.
My triggers never have the same names... and on the flip side, I
don't see why someone couldn't want one function per file rather than
all same-named functions in one file.
I think the problem with this patch is that different people are
likely to want slightly different things, and there may not be any
single format that pleases everyone, and supporting too many variants
will become confusing for users and hard for us to maintain. We're
going to need to agree on something that won't be perfect for
everyone, but will hopefully be a sufficient improvement for enough
people to be worth doing.
On the specific issue of overloaded functions, I have a feeling that
the only feasible option is going to be to put them all in the same
file. If you put them in different files, the names will either be
very long (because they'll have to include the argument types) or
fairly incomprehensible (if you did something like hash the argument
types and append 8 hex digits to the function name) or not all that
static (if you use OIDs; or if you number them sequentially, like
foo1.sql, foo2.sql, foo3.sql, then foo3.sql might end up as foo2.sql
on a system where there are only two variants of foo, making diff not
work very well).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2011/1/3 Robert Haas <robertmhaas@gmail.com>:
will become confusing for users and hard for us to maintain. We're
going to need to agree on something that won't be perfect for
everyone, but will hopefully be a sufficient improvement for enough
people to be worth doing.
Good point.
I think we can at least agree the "bare minimum" is splitting per
namespace, object type and name.
On the specific issue of overloaded functions, I have a feeling that
the only feasible option is going to be to put them all in the same
file. If you put them in different files, the names will either be
very long (because they'll have to include the argument types) or
fairly incomprehensible (if you did something like hash the argument
types and append 8 hex digits to the function name) or not all that
static (if you use OIDs; or if you number them sequentially, like
foo1.sql, foo2.sql, foo3.sql, then foo3.sql might end up as foo2.sql
on a system where there are only two variants of foo, making diff not
work very well).
I agree.
Even if the overloaded functions are not written in the same order,
you will quickly and easily note "function(s) of this particular name
has been changed", which should narrow down your
mind-mapping-change-grasping-exercise quite a lot.
--
Best regards,
Joel Jacobson
Glue Finance
Robert Haas <robertmhaas@gmail.com> writes:
On the specific issue of overloaded functions, I have a feeling that
the only feasible option is going to be to put them all in the same
file. If you put them in different files, the names will either be
very long (because they'll have to include the argument types) or
fairly incomprehensible (if you did something like hash the argument
types and append 8 hex digits to the function name) or not all that
static (if you use OIDs; or if you number them sequentially, like
foo1.sql, foo2.sql, foo3.sql, then foo3.sql might end up as foo2.sql
on a system where there are only two variants of foo, making diff not
work very well).
If you put all the variants in the same file, diff is *still* not going
to work very well. At least not unless you solve the problems that keep
pg_dump from dumping objects in a consistent order ... and once you do
that, you don't need this patch.
I think the problem with this patch is that different people are
likely to want slightly different things, and there may not be any
single format that pleases everyone, and supporting too many variants
will become confusing for users and hard for us to maintain.
Yeah, that's exactly it. I can think of some possible uses for
splitting up pg_dump output, but frankly "to ease diff-ing" is not
one of them. For that problem, it's nothing but a crude kluge that
only sort-of helps. If we're to get anywhere on this, we need a
better-defined problem statement that everyone can agree is worth
solving and is well solved with this particular approach.
regards, tom lane
2011/1/3 Tom Lane <tgl@sss.pgh.pa.us>:
pg_dump from dumping objects in a consistent order ... and once you do
that, you don't need this patch.
Yeah, that's exactly it. I can think of some possible uses for
splitting up pg_dump output, but frankly "to ease diff-ing" is not
one of them. For that problem, it's nothing but a crude kluge that
only sort-of helps. If we're to get anywhere on this, we need a
better-defined problem statement that everyone can agree is worth
solving and is well solved with this particular approach.
The problem statement is only partly about diffing. I think the two major
other problems with one single giant schema file are:
1. Automatically version controlling your production database schema
using a file based approach is not possible.
Splitting the schema would mean it's very simple to set up a cron job
which automatically commits the schema changes every night (see the
sketch after this list). You would be able to follow the changes by
simply looking at the VCS log, instead of grepping the log files and
trying to figure out what changed.
2. Splitting is a single option which reduces the need for any other
imaginable exclude/include options, such as "only export these
functions" or "do not export views" etc. Tables are of course the most
common thing you want to include/exclude in a dump, but there are
quite a lot of different object types. No single object type
is "important enough" to motivate a new pg_dump exclusion/inclusion
option on its own, but I think the summed need for such
options across all object types is greater than the extra complexity
introduced by a patch consisting of only a few lines of code.
--
Best regards,
Joel Jacobson
Glue Finance
On Mon, Jan 3, 2011 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On the specific issue of overloaded functions, I have a feeling that
the only feasible option is going to be to put them all in the same
file. If you put them in different files, the names will either be
very long (because they'll have to include the argument types) or
fairly incomprehensible (if you did something like hash the argument
types and append 8 hex digits to the function name) or not all that
static (if you use OIDs; or if you number them sequentially, like
foo1.sql, foo2.sql, foo3.sql, then foo3.sql might end up as foo2.sql
on a system where there are only two variants of foo, making diff not
work very well).
If you put all the variants in the same file, diff is *still* not going
to work very well. At least not unless you solve the problems that keep
pg_dump from dumping objects in a consistent order ... and once you do
that, you don't need this patch.
That's not really true. It's a whole lot easier to look at a diff of two
100-line files and then repeat that N times than to look at a single
diff of two N*100 line files. I certainly spend enough of my
patch-review doing "git diff master <some particular source file>",
and then if what's going on isn't clear you can look at just that file
in more detail without worrying about every other source file in the
system. And I have encountered this problem when comparing database
schemas (and sometimes data) also. Yes, I've done that using diff.
Yes, it did suck. Yes, I got it done before my boss fired me.
I think the problem with this patch is that different people are
likely to want slightly different things, and there may not be any
single format that pleases everyone, and supporting too many variants
will become confusing for users and hard for us to maintain.
Yeah, that's exactly it. I can think of some possible uses for
splitting up pg_dump output, but frankly "to ease diff-ing" is not
one of them. For that problem, it's nothing but a crude kluge that
only sort-of helps. If we're to get anywhere on this, we need a
better-defined problem statement that everyone can agree is worth
solving and is well solved with this particular approach.
I have to admit I'm a bit unsold on the approach as well. It seems
like you could write a short Perl script which would transform a text
format dump into the proposed format pretty easily, and if you did
that and published the script, then the next poor shmuck who had the
same problem could either use the script as-is or hack it up to meet
some slightly different set of requirements. Or maybe you'd be better
off basing such a script on the custom or tar format instead, in order
to avoid the problem of misidentifying a line beginning with --- as a
comment when it's really part of a data item. Or maybe even writing a
whole "schema diff" tool that would take two custom-format dumps as
inputs.
On the other hand, I can certainly think of times when even a pretty
dumb implementation of this would have saved me some time.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Jan 3, 2011 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah, that's exactly it. I can think of some possible uses for
splitting up pg_dump output, but frankly "to ease diff-ing" is not
one of them. For that problem, it's nothing but a crude kluge that
only sort-of helps. If we're to get anywhere on this, we need a
better-defined problem statement that everyone can agree is worth
solving and is well solved with this particular approach.
I have to admit I'm a bit unsold on the approach as well. It seems
like you could write a short Perl script which would transform a text
format dump into the proposed format pretty easily, and if you did
that and published the script, then the next poor shmuck who had the
same problem could either use the script as-is or hack it up to meet
some slightly different set of requirements. Or maybe you'd be better
off basing such a script on the custom or tar format instead, in order
to avoid the problem of misidentifying a line beginning with --- as a
comment when it's really part of a data item. Or maybe even writing a
whole "schema diff" tool that would take two custom-format dumps as
inputs.
On the other hand, I can certainly think of times when even a pretty
dumb implementation of this would have saved me some time.
The basic objection that I have to this patch is that it proposes to
institutionalize a pretty dumb implementation. And, as you mentioned,
once it's in there it'll be more or less set in stone because we aren't
going to want to support umpteen variants.
I like the idea of a postprocessing script a lot better --- it seems
like it wouldn't get in the way of people making their own variants.
And as you say it'd likely be pretty trivial to do.
regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
I have to admit I'm a bit unsold on the approach as well. It seems
like you could write a short Perl script which would transform a text
format dump into the proposed format pretty easily, and if you did
that and published the script, then the next poor shmuck who had the
same problem could either use the script as-is or hack it up to meet
some slightly different set of requirements. Or maybe you'd be better
That's actually what I first did a couple of months ago,
https://github.com/gluefinance/parse_schema/blob/master/parse_schema.pl
My major concern with parsing the schema file is that I would never fully
trust the output from the script; even if the regex is extremely
paranoid and really strict, there is still a risk it contains a bug.
If you cannot trust the output from the schema parse script, it's not
safe to use it to do a partial restore of objects.
Let's say you want to restore only a few functions from your backup
schema. Without --split, you would need to restore them manually
somehow. With the --split option, you could simply restore them from
the individual files, at least functions where only the source code
has been modified and not the arguments.
2011/1/3 Tom Lane <tgl@sss.pgh.pa.us>:
The basic objection that I have to this patch is that it proposes to
institutionalize a pretty dumb implementation. And, as you mentioned,
What's dumb about it? It's simple and it works.
Please give me an idea of what a smart implementation would be, and I
will see if I can rewrite the patch.
once it's in there it'll be more or less set in stone because we aren't
going to want to support umpteen variants.
Yes, that's why it should be a "bare minimum" solution to the problem,
which is a lot better than no solution at all.
What other variants than the discussion on the path name structure can
you think of?
I like the idea of a postprocessing script a lot better --- it seems
like it wouldn't get in the way of people making their own variants.
And as you say it'd likely be pretty trivial to do.
I'm quite sure such a postprocessing script has been written hundreds
of times over the past years by different Postgres users, without even
a question being submitted to the mailing list, since it's quite a
manageable task for anyone with some regex and scripting experience.
Why not just provide a simple "bare minimum" solution to the "problem"
and let the remaining x % of the users who have a very specific need
write their own specialized script if they need to? It would save a
lot of time for all future users who would otherwise have to write their
own script or google until they find my script or any other
script solving the same problem. Multiple almost identical solutions
to exactly the same problem are kind of a waste of time.
--
Best regards,
Joel Jacobson
Glue Finance
On Mon, Jan 3, 2011 at 2:46 PM, Joel Jacobson <joel@gluefinance.com> wrote:
My major concern of parsing the schema file is I would never fully
trust the output from the script, even if the regex is extremely
paranoid and really strict, there is still a risk it contains a bug.
That could possibly be resolved by using the custom or tar formats.
If you cannot trust the output from the schema parse script, it's not
safe to use it to do a partial restore of objects.
Let's say you want to restore only a few functions from your backup
schema.
I don't think this is the right solution to that problem; or at least,
it's not the solution I want personally. I want something like
--function=glob (and yes I do want that same thing for every other
object type, too, but PARTICULARLY for functions). This might be a
next-best solution to that problem, but I'm still holding out for the
home run.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I have to admit I'm a bit unsold on the approach as well. It seems
like you could write a short Perl script which would transform a text
format dump into the proposed format pretty easily, and if you did
that and published the script, then the next poor shmuck who had the
same problem could either use the script as-is or hack it up to meet
some slightly different set of requirements. Or maybe you'd be better
off basing such a script on the custom or tar format instead, in order
to avoid the problem of misidentifying a line beginning with --- as a
comment when it's really part of a data item. Or maybe even writing a
whole "schema diff" tool that would take two custom-format dumps as
inputs.
On the other hand, I can certainly think of times when even a pretty
dumb implementation of this would have saved me some time.
You mean like those:
https://labs.omniti.com/labs/pgtreats/wiki/getddl
https://github.com/dimitri/getddl
In this case the Perl version was impossible for me to hack on,
so my version is in Python, but it is still somewhat of a fork. I even
have an intermediate shell version somewhere that a colleague wrote, but
I much prefer maintaining Python code.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, Jan 3, 2011 at 3:15 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
On the other hand, I can certainly think of times when even a pretty
dumb implementation of this would have saved me some time.
You mean like those:
https://labs.omniti.com/labs/pgtreats/wiki/getddl
https://github.com/dimitri/getddl
That's not exactly what I was trying to do, but it's in the ballpark...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 28.12.2010 17:00, Joel Jacobson wrote:
Dear fellow hackers,
Problem: A normal diff of two slightly different schema dump files
(pg_dump -s), will not produce a user-friendly diff, as you get all
changes in the same file.
Another Solution: I have used a Python script for splitting "dump -s"
output into a directory structure for years
A structure that seems to work well is
1. <database_name>.load.sql - file containing \i for all other files
2. tree of
schema1\
+- <objtype1>\
| + obj_of_type1_name1.sql
| + ...
+-<objtype2>\
...
That is, first have one directory per schema at the top level, then one
directory for each type of object, and
inside that, one file per object.
Some things which may have non-unique names, like functions, need extra
"uniquefication", like adding an underscore- (or colon-) separated list of
argument types at the end.
Most times you want to leave out the comment lines with OIDs so that
you can diff the files against another version
------------------------------------------
Hannu Krosing
http://www.2ndQuadrant.com/books/
On 28.12.2010 22:44, Joel Jacobson wrote:
Sent from my iPhone
On 28 dec 2010, at 21:45, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
The problem I see with suffixing a sequence id to the objects with
name collision is that one day the dump may name myfunc(int) as
myfunc.sql and after an overloaded version is created, say
myfunc(char, int), then the same myfunc(int) may be dumped in
myfunc-2.sql, which again is non-deterministic.
I agree, good point!
Perhaps abbreviations are preferable, e.g., myfunc_i, myfunc_i_c, etc.,
to reduce the need to truncate filenames.
Also, it is a project policy that we do not introduce new features in
back branches, so spending time on an 8.4.6 patch may not be the best
use of your time.
My company is using 8.4 and needs this feature, so I'll have to patch
it anyway :)
Start the easy way, by writing a (python|perl) filter for pg_dump -s output
Once this is done, convert it into a patch for pg_dump
--------------------------------------------
Hannu Krosing
http://www.2ndQuadrant.com/books/
On 28.12.2010 23:51, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
On 12/28/2010 04:44 PM, Joel Jacobson wrote:
Perhaps abbreviations are preferable, e.g., myfunc_i, myfunc_i_c, etc.,
to reduce the need to truncate filenames.
I think that's just horrible. Does the i stand for integer or inet? And
it will get *really* ugly for type names with spaces in them ...
You think spaces are bad, try slashes ;-)
Not to mention the need for including schemas in typenames sometimes.
I think you're going to have a real problem trying to fully describe a
function's signature in a file name of reasonable max length.
something like
funcname_<number-of-arguments>_<hash_of_argument_type_list>.sql
seems like a reasonable compromise - you can find the function you are
looking for without too much searching, even when it is overloaded, and
uniqueness is still guaranteed.
--------------------------------------------
Hannu Krosing
http://www.2ndQuadrant.com/books/
Joel Jacobson wrote:
To understand a change to my database functions, I would start by
looking at the top-level, only focusing on the names of the functions
modified/added/removed.
At this stage, you want as little information as possible about each
change, such as only the names of the functions.
To do this, i.e. get a list of changed functions, you cannot compare two
full schema plain text dumps using diff, as it would only reveal the
lines changed, not the name of the functions, unless you are lucky to
get the name of the function within the (by default) 3 lines of copied
context.
While you could increase the number of copied lines of context to a
value which would ensure you would see the name of the function in the
diff, that is not feasible if you want to quickly "get a picture" of
the code areas modified, since you would then need to read through
even more lines of diff output.
I can agree on some use cases you've outlined, where there's merit to
the general idea of your patch. But as an aside, you really should
launch an investigation into some better diff tools if this is how
you're doing this type of work. Last week I reviewed 3K lines worth of
changes from two versions of a 12K line schema dump I'd never seen
before in a couple of hours using kdiff3. I'd have killed myself before
finishing if I had to do the same job with traditional diff as you're
describing it here.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Mon, Jan 3, 2011 at 2:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Jan 3, 2011 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah, that's exactly it. I can think of some possible uses for
splitting up pg_dump output, but frankly "to ease diff-ing" is not
one of them. For that problem, it's nothing but a crude kluge that
only sort-of helps. If we're to get anywhere on this, we need a
better-defined problem statement that everyone can agree is worth
solving and is well solved with this particular approach.
I have to admit I'm a bit unsold on the approach as well. It seems
like you could write a short Perl script which would transform a text
format dump into the proposed format pretty easily, and if you did
that and published the script, then the next poor shmuck who had the
same problem could either use the script as-is or hack it up to meet
some slightly different set of requirements. Or maybe you'd be better
off basing such a script on the custom or tar format instead, in order
to avoid the problem of misidentifying a line beginning with --- as a
comment when it's really part of a data item. Or maybe even writing a
whole "schema diff" tool that would take two custom-format dumps as
inputs.
On the other hand, I can certainly think of times when even a pretty
dumb implementation of this would have saved me some time.
The basic objection that I have to this patch is that it proposes to
institutionalize a pretty dumb implementation. And, as you mentioned,
once it's in there it'll be more or less set in stone because we aren't
going to want to support umpteen variants.
I like the idea of a postprocessing script a lot better --- it seems
like it wouldn't get in the way of people making their own variants.
And as you say it'd likely be pretty trivial to do.
I notice that this patch is marked as "Needs Review" in the CommitFest
application, but I think it's fair to say that there's no consensus to
commit something along these lines. Accordingly, I'm going to mark it
"Returned with Feedback". There is clearly a need for better tooling
in this area, but I think there's a great deal of legitimate doubt
about whether this is the right solution to that problem.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
Now that the (at least as far as I know) last ordering problem in
pg_dump has been solved [1], I'm going to attempt resurrecting this old
thread.
It seemed to me that the biggest objections to this patch in the old
discussions were directed at the implementation, which I have tried to
improve. The attached patch implements the actual splitting in a new
backup format.
The general output scheme looks like this:
schemaname/OBJECT_TYPES/object_name.sql,
but there are some exceptions.
Overloaded functions are dumped into the same file. Object names are
encoded into the POSIX Portable Filename Character Set ([a-z0-9._-]) by
replacing any characters outside that set with an underscore.
Restoring the dump is supported through an index.sql file containing
statements which include (through \i) the actual object files in the
dump directory.
Any thoughts? Objections on the idea or the implementation?
[1]: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7b583b20b1c95acb621c71251150beef958bb603
Regards,
Marko Tiikkaja
Attachments:
Attachment: pg_dump_split_v2.patch (text/plain; charset=UTF-8)
*** a/src/bin/pg_dump/Makefile
--- b/src/bin/pg_dump/Makefile
***************
*** 19,25 **** include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
OBJS= pg_backup_archiver.o pg_backup_db.o pg_backup_custom.o \
! pg_backup_null.o pg_backup_tar.o \
pg_backup_directory.o dumpmem.o dumputils.o compress_io.o $(WIN32RES)
KEYWRDOBJS = keywords.o kwlookup.o
--- 19,25 ----
override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
OBJS= pg_backup_archiver.o pg_backup_db.o pg_backup_custom.o \
! pg_backup_null.o pg_backup_tar.o pg_backup_split.o \
pg_backup_directory.o dumpmem.o dumputils.o compress_io.o $(WIN32RES)
KEYWRDOBJS = keywords.o kwlookup.o
*** a/src/bin/pg_dump/pg_backup.h
--- b/src/bin/pg_dump/pg_backup.h
***************
*** 50,56 **** typedef enum _archiveFormat
archCustom = 1,
archTar = 3,
archNull = 4,
! archDirectory = 5
} ArchiveFormat;
typedef enum _archiveMode
--- 50,57 ----
archCustom = 1,
archTar = 3,
archNull = 4,
! archDirectory = 5,
! archSplit = 6
} ArchiveFormat;
typedef enum _archiveMode
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 2125,2130 **** _allocAH(const char *FileSpec, const ArchiveFormat fmt,
--- 2125,2134 ----
case archTar:
InitArchiveFmt_Tar(AH);
break;
+
+ case archSplit:
+ InitArchiveFmt_Split(AH);
+ break;
default:
exit_horribly(modulename, "unrecognized file format \"%d\"\n", fmt);
*** a/src/bin/pg_dump/pg_backup_archiver.h
--- b/src/bin/pg_dump/pg_backup_archiver.h
***************
*** 369,374 **** extern void InitArchiveFmt_Custom(ArchiveHandle *AH);
--- 369,375 ----
extern void InitArchiveFmt_Null(ArchiveHandle *AH);
extern void InitArchiveFmt_Directory(ArchiveHandle *AH);
extern void InitArchiveFmt_Tar(ArchiveHandle *AH);
+ extern void InitArchiveFmt_Split(ArchiveHandle *AH);
extern bool isValidTarHeader(char *header);
*** /dev/null
--- b/src/bin/pg_dump/pg_backup_split.c
***************
*** 0 ****
--- 1,1063 ----
+ /*-------------------------------------------------------------------------
+ *
+ * pg_backup_split.c
+ *
+ * A split format dump is a directory, which contains all database objects
+ * separated into .sql files, and an "index.sql" file with psql statements
+ * to allow restoring the separated objects.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+ #include "postgres_fe.h"
+ #include "libpq-fe.h"
+ #include "libpq/libpq-fs.h"
+ #include "pg_backup_archiver.h"
+ #include "dumpmem.h"
+ #include "dumputils.h"
+
+ #include <dirent.h>
+ #include <sys/stat.h>
+
+ typedef struct
+ {
+ char *filename; /* filename excluding the directory (basename) */
+ DumpId dumpId; /* dump id of the TocEntry */
+ } lclTocEntry;
+
+ typedef struct
+ {
+ /*
+ * Our archive location. This is basically what the user specified as his
+ * backup file but of course here it is a directory.
+ */
+ char *directory;
+
+ FILE *dataFH; /* currently open data file */
+
+ lclTocEntry **sortedToc; /* array of toc entries sorted by (filename, dumpId) */
+ } lclContext;
+
+ /* translator: this is a module name */
+ static const char *modulename = gettext_noop("split archiver");
+
+
+ /* prototypes for private functions */
+ static void _ArchiveEntry(ArchiveHandle *AH, TocEntry *te);
+ static void _StartData(ArchiveHandle *AH, TocEntry *te);
+ static void _EndData(ArchiveHandle *AH, TocEntry *te);
+ static size_t _WriteData(ArchiveHandle *AH, const void *data, size_t dLen);
+ static size_t _WriteBuf(ArchiveHandle *AH, const void *buf, size_t len);
+ static void _CloseArchive(ArchiveHandle *AH);
+
+ static void _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
+ static size_t _WriteBlobData(ArchiveHandle *AH, const void *data, size_t dLen);
+ static void _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid);
+
+ static size_t _splitOut(ArchiveHandle *AH, const void *buf, size_t len);
+
+ static int lclTocEntryCmp(const void *av, const void *bv);
+ static bool should_add_index_entry(ArchiveHandle *AH, TocEntry *te);
+ static void create_sorted_toc(ArchiveHandle *AH);
+ static void get_object_description(ArchiveHandle *AH, TocEntry *te, PQExpBuffer buf);
+ static void add_ownership_information(ArchiveHandle *AH, TocEntry *te);
+ static void set_search_path(ArchiveHandle *AH, TocEntry *te);
+ static void write_split_directory(ArchiveHandle *AH);
+
+ static void create_schema_directory(ArchiveHandle *AH, const char *tag);
+ static void create_directory(ArchiveHandle *AH, const char *fmt, ...)
+ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
+ static char *prepend_directory(ArchiveHandle *AH, const char *relativeFilename);
+ static char *encode_filename(const char *input);
+ static TocEntry *find_dependency(ArchiveHandle *AH, TocEntry *te);
+ static char *get_object_filename(ArchiveHandle *AH, TocEntry *t);
+
+
+ /*
+ * Init routine required by ALL formats. This is a global routine
+ * and should be declared in pg_backup_archiver.h
+ *
+ * Its task is to create any extra archive context (using AH->formatData),
+ * and to initialize the supported function pointers.
+ *
+ * It should also prepare whatever its input source is for reading/writing,
+ * and in the case of a read mode connection, it should load the Header & TOC.
+ */
+ void
+ InitArchiveFmt_Split(ArchiveHandle *AH)
+ {
+ lclContext *ctx;
+
+ /* Assuming static functions, this can be copied for each format. */
+ AH->ArchiveEntryPtr = _ArchiveEntry;
+ AH->StartDataPtr = _StartData;
+ AH->WriteDataPtr = _WriteData;
+ AH->EndDataPtr = _EndData;
+ AH->WriteBytePtr = NULL;
+ AH->ReadBytePtr = NULL;
+ AH->WriteBufPtr = _WriteBuf;
+ AH->ReadBufPtr = NULL;
+ AH->ClosePtr = _CloseArchive;
+ AH->ReopenPtr = NULL;
+ AH->PrintTocDataPtr = NULL;
+ AH->ReadExtraTocPtr = NULL;
+ AH->WriteExtraTocPtr = NULL;
+ AH->PrintExtraTocPtr = NULL;
+
+ AH->StartBlobsPtr = NULL;
+ AH->StartBlobPtr = _StartBlob;
+ AH->EndBlobPtr = _EndBlob;
+ AH->EndBlobsPtr = NULL;
+
+ AH->ClonePtr = NULL;
+ AH->DeClonePtr = NULL;
+
+ AH->CustomOutPtr = _splitOut;
+
+ /* Set up our private context */
+ ctx = (lclContext *) pg_malloc0(sizeof(lclContext));
+ AH->formatData = (void *) ctx;
+
+ ctx->dataFH = NULL;
+ ctx->sortedToc = NULL;
+
+ /* Initialize LO buffering */
+ AH->lo_buf_size = LOBBUFSIZE;
+ AH->lo_buf = (void *) pg_malloc(LOBBUFSIZE);
+
+ if (!AH->fSpec || strcmp(AH->fSpec, "") == 0)
+ exit_horribly(modulename, "no output directory specified\n");
+
+ if (AH->compression != 0)
+ exit_horribly(modulename, "split archive format does not support compression\n");
+
+ if (AH->mode != archModeWrite)
+ exit_horribly(modulename, "reading a split archive not supported; restore using psql\n");
+
+ ctx->directory = AH->fSpec;
+
+ if (mkdir(ctx->directory, 0700) < 0)
+ exit_horribly(modulename, "could not create directory \"%s\": %s\n",
+ ctx->directory, strerror(errno));
+
+ create_directory(AH, "EXTENSIONS");
+ create_directory(AH, "BLOBS");
+ }
+
+ /*
+ * Custom output function to write output from ahprintf() to ctx->dataFH.
+ */
+ static size_t
+ _splitOut(ArchiveHandle *AH, const void *buf, size_t len)
+ {
+ lclContext *ctx = (lclContext *) AH->formatData;
+
+ if (!ctx->dataFH)
+ exit_horribly(modulename, "ctx->dataFH is NULL\n");
+
+ return fwrite(buf, 1, len, ctx->dataFH);
+ }
+
+ static void
+ create_schema_directory(ArchiveHandle *AH, const char *tag)
+ {
+ char *namespace = encode_filename(tag);
+
+ create_directory(AH, "%s", namespace);
+ create_directory(AH, "%s/AGGREGATES", namespace);
+ create_directory(AH, "%s/CHECK_CONSTRAINTS", namespace);
+ create_directory(AH, "%s/CONSTRAINTS", namespace);
+ create_directory(AH, "%s/FK_CONSTRAINTS", namespace);
+ create_directory(AH, "%s/FUNCTIONS", namespace);
+ create_directory(AH, "%s/INDEXES", namespace);
+ create_directory(AH, "%s/OPERATOR_CLASSES", namespace);
+ create_directory(AH, "%s/OPERATOR_FAMILIES", namespace);
+ create_directory(AH, "%s/RULES", namespace);
+ create_directory(AH, "%s/SEQUENCES", namespace);
+ create_directory(AH, "%s/SERVERS", namespace);
+ create_directory(AH, "%s/TABLEDATA", namespace);
+ create_directory(AH, "%s/TABLES", namespace);
+ create_directory(AH, "%s/TYPES", namespace);
+ create_directory(AH, "%s/TRIGGERS", namespace);
+ create_directory(AH, "%s/VIEWS", namespace);
+ }
+
+ /*
+ * Called by the Archiver when the dumper creates a new TOC entry.
+ *
+ * We determine the filename for this entry.
+ */
+ static void
+ _ArchiveEntry(ArchiveHandle *AH, TocEntry *te)
+ {
+ lclTocEntry *tctx;
+
+ tctx = (lclTocEntry *) pg_malloc0(sizeof(lclTocEntry));
+ tctx->dumpId = te->dumpId;
+ te->formatData = (void *) tctx;
+
+ tctx->filename = get_object_filename(AH, te);
+ }
+
+
+ /*
+ * Called by the archiver when saving TABLE DATA (not schema). This routine
+ * should save whatever format-specific information is needed to read
+ * the archive back.
+ *
+ * It is called just prior to the dumper's 'DataDumper' routine being called.
+ *
+ * We create the data file for writing and add any information necessary
+ * for restoring the table data.
+ */
+ static void
+ _StartData(ArchiveHandle *AH, TocEntry *te)
+ {
+ lclTocEntry *tctx = (lclTocEntry *) te->formatData;
+ lclContext *ctx = (lclContext *) AH->formatData;
+ char *fname;
+
+ fname = prepend_directory(AH, tctx->filename);
+
+ ctx->dataFH = fopen(fname, PG_BINARY_W);
+ if (ctx->dataFH == NULL)
+ exit_horribly(modulename, "could not open output file \"%s\": %s\n",
+ fname, strerror(errno));
+
+ /* set the search path */
+ set_search_path(AH, te);
+
+ /*
+ * If there's a COPY statement, add it to the beginning of the file. If there
+ * isn't one, this must be a --inserts dump and we don't need to add anything.
+ */
+ if (te->copyStmt)
+ ahprintf(AH, "%s", te->copyStmt);
+ }
+
+ /*
+ * Called by archiver when dumper calls WriteData. Note that while the
+ * WriteData routine is generally called for both BLOB and TABLE data, we
+ * substitute our own _WriteBlob function when dealing with BLOBs.
+ *
+ * We write the data to the open data file.
+ */
+ static size_t
+ _WriteData(ArchiveHandle *AH, const void *data, size_t dLen)
+ {
+ lclContext *ctx = (lclContext *) AH->formatData;
+
+ if (dLen == 0)
+ return 0;
+
+ return fwrite(data, 1, dLen, ctx->dataFH);
+ }
+
+ /*
+ * Called by the archiver when a dumper's 'DataDumper' routine has
+ * finished.
+ *
+ * We close the data file.
+ */
+ static void
+ _EndData(ArchiveHandle *AH, TocEntry *te)
+ {
+ lclContext *ctx = (lclContext *) AH->formatData;
+
+ /* Close the file */
+ fclose(ctx->dataFH);
+
+ ctx->dataFH = NULL;
+ }
+
+ /*
+ * Write a buffer of data to the archive.
+ * Called by the archiver to write a block of bytes to a data file.
+ */
+ static size_t
+ _WriteBuf(ArchiveHandle *AH, const void *buf, size_t len)
+ {
+ lclContext *ctx = (lclContext *) AH->formatData;
+ size_t res;
+
+ res = fwrite(buf, 1, len, ctx->dataFH);
+ if (res != len)
+ exit_horribly(modulename, "could not write to output file: %s\n",
+ strerror(errno));
+
+ return res;
+ }
+
+ /*
+ * Close the archive.
+ *
+ * When writing the archive, this is the routine that actually starts
+ * the process of saving it to files. No data should be written prior
+ * to this point, since the user could sort the TOC after creating it.
+ *
+ * Usually when an archive is written, we should call WriteHead() and
+ * WriteToc(). But since we don't write a TOC file at all, we can just
+ * skip that and write the index file from the TocEntry array. We do,
+ * however, use WriteDataChunks() to write the table data.
+ */
+ static void
+ _CloseArchive(ArchiveHandle *AH)
+ {
+ if (AH->mode == archModeWrite)
+ {
+ WriteDataChunks(AH);
+ write_split_directory(AH);
+ }
+ }
+
+
+ /*
+ * BLOB support
+ */
+
+ /*
+ * Called by the archiver when we're about to start dumping a blob.
+ *
+ * We create a file to write the blob to.
+ */
+ static void
+ _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
+ {
+ lclContext *ctx = (lclContext *) AH->formatData;
+ char fname[MAXPGPATH];
+
+ snprintf(fname, MAXPGPATH, "%s/BLOBS/%u.sql", ctx->directory, oid);
+ ctx->dataFH = fopen(fname, PG_BINARY_W);
+ if (ctx->dataFH == NULL)
+ exit_horribly(modulename, "could not open output file \"%s\": %s\n",
+ fname, strerror(errno));
+
+ ahprintf(AH, "SELECT pg_catalog.lo_open('%u', %d);\n", oid, INV_WRITE);
+
+ /* Substitute a different function to deal with BLOB data */
+ AH->WriteDataPtr = _WriteBlobData;
+ }
+
+ /*
+ * Called by dumper via archiver from within a data dump routine.
+ * We substitute this for _WriteData while emitting a BLOB.
+ */
+ static size_t
+ _WriteBlobData(ArchiveHandle *AH, const void *data, size_t dLen)
+ {
+ if (dLen > 0)
+ {
+ PQExpBuffer buf = createPQExpBuffer();
+ appendByteaLiteralAHX(buf,
+ (const unsigned char *) data,
+ dLen,
+ AH);
+
+ ahprintf(AH, "SELECT pg_catalog.lowrite(0, %s);\n", buf->data);
+ destroyPQExpBuffer(buf);
+ }
+
+ return dLen;
+ }
+
+ /*
+ * Called by the archiver when the dumper is finished writing a blob.
+ *
+ * We close the blob file and write an entry to the blob TOC file for it.
+ */
+ static void
+ _EndBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
+ {
+ lclContext *ctx = (lclContext *) AH->formatData;
+
+ ahprintf(AH, "SELECT pg_catalog.lo_close(0);\n\n");
+
+ /* Close the BLOB data file itself */
+ fclose(ctx->dataFH);
+ ctx->dataFH = NULL;
+
+ /* Restore the pointer we substituted in _StartBlob() */
+ AH->WriteDataPtr = _WriteData;
+ }
+
+
+ static int
+ lclTocEntryCmp(const void *av, const void *bv)
+ {
+ int c;
+ lclTocEntry *a = *((lclTocEntry **) av);
+ lclTocEntry *b = *((lclTocEntry **) bv);
+
+ /* NULLs should sort last */
+ c = (b->filename != NULL) - (a->filename != NULL);
+ if (c != 0)
+ return c;
+
+ /* don't call strcmp() on NULLs */
+ if (a->filename != NULL && b->filename != NULL)
+ {
+ c = strcmp(a->filename, b->filename);
+ if (c != 0)
+ return c;
+ }
+
+ return a->dumpId - b->dumpId;
+ }
+
+ static bool
+ should_add_index_entry(ArchiveHandle *AH, TocEntry *te)
+ {
+ lclTocEntry **sortedToc;
+ lclTocEntry **pte;
+ lclTocEntry **key;
+ lclTocEntry *prevte;
+
+ key = (lclTocEntry **) &te->formatData;
+
+ sortedToc = ((lclContext *) AH->formatData)->sortedToc;
+ if (!sortedToc)
+ exit_horribly(modulename, "formatData->sortedToc is NULL\n");
+
+ pte = (lclTocEntry **) bsearch(key, sortedToc,
+ AH->tocCount, sizeof(lclTocEntry *), lclTocEntryCmp);
+
+ if (!pte)
+ exit_horribly(modulename, "binary search failed\n");
+
+ /* If there's no previous entry, always add an index entry */
+ if (pte == sortedToc)
+ return true;
+
+ /*
+ * If there's a previous entry with the same filename, we don't want to add
+ * an index entry for this TocEntry. Note that NULLs sort last so the
+ * previous entry's filename can never be NULL.
+ */
+ prevte = *(pte - 1);
+ return strcmp(prevte->filename, (*key)->filename) != 0;
+ }
+
+ /*
+ * Create a list of lclTocEntries sorted by (filename, dumpId). This list is
+ * used when creating the index file to make sure we don't include a file
+ * multiple times.
+ */
+ static void
+ create_sorted_toc(ArchiveHandle *AH)
+ {
+ int i;
+ lclContext *ctx;
+ TocEntry *te;
+
+ ctx = (lclContext *) AH->formatData;
+ /* sanity checks */
+ if (!ctx)
+ exit_horribly(modulename, "formatData not allocated\n");
+ if (ctx->sortedToc != NULL)
+ exit_horribly(modulename, "formatData->sortedToc not NULL\n");
+
+ ctx->sortedToc = (lclTocEntry **) pg_malloc0(sizeof(lclTocEntry *) * AH->tocCount);
+ for (i = 0, te = AH->toc->next; te != AH->toc; i++, te = te->next)
+ ctx->sortedToc[i] = (lclTocEntry *) te->formatData;
+
+ qsort(ctx->sortedToc, AH->tocCount, sizeof(lclTocEntry *), lclTocEntryCmp);
+ }
+
+ static void
+ get_object_description(ArchiveHandle *AH, TocEntry *te, PQExpBuffer buf)
+ {
+ const char *type = te->desc;
+
+ /* use ALTER TABLE for views and sequences */
+ if (strcmp(type, "VIEW") == 0 || strcmp(type, "SEQUENCE") == 0)
+ type = "TABLE";
+
+ /* must not call fmtId() on BLOBs */
+ if (strcmp(type, "BLOB") == 0)
+ {
+ appendPQExpBuffer(buf, "LARGE OBJECT %s ", te->tag);
+ return;
+ }
+
+ /* a number of objects that require no special treatment */
+ if (strcmp(type, "COLLATION") == 0 ||
+ strcmp(type, "CONVERSION") == 0 ||
+ strcmp(type, "DOMAIN") == 0 ||
+ strcmp(type, "DATABASE") == 0 ||
+ strcmp(type, "FOREIGN DATA WRAPPER") == 0 ||
+ strcmp(type, "FOREIGN TABLE") == 0 ||
+ strcmp(type, "INDEX") == 0 ||
+ strcmp(type, "LARGE OBJECT") == 0 ||
+ strcmp(type, "TABLE") == 0 ||
+ strcmp(type, "TEXT SEARCH CONFIGURATION") == 0 ||
+ strcmp(type, "TEXT SEARCH DICTIONARY") == 0 ||
+ strcmp(type, "TYPE") == 0 ||
+ strcmp(type, "PROCEDURAL LANGUAGE") == 0 ||
+ strcmp(type, "SCHEMA") == 0 ||
+ strcmp(type, "SERVER") == 0 ||
+ strcmp(type, "USER MAPPING") == 0)
+ {
+ appendPQExpBuffer(buf, "%s ", type);
+ if (te->namespace)
+ appendPQExpBuffer(buf, "%s.", fmtId(te->namespace));
+ appendPQExpBuffer(buf, "%s ", fmtId(te->tag));
+
+ return;
+ }
+
+ /*
+ * These object types require additional decoration. Fortunately, the
+ * information needed is exactly what's in the DROP command.
+ */
+ if (strcmp(type, "AGGREGATE") == 0 ||
+ strcmp(type, "FUNCTION") == 0 ||
+ strcmp(type, "OPERATOR") == 0 ||
+ strcmp(type, "OPERATOR CLASS") == 0 ||
+ strcmp(type, "OPERATOR FAMILY") == 0)
+ {
+ /* chop "DROP " off the front and make a modifyable copy */
+ char *first = pg_strdup(te->dropStmt + 5);
+ char *last;
+
+ /* strip off any ';' or '\n' at the end */
+ last = first + strlen(first) - 1;
+ while (last >= first && (*last == '\n' || *last == ';'))
+ last--;
+ *(last + 1) = '\0';
+
+ appendPQExpBuffer(buf, "%s ", first);
+
+ free(first);
+
+ return;
+ }
+
+ exit_horribly(modulename, "don't know how to set owner for object type %s\n", type);
+ }
+
+ static void
+ add_ownership_information(ArchiveHandle *AH, TocEntry *te)
+ {
+ PQExpBuffer temp;
+
+ /* skip objects that don't have an owner */
+ if (strcmp(te->desc, "ACL") == 0 ||
+ strcmp(te->desc, "CAST") == 0 ||
+ strcmp(te->desc, "COMMENT") == 0 ||
+ strcmp(te->desc, "CHECK CONSTRAINT") == 0 ||
+ strcmp(te->desc, "CONSTRAINT") == 0 ||
+ strcmp(te->desc, "DEFAULT") == 0 ||
+ strcmp(te->desc, "ENCODING") == 0 ||
+ strcmp(te->desc, "EXTENSION") == 0 ||
+ strcmp(te->desc, "FK CONSTRAINT") == 0 ||
+ strcmp(te->desc, "LARGE OBJECT") == 0 ||
+ strcmp(te->desc, "RULE") == 0 ||
+ strcmp(te->desc, "SEQUENCE OWNED BY") == 0 ||
+ strcmp(te->desc, "SEQUENCE SET") == 0 ||
+ strcmp(te->desc, "STDSTRINGS") == 0 ||
+ strcmp(te->desc, "TRIGGER") == 0)
+ return;
+
+ temp = createPQExpBuffer();
+ appendPQExpBuffer(temp, "ALTER ");
+ get_object_description(AH, te, temp);
+ appendPQExpBuffer(temp, "OWNER TO %s;", fmtId(te->owner));
+ ahprintf(AH, "%s\n\n", temp->data);
+ destroyPQExpBuffer(temp);
+ }
+
+ static void
+ set_search_path(ArchiveHandle *AH, TocEntry *te)
+ {
+ if (!te->namespace)
+ return;
+
+ /*
+ * We want to add the namespace information to each object regardless
+ * of the previous object's namespace; that way it is easy to see when an
+ * object is moved to another schema.
+ */
+ if (strcmp(te->namespace, "pg_catalog") == 0)
+ ahprintf(AH, "SET search_path TO pg_catalog;\n\n");
+ else
+ ahprintf(AH, "SET search_path TO '%s', pg_catalog;\n\n", te->namespace);
+ }
+
+ /*
+ * Majority of the work is done here. We scan through the list of TOC entries
+ * and write the object definitions into their respective files. At the same
+ * time, we build the "index" file.
+ */
+ static void
+ write_split_directory(ArchiveHandle *AH)
+ {
+ lclContext *ctx;
+ TocEntry *te;
+ FILE *indexFH;
+ char buf[512];
+
+ ctx = (lclContext *) AH->formatData;
+
+ create_sorted_toc(AH);
+
+ indexFH = fopen(prepend_directory(AH, "index.sql"), "w");
+ if (!indexFH)
+ exit_horribly(modulename, "could not open index.sql: %s\n", strerror(errno));
+
+ snprintf(buf, sizeof(buf), "\n-- PostgreSQL split database dump\n\n"
+ "BEGIN;\n"
+ "SET client_min_messages TO 'warning';\n"
+ "SET client_encoding TO '%s';\n"
+ "SET check_function_bodies TO false;\n\n",
+ pg_encoding_to_char(AH->public.encoding));
+ if (fwrite(buf, 1, strlen(buf), indexFH) != strlen(buf))
+ exit_horribly(modulename, "could not write to index file: %s\n", strerror(errno));
+
+ for (te = AH->toc->next; te != AH->toc; te = te->next)
+ {
+ lclTocEntry *tctx;
+ const char *filename;
+
+ tctx = (lclTocEntry *) te->formatData;
+
+ /* for TABLEDATA, the only thing we need to do is add an index entry */
+ if (strcmp(te->desc, "TABLE DATA") == 0)
+ {
+ snprintf(buf, sizeof(buf), "\\i %s\n", tctx->filename);
+ if (fwrite(buf, 1, strlen(buf), indexFH) != strlen(buf))
+ exit_horribly(modulename, "could not write index file: %s\n", strerror(errno));
+ continue;
+ }
+
+ /* skip data */
+ if (te->dataDumper)
+ continue;
+
+ /* if there's no filename we need to skip this entry, see _ArchiveEntry() */
+ if (!tctx->filename)
+ continue;
+
+ /* add an index entry if necessary */
+ if (should_add_index_entry(AH, te))
+ {
+ snprintf(buf, sizeof(buf), "\\i %s\n", tctx->filename);
+ if (fwrite(buf, 1, strlen(buf), indexFH) != strlen(buf))
+ exit_horribly(modulename, "could not write index file: %s\n", strerror(errno));
+ }
+
+ /*
+ * Special case: don't try to re-create the "public" schema. Note that we
+ * still need to create the index entry because all schemas use the same
+ * "schemaless.sql" file, so make sure that happens before we reach this
+ * point.
+ */
+ if (strcmp(te->desc, "SCHEMA") == 0 &&
+ strcmp(te->tag, "public") == 0)
+ continue;
+
+ filename = prepend_directory(AH, tctx->filename);
+
+ /* multiple objects could map to the same file, so open in "append" mode */
+ ctx->dataFH = fopen(filename, "a");
+ if (!ctx->dataFH)
+ exit_horribly(modulename, "could not open file \"%s\": %s\n",
+ filename, strerror(errno));
+
+ set_search_path(AH, te);
+
+ ahprintf(AH, "%s\n", te->defn);
+
+ /*
+ * Special case: add \i for BLOBs into the "blobs.sql" data file. It's ugly
+ * to have this here, but there really isn't any better place.
+ */
+ if (strcmp(te->desc, "BLOB") == 0)
+ ahprintf(AH, "\\i BLOBS/%s.sql\n\n", te->tag);
+
+ add_ownership_information(AH, te);
+
+ fclose(ctx->dataFH);
+ ctx->dataFH = NULL;
+ }
+
+ if (fwrite("COMMIT;\n", 1, 8, indexFH) != 8)
+ exit_horribly(modulename, "could not write index file: %s\n", strerror(errno));
+
+ fclose(indexFH);
+ }
+
+
+ static void
+ create_directory(ArchiveHandle *AH, const char *fmt, ...)
+ {
+ va_list ap;
+ char reldir[MAXPGPATH];
+ char *directory;
+
+ va_start(ap, fmt);
+ vsnprintf(reldir, MAXPGPATH, fmt, ap);
+ va_end(ap);
+
+ directory = prepend_directory(AH, reldir);
+ if (mkdir(directory, 0700) < 0)
+ exit_horribly(modulename, "could not create directory \"%s\": %s\n",
+ directory, strerror(errno));
+ }
+
+
+ static char *
+ prepend_directory(ArchiveHandle *AH, const char *relativeFilename)
+ {
+ lclContext *ctx = (lclContext *) AH->formatData;
+ static char buf[MAXPGPATH];
+ char *dname;
+
+ dname = ctx->directory;
+
+ if (strlen(dname) + 1 + strlen(relativeFilename) + 1 > MAXPGPATH)
+ exit_horribly(modulename, "file name too long: \"%s\"\n", dname);
+
+ strcpy(buf, dname);
+ strcat(buf, "/");
+ strcat(buf, relativeFilename);
+
+ return buf;
+ }
+
+
+ /*
+ * Encode a filename to fit in the "Portable Filename Character Set" in POSIX.
+ *
+ * Any character not part of that set will be replaced with '_'. Also, because
+ * some file systems are case-insensitive, we need to lower-case all file names.
+ *
+ * Because we don't know what encoding the data is in, if we see multiple
+ * consecutive octets outside the set, we only output one underscore
+ * representing all of them. That way one can easily compare the outputs of
+ * dumps taken on systems with different encoding.
+ */
+ static char *
+ encode_filename(const char *input)
+ {
+ static char buf[MAXPGPATH];
+ const char *p = input;
+ char *op = buf;
+ bool replaced_previous;
+
+ /*
+ * The input filename should be at most 64 bytes (because it comes from the
+ * "name" datatype), so this should never happen.
+ */
+ if (strlen(input) >= MAXPGPATH)
+ exit_horribly(modulename, "file name too long: \"%s\"\n", input);
+
+ for (replaced_previous = false;;)
+ {
+ if (*p == 0)
+ break;
+
+ if (*p >= 'A' && *p <= 'Z')
+ {
+ *op++ = tolower(*p);
+ replaced_previous = false;
+ }
+ else if ((*p >= 'a' && *p <= 'z') ||
+ (*p >= '0' && *p <= '9') ||
+ *p == '.' || *p == '_' || *p == '-')
+ {
+ *op++ = *p;
+ replaced_previous = false;
+ }
+ else if (!replaced_previous)
+ {
+ *op++ = '_';
+ replaced_previous = true;
+ }
+
+ p++;
+ }
+
+ *op = '\0';
+
+ return buf;
+ }
+
+ /*
+ * Given a pointer to the start of an identifier, returns a pointer to one
+ * character past that identifier, or NULL if no valid identifier was found.
+ * Also we don't remove any escaped quotes inside quoted identifiers, so the
+ * caller should be prepared to deal with that. In the (currently) only use of
+ * this function it won't matter, since double quotes will be replaced with a
+ * single underscore when encoding the filename.
+ */
+ static char *
+ skip_identifier(char *buf)
+ {
+ char *p = buf;
+ bool quoted = false;
+
+ if (*p == '"')
+ quoted = true;
+ /* without quotes, the first character needs special treatment */
+ else if (!((*p >= 'a' && *p <= 'z') || *p == '_'))
+ return NULL;
+ p++;
+
+ for (;;)
+ {
+ /*
+ * If we're parsing a quoted identifier, stop at a quote unless it's escaped.
+ * Also make sure we don't go past the end of the string.
+ *
+ * Or if we're not parsing a quoted identifier, stop whenever we encounter
+ * any character which would require quotes. Note that we don't care what
+ * the character is; it's up to the caller to see whether it makes sense to
+ * have that character in there.
+ */
+ if (quoted)
+ {
+ if (*p == '"')
+ {
+ p++;
+ if (*p != '"')
+ return p;
+ }
+ else if (*p == '\0')
+ return NULL;
+ }
+ else if (!((*p >= 'a' && *p <= 'z') ||
+ (*p >= '0' && *p <= '9') ||
+ (*p == '_')))
+ return p;
+
+ p++;
+ }
+ }
+
+ static TocEntry *
+ find_dependency(ArchiveHandle *AH, TocEntry *te)
+ {
+ DumpId depId;
+ TocEntry *depte;
+
+ if (te->nDeps != 1)
+ exit_horribly(modulename, "unexpected number of dependencies (%d) for \"%s\" %d\n", te->nDeps, te->desc, te->dumpId);
+
+ depId = te->dependencies[0];
+
+ for (depte = te->prev; depte != te; depte = depte->prev)
+ {
+ if (depte->dumpId == depId)
+ return depte;
+ }
+
+ exit_horribly(modulename, "could not find dependency %d for \"%s\" %d\n", depId, te->desc, te->dumpId);
+ }
+
+ static char *
+ get_object_filename(ArchiveHandle *AH, TocEntry *te)
+ {
+ int i;
+ char path[MAXPGPATH];
+
+ /*
+ * List of object types we can simply dump into [schema/]OBJTYPE/tag.sql.
+ * The first argument is the object type (te->desc) and the second one is
+ * the subdirectory to dump to.
+ */
+ const char * const object_types[][2] =
+ {
+ { "AGGREGATE", "AGGREGATES" },
+ { "CHECK CONSTRAINT", "CHECK_CONSTRAINTS" },
+ { "CONSTRAINT", "CONSTRAINTS" },
+ { "EXTENSION", "EXTENSIONS" },
+ { "FK CONSTRAINT", "FK_CONSTRAINTS" },
+ { "INDEX", "INDEXES" },
+ { "SEQUENCE", "SEQUENCES" },
+ { "OPERATOR CLASS", "OPERATOR_CLASSES" },
+ { "OPERATOR FAMILY", "OPERATOR_FAMILIES" },
+ { "RULE", "RULES" },
+ { "SERVER", "SERVERS" },
+ { "TABLE", "TABLES" },
+ { "TYPE", "TYPES" },
+ { "TRIGGER", "TRIGGERS" },
+ { "VIEW", "VIEWS" }
+ };
+
+
+ if (te->dataDumper)
+ {
+ if (strcmp(te->desc, "TABLE DATA") == 0)
+ {
+ snprintf(path, MAXPGPATH, "%s/TABLEDATA/%d.sql", encode_filename(te->namespace), te->dumpId);
+ return pg_strdup(path);
+ }
+ else if (strcmp(te->desc, "BLOBS") == 0)
+ {
+ /*
+ * Return NULL for BLOBS. The _*_Blob functions will know how to find the
+ * correct files -- we don't since we don't know the oids yet.
+ */
+ return NULL;
+ }
+
+ exit_horribly(modulename, "unknown data object %s\n", te->desc);
+ }
+
+ /*
+ * There's no need to create a database; one should always exist when
+ * restoring.
+ */
+ if (strcmp(te->desc, "DATABASE") == 0)
+ return NULL;
+
+ if (strcmp(te->desc, "BLOB") == 0)
+ {
+ snprintf(path, MAXPGPATH, "blobs.sql");
+ return pg_strdup(path);
+ }
+
+ /* for schemas, create the directory before dumping the definition */
+ if (strcmp(te->desc, "SCHEMA") == 0)
+ create_schema_directory(AH, te->tag);
+
+ /* schemaless objects which don't depend on anything */
+ if (strcmp(te->desc, "COLLATION") == 0 ||
+ strcmp(te->desc, "ENCODING") == 0 ||
+ strcmp(te->desc, "PROCEDURAL LANGUAGE") == 0 ||
+ strcmp(te->desc, "SCHEMA") == 0 ||
+ strcmp(te->desc, "STDSTRINGS") == 0)
+ return pg_strdup("schemaless.sql");
+
+ /*
+ * These objects depend on other objects so they can't be put into
+ * schemaless.sql.
+ */
+ if (strcmp(te->desc, "CAST") == 0)
+ return pg_strdup("casts.sql");
+ if (strcmp(te->desc, "CONVERSION") == 0)
+ return pg_strdup("conversions.sql");
+ if (strcmp(te->desc, "DEFAULT") == 0)
+ return pg_strdup("defaults.sql");
+ if (strcmp(te->desc, "USER MAPPING") == 0)
+ return pg_strdup("user_mappings.sql");
+ if (strcmp(te->desc, "OPERATOR") == 0)
+ {
+ snprintf(path, MAXPGPATH, "%s/operators.sql", encode_filename(te->namespace));
+ return pg_strdup(path);
+ }
+
+ /*
+ * These objects should always contain dependency information. Find the
+ * object te depends on, and dump this entry to the same file.
+ */
+ if (strcmp(te->desc, "ACL") == 0 ||
+ strcmp(te->desc, "SEQUENCE SET") == 0 ||
+ strcmp(te->desc, "SEQUENCE OWNED BY") == 0 ||
+ strcmp(te->desc, "COMMENT") == 0)
+ {
+ TocEntry *depte;
+ lclTocEntry *depctx;
+
+ depte = find_dependency(AH, te);
+ depctx = (lclTocEntry *) depte->formatData;
+ if (!depctx)
+ exit_horribly(modulename, "unexpected NULL formatData\n");
+
+ /* no need to strdup() */
+ return depctx->filename;
+ }
+
+ if (strcmp(te->desc, "AGGREGATE") == 0 ||
+ strcmp(te->desc, "FUNCTION") == 0)
+ {
+ char *buf;
+ char *proname;
+ char *p;
+ char *fname;
+ PQExpBuffer temp;
+
+ /*
+ * Parse the actual function/aggregate name from the DROP statement. This is
+ * easier than parsing it from the tag since the object name is never quoted
+ * inside the tag so we can't reliably tell where the argument list begins.
+ */
+ if (strncmp(te->dropStmt, "DROP FUNCTION ", 14) == 0)
+ buf = pg_strdup(te->dropStmt + 14);
+ else if (strncmp(te->dropStmt, "DROP AGGREGATE ", 15) == 0)
+ buf = pg_strdup(te->dropStmt + 15);
+ else
+ exit_horribly(modulename, "could not parse DROP statement \"%s\"\n", te->dropStmt);
+
+ proname = buf;
+
+ p = skip_identifier(proname);
+ if (!p)
+ exit_horribly(modulename, "could not parse DROP statement \"%s\"\n", te->dropStmt);
+
+ /*
+ * If there's a namespace, ignore it and find the end of the next identifier.
+ * That should be the name of the function.
+ */
+ if (*p == '.')
+ {
+ proname = p + 1;
+ p = skip_identifier(proname);
+ if (!p)
+ exit_horribly(modulename, "could not parse DROP statement \"%s\"\n", te->dropStmt);
+ }
+
+ /* the argument list should be right after the function name */
+ if (*p != '(')
+ exit_horribly(modulename, "could not parse DROP statement \"%s\"\n", te->dropStmt);
+
+ /*
+ * Terminate the identifier before the argument list definition, removing
+ * quotes if necessary.
+ */
+ if (*proname == '"')
+ {
+ proname++;
+ *(p-1) = '\0';
+ }
+ else
+ *p = '\0';
+
+ temp = createPQExpBuffer();
+ /* must use 2 steps here because encode_filename() is nonreentrant */
+ appendPQExpBuffer(temp, "%s/", encode_filename(te->namespace));
+ appendPQExpBuffer(temp, "%sS/%s.sql", te->desc, encode_filename(proname));
+
+ fname = pg_strdup(temp->data);
+
+ destroyPQExpBuffer(temp);
+ free(buf);
+
+ return fname;
+ }
+
+ /* finally, see if it's any of the objects that require no special treatment */
+ for (i = 0; i < sizeof(object_types) / sizeof(object_types[0]); ++i)
+ {
+ if (strcmp(object_types[i][0], te->desc) == 0)
+ {
+ const char *objsubdir = object_types[i][1];
+
+ if (te->namespace)
+ {
+ /* must use 2 steps here because encode_filename() is nonreentrant */
+ char *namespace = pg_strdup(encode_filename(te->namespace));
+ snprintf(path, MAXPGPATH, "%s/%s/%s.sql", namespace,
+ objsubdir, encode_filename(te->tag));
+ free(namespace);
+ }
+ else
+ snprintf(path, MAXPGPATH, "%s/%s.sql",
+ objsubdir, encode_filename(te->tag));
+
+ return pg_strdup(path);
+ }
+ }
+
+ exit_horribly(modulename, "unknown object type \"%s\"\n", te->desc);
+ }
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 806,813 **** help(const char *progname)
printf(_("\nGeneral options:\n"));
printf(_(" -f, --file=FILENAME output file or directory name\n"));
! printf(_(" -F, --format=c|d|t|p output file format (custom, directory, tar,\n"
! " plain text (default))\n"));
printf(_(" -v, --verbose verbose mode\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -Z, --compress=0-9 compression level for compressed formats\n"));
--- 806,813 ----
printf(_("\nGeneral options:\n"));
printf(_(" -f, --file=FILENAME output file or directory name\n"));
! printf(_(" -F, --format=c|d|t|s|p output file format (custom, directory, tar,\n"
! " split directory, plain text (default))\n"));
printf(_(" -v, --verbose verbose mode\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -Z, --compress=0-9 compression level for compressed formats\n"));
***************
*** 957,962 **** parseArchiveFormat(const char *format, ArchiveMode *mode)
--- 957,966 ----
archiveFormat = archTar;
else if (pg_strcasecmp(format, "tar") == 0)
archiveFormat = archTar;
+ else if (pg_strcasecmp(format, "s") == 0)
+ archiveFormat = archSplit;
+ else if (pg_strcasecmp(format, "split") == 0)
+ archiveFormat = archSplit;
else
exit_horribly(NULL, "invalid output format \"%s\" specified\n", format);
return archiveFormat;
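For reference, the hunks above only add the option parsing and the help text;
dispatching the new archSplit format to InitArchiveFmt_Split() is not shown in
this excerpt. Assuming it follows the pattern of the existing formats in
_allocAH() in pg_backup_archiver.c, the hookup would look roughly like the
following sketch (abbreviated to the relevant cases):

    switch (fmt)
    {
        case archCustom:
            InitArchiveFmt_Custom(AH);
            break;
        case archDirectory:
            InitArchiveFmt_Directory(AH);
            break;
        case archSplit:     /* new: the split format from this patch */
            InitArchiveFmt_Split(AH);
            break;
        case archTar:
            InitArchiveFmt_Tar(AH);
            break;
        default:
            exit_horribly(modulename, "unrecognized file format \"%d\"\n", fmt);
    }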
Hi,
Marko Tiikkaja <pgmail@joh.to> writes:
The general output scheme looks like this:
schemaname/OBJECT_TYPES/object_name.sql,
I like this feature, I actually did have to code it myself in the past
and several other people did so, so we already have at least 3 copies of
`getddl` variants around. I really think this feature should be shipped
by default with PostgreSQL.
I don't much care for the all uppercase formatting of object type
directories in your patch though.
Overloaded functions are dumped into the same file. Object names are
encoded into the POSIX Portable Filename Character Set ([a-z0-9._-]) by
replacing any characters outside that set with an underscore.
What happens if you have a table foo and another table "FoO"?
Restoring the dump is supported through an index.sql file containing
statements which include (through \i) the actual object files in the dump
directory.
I think we should be using \ir now that we have that.
Any thoughts? Objections on the idea or the implementation?
As far as the implementation goes, someone with more experience on the
Archiver Handles should have a look. To me, it looks like you are trying
to shoehorn your feature into the current API and that doesn't feel good.
The holy grail here that we've been speaking about in the past would be
to separate out tooling and formats so that we have:
pg_dump | pg_restore
pg_export | psql
In that case we would almost certainly need libpgdump to share the code,
and maybe we could implement a binary output option for pg_dump too
(yeah, last time it was proposed we ended up with bytea_output = 'hex').
That libpgdump idea basically means we won't have the --split feature in
9.3, and that's really bad, as we are already some releases late on
delivering that, in my opinion.
Maybe the pg_export and pg_dump tools could share code by just #include
magic rather than a full-blown lib in a first incarnation?
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Hi,
On 16/11/2012 15:52, Dimitri Fontaine wrote:
Marko Tiikkaja <pgmail@joh.to> writes:
The general output scheme looks like this:
schemaname/OBJECT_TYPES/object_name.sql,
I like this feature, I actually did have to code it myself in the past
and several other people did so, so we already have at least 3 copies of
`getddl` variants around. I really think this feature should be shipped
by default with PostgreSQL.
I don't much care for the all uppercase formatting of object type
directories in your patch though.
*shrug* I have no real preference one way or the other.
Overloaded functions are dumped into the same file. Object names are
encoded into the POSIX Portable Filename Character Set ([a-z0-9._-]) by
replacing any characters outside that set with an underscore.
What happens if you have a table foo and another table "FoO"?
They would go to the same file. If you think there are technical issues
behind that decision (e.g. the dump would not restore), I would like to
hear an example case.
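To make that concrete, this is what the encode_filename() in the patch
produces for a few example inputs (illustration only):

    encode_filename("foo");          /* "foo" */
    encode_filename("FoO");          /* "foo" -- same file as plain foo */
    encode_filename("my table");     /* "my_table" */
    encode_filename("t\xc3\xa2ble"); /* "t_ble" -- consecutive non-portable
                                      * octets collapse to a single '_' */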
On the other hand, some people might find it preferable to have them in
different files (for example foo, foo.1, foo.2 etc). Or some might
prefer some other naming scheme. One of the problems with this patch is
exactly that people prefer different things, and providing switches for
all of the different options people come up with would mean a lot of
switches. :-(
Restoring the dump is supported through an index.sql file containing
statements which include (through \i) the actual object files in the dump
directory.
I think we should be using \ir now that we have that.
Good point, will have to get that fixed.
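It should be a one-line change where the index entries are written in
write_split_directory(); roughly this (sketch only, reusing the same
buf/indexFH variables as in the patch):

    /* \ir resolves the path relative to index.sql itself, rather than
     * relative to the directory psql was started from */
    snprintf(buf, sizeof(buf), "\\ir %s\n", tctx->filename);
    if (fwrite(buf, 1, strlen(buf), indexFH) != strlen(buf))
        exit_horribly(modulename, "could not write index file: %s\n",
                      strerror(errno));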
Any thoughts? Objections on the idea or the implementation?
As far as the implementation goes, someone with more experience on the
Archiver Handles should have a look. To me, it looks like you are trying
to shoehorn your feature into the current API and that doesn't feel good.
It feels a bit icky to me too, but I didn't feel comfortable with
putting in a lot of work to refactor the API because of how
controversial this feature is.
The holy grail here that we've been speaking about in the past would be
to separate out tooling and formats so that we have:
pg_dump | pg_restore
pg_export | psql
In that case we would almost certainly need libpgdump to share the code,
and maybe we could implement a binary output option for pg_dump too
(yeah, last time it was proposed we ended up with bytea_output = 'hex').
While I agree that this idea - when implemented - would be nicer in
practically every way, I'm not sure I want to volunteer to do all the
necessary work.
That libpgdump idea basically means we won't have the --split feature in
9.3, and that's really bad, as we are already some releases late on
delivering that, in my opinion.
Maybe the pg_export and pg_dump tools could share code by just #include
magic rather than a full-blown lib in a first incarnation?
That's one idea..
Regards,
Marko Tiikkaja
Marko Tiikkaja <pgmail@joh.to> writes:
What happens if you have a table foo and another table "FoO"?
They would go to the same file. If you think there are technical issues
behind that decision (e.g. the dump would not restore), I would like to hear
an example case.
I didn't try the patch itself yet, so I wanted to hear whether that's something
you actually tried out, that's about it :) As soon as we're past the
initial agreement on this patch landing (which I really want to see
happen), I'll devote some time to testing it.
On the other hand, some people might find it preferable to have them in
different files (for example foo, foo.1, foo.2 etc). Or some might prefer
some other naming scheme. One of the problems with this patch is exactly
that people prefer different things, and providing switches for all of the
different options people come up with would mean a lot of switches. :-(
I think this facility should provide something simple and useful, and
not something tailored to taste. Database backups and exports are not meant
to cater to taste; they are meant to be easy to restore. In this particular case, we
want to have a set of properly organized SQL files for doing partial
restores, right?
It feels a bit icky to me too, but I didn't feel comfortable with putting in
a lot of work to refactor the API because of how controversial this feature
is.
+1
pg_dump | pg_restore
pg_export | psql
While I agree that this idea - when implemented - would be nicer in
practically every way, I'm not sure I want to volunteer to do all the
necessary work.
What I think needs to happen now is a committer's buy-in that we want to
get there at some point and that your current patch is not painting us
into any corner now. So that we can accept it and have a documented path
forward.
Regards,
--
Dimitri Fontaine 06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
pg_dump | pg_restore
pg_export | psql
While I agree that this idea - when implemented - would be nicer in
practically every way, I'm not sure I want to volunteer to do all the
necessary work.
What I think needs to happen now is a committer's buy-in that we want to
get there at some point and that your current patch is not painting us
into any corner now. So that we can accept it and have a documented path
forward.
Just stumbled across this message while reading some older threads
about the current topic:
http://archives.postgresql.org/pgsql-hackers/2010-12/msg02496.php
Where Robert Treat said:
I've both enjoyed reading this thread and seeing this wheel reinvented
yet again, and wholeheartedly +1 the idea of building this directly
into pg_dump. (The only thing better would be to make everything sql
callable, but that's a problem for another day).
I know Andrew has been working on his "Retail DDL" project which is
basically a bunch of server-side functions that spit out SQL object
definitions. Andrew, were you able to make progress on that project?
On the other hand, pg_dump -Fs still is something I would like to have
as a complement to Andrew's facility.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 11/19/2012 09:07 AM, Dimitri Fontaine wrote:
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
pg_dump | pg_restore
pg_export | psql
While I agree that this idea - when implemented - would be nicer in
practically every way, I'm not sure I want to volunteer to do all the
necessary work.
What I think needs to happen now is a committer's buy-in that we want to
get there at some point and that your current patch is not painting us
into any corner now. So that we can accept it and have a documented path
forward.
Just stumbled across this message while reading some older threads
about the current topic:
http://archives.postgresql.org/pgsql-hackers/2010-12/msg02496.php
Where Robert Treat said:
I've both enjoyed reading this thread and seeing this wheel reinvented
yet again, and wholeheartedly +1 the idea of building this directly
into pg_dump. (The only thing better would be to make everything sql
callable, but that's a problem for another day).
I know Andrew has been working on his "Retail DDL" project which is
basically a bunch of server-side functions that spit out SQL object
definitions. Andrew, were you able to make progress on that project?
On the other hand, pg_dump -Fs still is something I would like to have
as a complement to Andrew's facility.
No, sorry, it's on hold - I'm having trouble finding time to work on it.
cheers
andrew
Marko Tiikkaja wrote:
On 16/11/2012 15:52, Dimitri Fontaine wrote:
What happens if you have a table foo and another table "FoO"?
They would go to the same file. If you think there are technical
issues behind that decision (e.g. the dump would not restore), I
would like to hear an example case.
create table foo (a int, b text);
create type bar as (stuff foo);
create table "FoO" (more bar);
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services