Add switches for DELIMITER and NULL in pg_dump COPY
Folks,
From the earlier discussion, it appears that there is a variety of
opinions on what the COPY delimiter should be in pg_dump. This patch
allows people to set it and the NULL string. Thanks to Gavin Sherry
for help with the pointers :)
I didn't patch pg_dumpall, but it would be trivial if there's a use
case.
Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778
Remember to vote!
Attachments:
pg_dump_copy.difftext/plain; charset=us-asciiDownload
Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.81
diff -c -r1.81 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml 1 Nov 2005 21:09:50 -0000 1.81
--- doc/src/sgml/ref/pg_dump.sgml 6 Mar 2006 07:32:05 -0000
***************
*** 163,168 ****
--- 163,208 ----
</varlistentry>
<varlistentry>
+ <term><option>--copy-delimiter=<replaceable class="parameter">delimiter</replaceable></option></term>
+ <listitem>
+ <para>
+ Use <replaceable class="parameter">delimiter</replaceable>
+ instead of the default tab character in <command>COPY</command> statements.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--copy-null=<replaceable class="parameter">string_for_nulls</replaceable></option></term>
+ <listitem>
+ <para>
+ Use <replaceable class="parameter">string_for_nulls</replaceable> instead of the
+ default \N in <command>COPY</command> statements.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--copy-delimiter=<replaceable class="parameter">delimiter</replaceable></option></term>
+ <listitem>
+ <para>
+ Use <replaceable class="parameter">delimiter</replaceable>
+ instead of the default tab character in <command>COPY</command> statements.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--copy-null=<replaceable class="parameter">string_for_nulls</replaceable></option></term>
+ <listitem>
+ <para>
+ Use <replaceable class="parameter">string_for_nulls</replaceable> instead of the
+ default \N in <command>COPY</command> statements.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-d</option></term>
<term><option>--inserts</option></term>
<listitem>
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.433
diff -c -r1.433 pg_dump.c
*** src/bin/pg_dump/pg_dump.c 5 Mar 2006 15:58:50 -0000 1.433
--- src/bin/pg_dump/pg_dump.c 6 Mar 2006 07:32:12 -0000
***************
*** 114,119 ****
--- 114,125 ----
/* flag to turn on/off dollar quoting */
static int disable_dollar_quoting = 0;
+ /* Things used when caller invokes COPY options. */
+ #define ARG_COPY_DELIMITER 2
+ #define ARG_COPY_NULL 3
+ char *copy_delimiter = "\t";
+ char *copy_null;
+
static void help(const char *progname);
static NamespaceInfo *findNamespace(Oid nsoid, Oid objoid);
***************
*** 181,186 ****
--- 187,193 ----
ExecStatusType expected);
+
int
main(int argc, char **argv)
{
***************
*** 211,217 ****
char *outputSuperuser = NULL;
RestoreOptions *ropt;
!
static struct option long_options[] = {
{"data-only", no_argument, NULL, 'a'},
{"blobs", no_argument, NULL, 'b'},
--- 218,224 ----
char *outputSuperuser = NULL;
RestoreOptions *ropt;
!
static struct option long_options[] = {
{"data-only", no_argument, NULL, 'a'},
{"blobs", no_argument, NULL, 'b'},
***************
*** 249,254 ****
--- 256,269 ----
{"disable-dollar-quoting", no_argument, &disable_dollar_quoting, 1},
{"disable-triggers", no_argument, &disable_triggers, 1},
{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
+
+ /*
+ * The following options don't have an equivalent short option
+ * letter, and are not available as -X long-name. Just use
+ * the long form.
+ */
+ {"copy-delimiter", required_argument, NULL, ARG_COPY_DELIMITER},
+ {"copy-null", required_argument, NULL, ARG_COPY_NULL},
{NULL, 0, NULL, 0}
};
***************
*** 418,423 ****
--- 433,460 ----
break;
/* This covers the long options equivalent to -X xxx. */
+ case ARG_COPY_DELIMITER:
+ if ( strlen(optarg) != 1)
+ {
+ fprintf(stderr, _("In %s, copy-delimiter must be exactly one byte long, not %d.\n"),
+ progname, strlen(optarg));
+ exit(1);
+ }
+ if (*optarg == '\r' || *optarg == '\n' ||
+ *optarg == '\\')
+ {
+ fprintf(stderr, _("In %s, copy-delimiter may not be any of \\r, \\n or \\.\n"),
+ progname);
+ exit(1);
+ }
+ copy_delimiter = optarg;
+ break;
+
+ case ARG_COPY_NULL:
+ copy_null = malloc(2*strlen(optarg)+1);
+ PQescapeString(copy_null, optarg, 2*strlen(optarg)+1);
+ break;
+
case 0:
break;
***************
*** 427,432 ****
--- 464,479 ----
}
}
+ if (copy_null == NULL)
+ copy_null = malloc(3);
+ strcpy(copy_null, "\\N");
+
+ if (strstr(copy_null, copy_delimiter))
+ {
+ fprintf(stderr, _("In %s, the NULL AS string cannot contain the COPY delimiter.\n"), progname);
+ exit(1);
+ }
+
if (optind < (argc - 1))
{
fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
***************
*** 702,707 ****
--- 749,756 ----
" use SESSION AUTHORIZATION commands instead of\n"
" OWNER TO commands\n"));
+ printf(_(" --copy-delimiter string to use as column DELIMITER in COPY statements\n"));
+ printf(_(" --copy-null string to use for NULLs in COPY statements\n"));
printf(_("\nConnection options:\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
printf(_(" -p, --port=PORT database server port number\n"));
***************
*** 844,849 ****
--- 893,904 ----
int ret;
char *copybuf;
const char *column_list;
+ char *local_copy_delimiter;
+ char *local_copy_null;
+ local_copy_delimiter = malloc(2*strlen(copy_delimiter)+1);
+ PQescapeString (local_copy_delimiter, copy_delimiter, 2*strlen(copy_delimiter)+1);
+ local_copy_null = malloc(2*strlen(copy_null)+1);
+ PQescapeString (local_copy_null, copy_null, 2*strlen(copy_null)+1);
if (g_verbose)
write_msg(NULL, "dumping contents of table %s\n", classname);
***************
*** 867,886 ****
else
column_list = ""; /* can't select columns in COPY */
! if (oids && hasoids)
! {
! appendPQExpBuffer(q, "COPY %s %s WITH OIDS TO stdout;",
! fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
! classname),
! column_list);
! }
! else
! {
! appendPQExpBuffer(q, "COPY %s %s TO stdout;",
! fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
! classname),
! column_list);
! }
res = PQexec(g_conn, q->data);
check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
PQclear(res);
--- 922,938 ----
else
column_list = ""; /* can't select columns in COPY */
! /*
! * Explicitly set the DELIMITER and NULL strings in the COPY
! * statement.
! */
! appendPQExpBuffer(q, "COPY %s %s TO stdout %sDELIMITER AS '%s' NULL AS '%s';",
! fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
! classname),
! column_list,
! (oids && hasoids) ? "WITH OIDS " : "",
! local_copy_delimiter,
! local_copy_null);
res = PQexec(g_conn, q->data);
check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
PQclear(res);
***************
*** 1117,1122 ****
--- 1169,1180 ----
PQExpBuffer copyBuf = createPQExpBuffer();
DataDumperPtr dumpFn;
char *copyStmt;
+ char *local_copy_delimiter;
+ char *local_copy_null;
+ local_copy_delimiter = malloc(2*strlen(copy_delimiter)+1);
+ PQescapeString (local_copy_delimiter, copy_delimiter, 2*strlen(copy_delimiter)+1);
+ local_copy_null = malloc(2*strlen(copy_null)+1);
+ PQescapeString (local_copy_null, copy_null, 2*strlen(copy_null)+1);
if (!dumpInserts)
{
***************
*** 1125,1133 ****
/* must use 2 steps here 'cause fmtId is nonreentrant */
appendPQExpBuffer(copyBuf, "COPY %s ",
fmtId(tbinfo->dobj.name));
! appendPQExpBuffer(copyBuf, "%s %sFROM stdin;\n",
! fmtCopyColumnList(tbinfo),
! (tdinfo->oids && tbinfo->hasoids) ? "WITH OIDS " : "");
copyStmt = copyBuf->data;
}
else
--- 1183,1193 ----
/* must use 2 steps here 'cause fmtId is nonreentrant */
appendPQExpBuffer(copyBuf, "COPY %s ",
fmtId(tbinfo->dobj.name));
! appendPQExpBuffer(copyBuf, "%s FROM stdin %sDELIMITER AS '%s' NULL AS '%s';\n",
! fmtCopyColumnList(tbinfo),
! (tdinfo->oids && tbinfo->hasoids) ? "WITH OIDS " : "",
! local_copy_delimiter,
! local_copy_null);
copyStmt = copyBuf->data;
}
else
David Fetter <david@fetter.org> writes:
From the earlier discussion, it appears that there is a variety of
opinions on what the COPY delimiter should be in pg_dump. This patch
allows people to set it and the NULL string.
Did anyone provide a convincing use case for this? It's of zero value
from the perspective of pg_dump itself; the only possible argument is
that it makes it easier for program-foo to parse the output of pg_dump.
But I don't see any programs around to parse arbitrary SQL scripts,
especially not the pretty-PG-specific scripts that pg_dump emits.
I think it much more likely that people needing this sort of thing would
be using something like "psql -c 'copy foo to stdout'", so as to get the
data without any added overhead.
So this seems like mere creeping featurism to me. pg_dump has too many
switches already.
regards, tom lane
On Wed, 2006-03-08 at 07:47 -0800, David Fetter wrote:
From the earlier discussion, it appears that there is a variety of
opinions on what the COPY delimiter should be in pg_dump. This patch
allows people to set it and the NULL string.
I'm still not convinced there is a reasonable use-case for this feature.
I can't recall: did the previous discussion conclude that we actually
want this functionality?
*** src/bin/pg_dump/pg_dump.c 5 Mar 2006 15:58:50 -0000 1.433 --- src/bin/pg_dump/pg_dump.c 6 Mar 2006 07:32:12 -0000 *************** *** 114,119 **** --- 114,125 ---- /* flag to turn on/off dollar quoting */ static int disable_dollar_quoting = 0;+ /* Things used when caller invokes COPY options. */ + #define ARG_COPY_DELIMITER 2 + #define ARG_COPY_NULL 3 + char *copy_delimiter = "\t"; + char *copy_null; +
The variables should be declared static.
static void help(const char *progname); static NamespaceInfo *findNamespace(Oid nsoid, Oid objoid); *************** *** 181,186 **** --- 187,193 ---- ExecStatusType expected);+
int
main(int argc, char **argv)
{
***************
*** 211,217 ****
char *outputSuperuser = NULL;RestoreOptions *ropt; ! static struct option long_options[] = { {"data-only", no_argument, NULL, 'a'}, {"blobs", no_argument, NULL, 'b'}, --- 218,224 ---- char *outputSuperuser = NULL;RestoreOptions *ropt;
!
static struct option long_options[] = {
{"data-only", no_argument, NULL, 'a'},
{"blobs", no_argument, NULL, 'b'},
Please review patches and eliminate content-free hunks like these before
submitting.
*************** *** 427,432 **** --- 464,479 ---- } }+ if (copy_null == NULL) + copy_null = malloc(3); + strcpy(copy_null, "\\N");
You're missing some braces.
+ if (strstr(copy_null, copy_delimiter)) + { + fprintf(stderr, _("In %s, the NULL AS string cannot contain the COPY delimiter.\n"), progname); + exit(1); + }
I'm not sure as to whether you should be using write_msg() or fprintf()
here, but we should probably pick one and be consistent. Also ISTM we
should be to refactor the code to use exit_nicely() anyway, provided
that g_conn is initialized to NULL before we have connected to the DB.
*************** *** 702,707 **** --- 749,756 ---- " use SESSION AUTHORIZATION commands instead of\n" " OWNER TO commands \n"));+ printf(_(" --copy-delimiter string to use as column
DELIMITER in COPY statements\n"));
Capitalizing "DELIMITER" here is not good style, IMHO: it is just a
normal word.
*** 844,849 **** --- 893,904 ---- int ret; char *copybuf; const char *column_list; + char *local_copy_delimiter; + char *local_copy_null; + local_copy_delimiter = malloc(2*strlen(copy_delimiter)+1); + PQescapeString (local_copy_delimiter, copy_delimiter, 2*strlen(copy_delimiter)+1); + local_copy_null = malloc(2*strlen(copy_null)+1); + PQescapeString (local_copy_null, copy_null, 2*strlen(copy_null)+1);
Spacing: spaces around operands to mathematical operators, no spaces
before the parameter list to a function call.
You should also fix this compiler warning:
[...]/pg_dump.c:440: warning: format '%d' expects type 'int', but
argument 4 has type 'size_t'
-Neil
On Wed, Mar 08, 2006 at 11:03:00AM -0500, Tom Lane wrote:
David Fetter <david@fetter.org> writes:
From the earlier discussion, it appears that there is a variety of
opinions on what the COPY delimiter should be in pg_dump. This
patch allows people to set it and the NULL string.Did anyone provide a convincing use case for this?
I've had one so far, and it was enough to cause me to make a special
patched version of pg_dump. To get some idea of how drastic that was,
consider that I think it's generally bad practice to compile from
source because it can take you too far off the "generally supported
software" map. The case I had was making a database with a schema and
initial data whose dump output gets checked into a source code
management system. Those initial data sets, which can change--for
example when the corresponding ISO codes do--may be in many different
tables, so the easiest way to do this is to make the dump file as easy
as possible to edit.
It's of zero value from the perspective of pg_dump itself; the only
possible argument is that it makes it easier for program-foo to
parse the output of pg_dump. But I don't see any programs around to
parse arbitrary SQL scripts, especially not the pretty-PG-specific
scripts that pg_dump emits.
It's less about program-foo parsing than about multi-table data
management, as above. However, I'm sure that there are people who
will find other uses for it.
I think it much more likely that people needing this sort of thing would
be using something like "psql -c 'copy foo to stdout'", so as to get the
data without any added overhead.
The one-table-at-a-time approach is quite error-prone for large
numbers of tables and/or large data sets.
So this seems like mere creeping featurism to me. pg_dump has too
many switches already.
I've been careful to see to it that only people who use the switches
are affected by it. I am also volunteering to do ongoing maintenance
of this feature. :)
Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778
Remember to vote!
On Wed, Mar 08, 2006 at 11:10:04AM -0500, Neil Conway wrote:
On Wed, 2006-03-08 at 07:47 -0800, David Fetter wrote:
From the earlier discussion, it appears that there is a variety of
opinions on what the COPY delimiter should be in pg_dump. This patch
allows people to set it and the NULL string.I'm still not convinced there is a reasonable use-case for this feature.
I can't recall: did the previous discussion conclude that we actually
want this functionality?
The previous discussion showed that there is a wide diversity of
opinions on what The Right Delimiter and The Right NULL String(TM)
are.
Thanks for the tips. :) I'll make a revised patch this evening, time
permitting.
Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778
Remember to vote!
David Fetter <david@fetter.org> writes:
On Wed, Mar 08, 2006 at 11:03:00AM -0500, Tom Lane wrote:
Did anyone provide a convincing use case for this?
I've had one so far, and it was enough to cause me to make a special
patched version of pg_dump. To get some idea of how drastic that was,
consider that I think it's generally bad practice to compile from
source because it can take you too far off the "generally supported
software" map. The case I had was making a database with a schema and
initial data whose dump output gets checked into a source code
management system.
So? Don't tell me your SCMS can't handle tabs.
regards, tom lane
On Wed, Mar 08, 2006 at 11:26:00AM -0500, Tom Lane wrote:
David Fetter <david@fetter.org> writes:
On Wed, Mar 08, 2006 at 11:03:00AM -0500, Tom Lane wrote:
Did anyone provide a convincing use case for this?
I've had one so far, and it was enough to cause me to make a
special patched version of pg_dump. To get some idea of how
drastic that was, consider that I think it's generally bad
practice to compile from source because it can take you too far
off the "generally supported software" map. The case I had was
making a database with a schema and initial data whose dump output
gets checked into a source code management system.So? Don't tell me your SCMS can't handle tabs.
Not everybody's editor/mailer/whatever does this right, and it makes
things fragile. Another way to do this is to change the delimter to a
printable character like '|', but that raises hackles, too.
Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778
Remember to vote!
David Fetter <david@fetter.org> writes:
Not everybody's editor/mailer/whatever does this right, and it makes
things fragile. Another way to do this is to change the delimter to a
printable character like '|', but that raises hackles, too.
Frankly if you're passing you data through an editor/mailer/whatever you don't
trust then your setup is already fragile. At least if you're using tabs then
you find out about these problems. Tiptoeing around the untrustworthy process
just means that it'll fail randomly (and unpredictably) when other characters
appear in the data that the software doesn't handle.
There are certainly cases where you'll need to do this to interface with other
(amateurish) software. But pg_dump isn't for that at all. Even COPY isn't a
general purpose data formatter. To interface with other software not using a
standard format you're going to have to pass the data through Perl or
something like that anyways.
--
greg
On Wed, 2006-03-08 at 08:20 -0800, David Fetter wrote:
The previous discussion showed that there is a wide diversity of
opinions on what The Right Delimiter and The Right NULL String(TM)
are.
Barring a more convincing justification for why we need this feature,
I'm inclined to side with Tom: pg_dump has enough obscure options as it
is, and I can't imagine very many people needing this functionality.
-Neil
On Wed, Mar 08, 2006 at 04:57:52PM -0500, Neil Conway wrote:
On Wed, 2006-03-08 at 08:20 -0800, David Fetter wrote:
The previous discussion showed that there is a wide diversity of
opinions on what The Right Delimiter and The Right NULL String(TM)
are.Barring a more convincing justification for why we need this feature,
I'm inclined to side with Tom: pg_dump has enough obscure options as it
is, and I can't imagine very many people needing this functionality.
Given all the different requests that come in for pg_dump and copy,
maybe it makes sense for Someone Who Cares to start a pgFoundry project
(or maybe extend the import/export project that's already there).
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461