CSV patch applied
I have applied the attached patch that complete TODO item:
o -Allow dump/load of CSV format
This adds new keywords to COPY and \copy:
CSV - enable CSV mode
QUOTE - specify quote character
ESCAPE - specify escape character
FORCE - force quoting of specified columns
LITERAL - prevent NULL checks for specific columns
If folks have better ideas for naming of those keywords, I am all ears!
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Attachments:
/pgpatches/copytext/plainDownload
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.55
diff -c -c -r1.55 copy.sgml
*** doc/src/sgml/ref/copy.sgml 13 Dec 2003 23:59:07 -0000 1.55
--- doc/src/sgml/ref/copy.sgml 19 Apr 2004 17:07:42 -0000
***************
*** 26,32 ****
[ BINARY ]
[ OIDS ]
[ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
! [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] ]
COPY <replaceable class="parameter">tablename</replaceable> [ ( <replaceable class="parameter">column</replaceable> [, ...] ) ]
TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
--- 26,35 ----
[ BINARY ]
[ OIDS ]
[ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
! [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ]
! [ CSV [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ]
! [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ]
! [ LITERAL <replaceable class="parameter">column</replaceable> [, ...] ]
COPY <replaceable class="parameter">tablename</replaceable> [ ( <replaceable class="parameter">column</replaceable> [, ...] ) ]
TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
***************
*** 34,40 ****
[ BINARY ]
[ OIDS ]
[ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
! [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ] ]
</synopsis>
</refsynopsisdiv>
--- 37,46 ----
[ BINARY ]
[ OIDS ]
[ DELIMITER [ AS ] '<replaceable class="parameter">delimiter</replaceable>' ]
! [ NULL [ AS ] '<replaceable class="parameter">null string</replaceable>' ]
! [ CSV [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ]
! [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ]
! [ FORCE <replaceable class="parameter">column</replaceable> [, ...] ]
</synopsis>
</refsynopsisdiv>
***************
*** 146,152 ****
<listitem>
<para>
The single character that separates columns within each row
! (line) of the file. The default is a tab character.
</para>
</listitem>
</varlistentry>
--- 152,159 ----
<listitem>
<para>
The single character that separates columns within each row
! (line) of the file. The default is a tab character in text mode,
! a comma in <literal>CSV</> mode.
</para>
</listitem>
</varlistentry>
***************
*** 156,175 ****
<listitem>
<para>
The string that represents a null value. The default is
! <literal>\N</literal> (backslash-N). You might prefer an empty
! string, for example.
</para>
<note>
<para>
! On a <command>COPY FROM</command>, any data item that matches
this string will be stored as a null value, so you should make
sure that you use the same string as you used with
<command>COPY TO</command>.
</para>
</note>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
--- 163,248 ----
<listitem>
<para>
The string that represents a null value. The default is
! <literal>\N</literal> (backslash-N) in text mode, and a empty
! value with no quotes in <literal>CSV</> mode. You might prefer an
! empty string even in text mode for cases where you don't want to
! distinguish nulls from empty strings.
</para>
<note>
<para>
! When using <command>COPY FROM</command>, any data item that matches
this string will be stored as a null value, so you should make
sure that you use the same string as you used with
<command>COPY TO</command>.
</para>
</note>
+
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>CSV</literal></term>
+ <listitem>
+ <para>
+ Enables Comma Separated Variable (<literal>CSV</>) mode. (Also called
+ Comma Separated Value). It sets the default <literal>DELIMITER</> to
+ comma, and <literal>QUOTE</> and <literal>ESCAPE</> values to
+ double-quote.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">quote</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the quotation character in <literal>CSV</> mode.
+ The default is double-quote.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">escape</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the character that should appear before a <literal>QUOTE</>
+ data character value in <literal>CSV</> mode. The default is the
+ <literal>QUOTE</> value (usually double-quote).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>FORCE</></term>
+ <listitem>
+ <para>
+ In <literal>CSV</> <command>COPY TO</> mode, forces quoting
+ to be used for all non-<literal>NULL</> values in each specified
+ column. <literal>NULL</> output is never quoted.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>LITERAL</></term>
+ <listitem>
+ <para>
+ In <literal>CSV</> <command>COPY FROM</> mode, for each column specified,
+ do not do a <literal>null string</> comparison; instead load the value
+ literally. <literal>QUOTE</> and <literal>ESCAPE</> processing are still
+ performed.
+ </para>
+ <para>
+ If the <literal>null string</> is <literal>''</> (the default
+ in <literal>CSV</> mode), a missing input value (<literal>delimiter,
+ delimiter</>), will load as a zero-length string. <literal>Delimiter, quote,
+ quote, delimiter</> is always treated as a zero-length string on input.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect1>
***************
*** 234,239 ****
--- 307,323 ----
</para>
<para>
+ <command>COPY</command> input and output is affected by
+ <varname>DateStyle </varname>. For portability with other
+ <productname>PostgreSQL</productname> installations which might use
+ non-default <varname>DateStyle</varname> settings,
+ <varname>DateStyle</varname> should be set to <literal>ISO</> before
+ using <command>COPY</>. In <literal>CSV</> mode, use <literal>ISO</>
+ or a <varname>DateStyle</varname> setting appropriate for the
+ external application.
+ </para>
+
+ <para>
<command>COPY</command> stops operation at the first error. This
should not lead to problems in the event of a <command>COPY
TO</command>, but the target table will already have received
***************
*** 253,259 ****
<para>
When <command>COPY</command> is used without the <literal>BINARY</literal> option,
! the data read or written is a text file with one line per table row.
Columns in a row are separated by the delimiter character.
The column values themselves are strings generated by the
output function, or acceptable to the input function, of each
--- 337,344 ----
<para>
When <command>COPY</command> is used without the <literal>BINARY</literal> option,
! the data read or written is a text file with one line per table row,
! unless <literal>CSV</> mode is used.
Columns in a row are separated by the delimiter character.
The column values themselves are strings generated by the
output function, or acceptable to the input function, of each
***************
*** 377,382 ****
--- 462,524 ----
meant as data, <command>COPY FROM</command> will complain if the line
endings in the input are not all alike.
</para>
+ </refsect2>
+
+ <refsect2>
+ <title>CSV Format</title>
+
+ <para>
+ This format is used for importing and exporting the Comma
+ Separated Variable (<literal>CSV</>) file format used by many other
+ programs, such as spreadsheets. Instead of the escaping used by
+ <productname>PostgreSQL</productname>'s standard text mode, it
+ produces and recognises the common CSV escaping mechanism.
+ </para>
+
+ <para>
+ The values in each record are separated by the <literal>DELIMITER</>
+ character. If the value contains the delimiter character, the
+ <literal>QUOTE</> character, the <literal>NULL</> string, a carriage
+ return, or line feed character, then the whole value is prefixed and
+ suffixed by the <literal>QUOTE</> character, and any occurrence
+ within the value of a <literal>QUOTE</> character or the
+ <literal>ESCAPE</> character is preceded by the escape character.
+ You can also use <literal>FORCE</> to force quotes when outputting
+ non-<literal>NULL</> values in specific columns.
+ </para>
+
+ <para>
+ In general, the <literal>CSV</> format has no way to distinguish a
+ <literal>NULL</> from an empty string.
+ <productname>PostgreSQL</productname>'s COPY handles this by
+ quoting. A <literal>NULL</> is output as the <literal>NULL</> string
+ and is not quoted, while a data value matching the <literal>NULL</> string
+ is quoted. Therefore, using the default settings, a <literal>NULL</> is
+ written as an unquoted empty string, while an empty string is
+ written with double quotes (<literal>""</>). Reading values follows
+ similar rules. You can use <literal>LITERAL</> to prevent <literal>NULL</>
+ input comparisons for specific columns.
+ </para>
+
+ <note>
+ <para>
+ CSV mode will both recognize and produce CSV files with quoted
+ values containing embedded carriage returns and line feeds. Thus
+ the files are not strictly one line per table row like text-mode
+ files.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ Many programs produce strange and occasionally perverse CSV files,
+ so the file format is more a convention than a standard. Thus you
+ might encounter some files that cannot be imported using this
+ mechanism, and <command>COPY</> might produce files that other
+ programs can not process.
+ </para>
+ </note>
+
</refsect2>
<refsect2>
Index: doc/src/sgml/ref/psql-ref.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/psql-ref.sgml,v
retrieving revision 1.110
diff -c -c -r1.110 psql-ref.sgml
*** doc/src/sgml/ref/psql-ref.sgml 12 Apr 2004 15:58:52 -0000 1.110
--- doc/src/sgml/ref/psql-ref.sgml 19 Apr 2004 17:07:44 -0000
***************
*** 711,716 ****
--- 711,720 ----
[ <literal>oids</literal> ]
[ <literal>delimiter [as] </literal> '<replaceable class="parameter">character</replaceable>' ]
[ <literal>null [as] </literal> '<replaceable class="parameter">string</replaceable>' ]</literal>
+ [ <literal>csv [ quote [as] </literal> '<replaceable class="parameter">character</replaceable>' ]
+ [ <literal>escape [as] </literal> '<replaceable class="parameter">character</replaceable>' ]
+ [ <literal>force</> <replaceable class="parameter">column_list</replaceable> ]
+ [ <literal>literal</> <replaceable class="parameter">column_list</replaceable> ] ]
</term>
<listitem>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.220
diff -c -c -r1.220 copy.c
*** src/backend/commands/copy.c 15 Apr 2004 22:36:03 -0000 1.220
--- src/backend/commands/copy.c 19 Apr 2004 17:07:46 -0000
***************
*** 70,76 ****
typedef enum CopyReadResult
{
NORMAL_ATTR,
! END_OF_LINE
} CopyReadResult;
/*
--- 70,77 ----
typedef enum CopyReadResult
{
NORMAL_ATTR,
! END_OF_LINE,
! UNTERMINATED_FIELD
} CopyReadResult;
/*
***************
*** 130,144 ****
/* non-export function prototypes */
static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
! char *delim, char *null_print);
static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
! char *delim, char *null_print);
static bool CopyReadLine(void);
static char *CopyReadAttribute(const char *delim, const char *null_print,
CopyReadResult *result, bool *isnull);
static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
Oid typelem, bool *isnull);
static void CopyAttributeOut(char *string, char *delim);
static List *CopyGetAttnums(Relation rel, List *attnamelist);
static void limit_printout_length(StringInfo buf);
--- 131,152 ----
/* non-export function prototypes */
static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
! char *delim, char *null_print, bool csv_mode, char *quote, char *escape,
! List *force_atts);
static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
! char *delim, char *null_print, bool csv_mode, char *quote, char *escape,
! List *literal_atts);
static bool CopyReadLine(void);
static char *CopyReadAttribute(const char *delim, const char *null_print,
CopyReadResult *result, bool *isnull);
+ static char *CopyReadAttributeCSV(const char *delim, const char *null_print,
+ char *quote, char *escape,
+ CopyReadResult *result, bool *isnull);
static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
Oid typelem, bool *isnull);
static void CopyAttributeOut(char *string, char *delim);
+ static void CopyAttributeOutCSV(char *string, char *delim, char *quote,
+ char *escape, bool force_quote);
static List *CopyGetAttnums(Relation rel, List *attnamelist);
static void limit_printout_length(StringInfo buf);
***************
*** 682,689 ****
--- 690,704 ----
List *attnumlist;
bool binary = false;
bool oids = false;
+ bool csv_mode = false;
char *delim = NULL;
+ char *quote = NULL;
+ char *escape = NULL;
char *null_print = NULL;
+ List *force = NIL;
+ List *literal = NIL;
+ List *force_atts = NIL;
+ List *literal_atts = NIL;
Relation rel;
AclMode required_access = (is_from ? ACL_INSERT : ACL_SELECT);
AclResult aclresult;
***************
*** 725,730 ****
--- 740,785 ----
errmsg("conflicting or redundant options")));
null_print = strVal(defel->arg);
}
+ else if (strcmp(defel->defname, "csv") == 0)
+ {
+ if (csv_mode)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+ csv_mode = intVal(defel->arg);
+ }
+ else if (strcmp(defel->defname, "quote") == 0)
+ {
+ if (quote)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+ quote = strVal(defel->arg);
+ }
+ else if (strcmp(defel->defname, "escape") == 0)
+ {
+ if (escape)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+ escape = strVal(defel->arg);
+ }
+ else if (strcmp(defel->defname, "force") == 0)
+ {
+ if (force)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+ force = (List *)defel->arg;
+ }
+ else if (strcmp(defel->defname, "literal") == 0)
+ {
+ if (literal)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+ literal = (List *)defel->arg;
+ }
else
elog(ERROR, "option \"%s\" not recognized",
defel->defname);
***************
*** 735,740 ****
--- 790,800 ----
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DELIMITER in BINARY mode")));
+ if (binary && csv_mode)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify CSV in BINARY mode")));
+
if (binary && null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
***************
*** 742,751 ****
/* Set defaults */
if (!delim)
! delim = "\t";
!
if (!null_print)
! null_print = "\\N";
/*
* Open and lock the relation, using the appropriate lock type.
--- 802,893 ----
/* Set defaults */
if (!delim)
! delim = csv_mode ? "," : "\t";
!
if (!null_print)
! null_print = csv_mode ? "" : "\\N";
!
! if (csv_mode)
! {
! if (!quote)
! quote = "\"";
! if (!escape)
! escape = quote;
! }
!
! /*
! * Only single-character delimiter strings are supported.
! */
! if (strlen(delim) != 1)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("COPY delimiter must be a single character")));
!
! /*
! * Check quote
! */
! if (!csv_mode && quote != NULL)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("COPY quote available only in CSV mode")));
!
! if (csv_mode && strlen(quote) != 1)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("COPY quote must be a single character")));
!
! /*
! * Check escape
! */
! if (!csv_mode && escape != NULL)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("COPY escape available only in CSV mode")));
!
! if (csv_mode && strlen(escape) != 1)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("COPY escape must be a single character")));
!
! /*
! * Check force
! */
! if (!csv_mode && force != NIL)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("COPY force available only in CSV mode")));
! if (force != NIL && is_from)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("COPY force only available using COPY TO")));
!
! /*
! * Check literal
! */
! if (!csv_mode && literal != NIL)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("COPY literal available only in CSV mode")));
! if (literal != NIL && !is_from)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("COPY literal only available using COPY FROM")));
!
! /*
! * Don't allow the delimiter to appear in the null string.
! */
! if (strchr(null_print, delim[0]) != NULL)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("COPY delimiter must not appear in the NULL specification")));
!
! /*
! * Don't allow the csv quote char to appear in the null string.
! */
! if (csv_mode && strchr(null_print, quote[0]) != NULL)
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("CSV quote character must not appear in the NULL specification")));
/*
* Open and lock the relation, using the appropriate lock type.
***************
*** 772,793 ****
"psql's \\copy command also works for anyone.")));
/*
- * Presently, only single-character delimiter strings are supported.
- */
- if (strlen(delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single character")));
-
- /*
- * Don't allow the delimiter to appear in the null string.
- */
- if (strchr(null_print, delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must not appear in the NULL specification")));
-
- /*
* Don't allow COPY w/ OIDs to or from a table without them
*/
if (oids && !rel->rd_rel->relhasoids)
--- 914,919 ----
***************
*** 802,807 ****
--- 928,979 ----
attnumlist = CopyGetAttnums(rel, attnamelist);
/*
+ * Check that FORCE references valid COPY columns
+ */
+ if (force)
+ {
+ TupleDesc tupDesc = RelationGetDescr(rel);
+ Form_pg_attribute *attr = tupDesc->attrs;
+ List *cur;
+
+ force_atts = CopyGetAttnums(rel, force);
+
+ foreach(cur, force_atts)
+ {
+ int attnum = lfirsti(cur);
+
+ if (!intMember(attnum, attnumlist))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("FORCE column \"%s\" not referenced by COPY",
+ NameStr(attr[attnum - 1]->attname))));
+ }
+ }
+
+ /*
+ * Check that LITERAL references valid COPY columns
+ */
+ if (literal)
+ {
+ List *cur;
+ TupleDesc tupDesc = RelationGetDescr(rel);
+ Form_pg_attribute *attr = tupDesc->attrs;
+
+ literal_atts = CopyGetAttnums(rel, literal);
+
+ foreach(cur, literal_atts)
+ {
+ int attnum = lfirsti(cur);
+
+ if (!intMember(attnum, attnumlist))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("LITERAL column \"%s\" not referenced by COPY",
+ NameStr(attr[attnum - 1]->attname))));
+ }
+ }
+
+ /*
* Set up variables to avoid per-attribute overhead.
*/
initStringInfo(&attribute_buf);
***************
*** 864,870 ****
errmsg("\"%s\" is a directory", filename)));
}
}
! CopyFrom(rel, attnumlist, binary, oids, delim, null_print);
}
else
{ /* copy from database to file */
--- 1036,1043 ----
errmsg("\"%s\" is a directory", filename)));
}
}
! CopyFrom(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
! quote, escape, literal_atts);
}
else
{ /* copy from database to file */
***************
*** 926,932 ****
errmsg("\"%s\" is a directory", filename)));
}
}
! CopyTo(rel, attnumlist, binary, oids, delim, null_print);
}
if (!pipe)
--- 1099,1106 ----
errmsg("\"%s\" is a directory", filename)));
}
}
! CopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
! quote, escape, force_atts);
}
if (!pipe)
***************
*** 958,964 ****
*/
static void
CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
! char *delim, char *null_print)
{
HeapTuple tuple;
TupleDesc tupDesc;
--- 1132,1139 ----
*/
static void
CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
! char *delim, char *null_print, bool csv_mode, char *quote,
! char *escape, List *force_atts)
{
HeapTuple tuple;
TupleDesc tupDesc;
***************
*** 967,972 ****
--- 1142,1148 ----
int attr_count;
Form_pg_attribute *attr;
FmgrInfo *out_functions;
+ bool *force_quote;
Oid *elements;
bool *isvarlena;
char *string;
***************
*** 988,998 ****
out_functions = (FmgrInfo *) palloc((num_phys_attrs + 1) * sizeof(FmgrInfo));
elements = (Oid *) palloc((num_phys_attrs + 1) * sizeof(Oid));
isvarlena = (bool *) palloc((num_phys_attrs + 1) * sizeof(bool));
foreach(cur, attnumlist)
{
int attnum = lfirsti(cur);
Oid out_func_oid;
!
if (binary)
getTypeBinaryOutputInfo(attr[attnum - 1]->atttypid,
&out_func_oid, &elements[attnum - 1],
--- 1164,1175 ----
out_functions = (FmgrInfo *) palloc((num_phys_attrs + 1) * sizeof(FmgrInfo));
elements = (Oid *) palloc((num_phys_attrs + 1) * sizeof(Oid));
isvarlena = (bool *) palloc((num_phys_attrs + 1) * sizeof(bool));
+ force_quote = (bool *) palloc((num_phys_attrs + 1) * sizeof(bool));
foreach(cur, attnumlist)
{
int attnum = lfirsti(cur);
Oid out_func_oid;
!
if (binary)
getTypeBinaryOutputInfo(attr[attnum - 1]->atttypid,
&out_func_oid, &elements[attnum - 1],
***************
*** 1002,1007 ****
--- 1179,1189 ----
&out_func_oid, &elements[attnum - 1],
&isvarlena[attnum - 1]);
fmgr_info(out_func_oid, &out_functions[attnum - 1]);
+
+ if (intMember(attnum, force_atts))
+ force_quote[attnum - 1] = true;
+ else
+ force_quote[attnum - 1] = false;
}
/*
***************
*** 1051,1057 ****
while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL)
{
bool need_delim = false;
-
CHECK_FOR_INTERRUPTS();
MemoryContextReset(mycontext);
--- 1233,1238 ----
***************
*** 1113,1119 ****
value,
ObjectIdGetDatum(elements[attnum - 1]),
Int32GetDatum(attr[attnum - 1]->atttypmod)));
! CopyAttributeOut(string, delim);
}
else
{
--- 1294,1308 ----
value,
ObjectIdGetDatum(elements[attnum - 1]),
Int32GetDatum(attr[attnum - 1]->atttypmod)));
! if (csv_mode)
! {
! CopyAttributeOutCSV(string, delim, quote, escape,
! (strcmp(string, null_print) == 0 ||
! force_quote[attnum - 1]));
! }
! else
! CopyAttributeOut(string, delim);
!
}
else
{
***************
*** 1148,1153 ****
--- 1337,1343 ----
pfree(out_functions);
pfree(elements);
pfree(isvarlena);
+ pfree(force_quote);
}
***************
*** 1243,1249 ****
*/
static void
CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
! char *delim, char *null_print)
{
HeapTuple tuple;
TupleDesc tupDesc;
--- 1433,1440 ----
*/
static void
CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
! char *delim, char *null_print, bool csv_mode, char *quote,
! char *escape, List *literal_atts)
{
HeapTuple tuple;
TupleDesc tupDesc;
***************
*** 1256,1264 ****
Oid *elements;
Oid oid_in_element;
ExprState **constraintexprs;
bool hasConstraints = false;
- int i;
int attnum;
List *cur;
Oid in_func_oid;
Datum *values;
--- 1447,1456 ----
Oid *elements;
Oid oid_in_element;
ExprState **constraintexprs;
+ bool *literal_nullstr;
bool hasConstraints = false;
int attnum;
+ int i;
List *cur;
Oid in_func_oid;
Datum *values;
***************
*** 1317,1322 ****
--- 1509,1515 ----
defmap = (int *) palloc((num_phys_attrs + 1) * sizeof(int));
defexprs = (ExprState **) palloc((num_phys_attrs + 1) * sizeof(ExprState *));
constraintexprs = (ExprState **) palloc0((num_phys_attrs + 1) * sizeof(ExprState *));
+ literal_nullstr = (bool *) palloc((num_phys_attrs + 1) * sizeof(bool));
for (attnum = 1; attnum <= num_phys_attrs; attnum++)
{
***************
*** 1333,1338 ****
--- 1526,1536 ----
&in_func_oid, &elements[attnum - 1]);
fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+ if (intMember(attnum, literal_atts))
+ literal_nullstr[attnum - 1] = true;
+ else
+ literal_nullstr[attnum - 1] = false;
+
/* Get default info if needed */
if (!intMember(attnum, attnumlist))
{
***************
*** 1389,1397 ****
ExecBSInsertTriggers(estate, resultRelInfo);
if (!binary)
- {
file_has_oids = oids; /* must rely on user to tell us this... */
- }
else
{
/* Read and verify binary header */
--- 1587,1593 ----
***************
*** 1500,1505 ****
--- 1696,1702 ----
if (file_has_oids)
{
+ /* can't be in CSV mode here */
string = CopyReadAttribute(delim, null_print,
&result, &isnull);
***************
*** 1538,1551 ****
errmsg("missing data for column \"%s\"",
NameStr(attr[m]->attname))));
! string = CopyReadAttribute(delim, null_print,
! &result, &isnull);
!
! if (isnull)
{
! /* we read an SQL NULL, no need to do anything */
}
else
{
copy_attname = NameStr(attr[m]->attname);
values[m] = FunctionCall3(&in_functions[m],
--- 1735,1761 ----
errmsg("missing data for column \"%s\"",
NameStr(attr[m]->attname))));
! if (csv_mode)
{
! string = CopyReadAttributeCSV(delim, null_print, quote,
! escape, &result, &isnull);
! if (result == UNTERMINATED_FIELD)
! ereport(ERROR,
! (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
! errmsg("unterminated CSV quoted field")));
}
else
+ string = CopyReadAttribute(delim, null_print,
+ &result, &isnull);
+
+ if (csv_mode && isnull && literal_nullstr[m])
+ {
+ string = null_print; /* set to NULL string */
+ isnull = false;
+ }
+
+ /* we read an SQL NULL, no need to do anything */
+ if (!isnull)
{
copy_attname = NameStr(attr[m]->attname);
values[m] = FunctionCall3(&in_functions[m],
***************
*** 1732,1742 ****
pfree(values);
pfree(nulls);
! if (!binary)
! {
! pfree(in_functions);
! pfree(elements);
! }
ExecDropTupleTable(tupleTable, true);
--- 1942,1953 ----
pfree(values);
pfree(nulls);
! pfree(in_functions);
! pfree(elements);
! pfree(defmap);
! pfree(defexprs);
! pfree(constraintexprs);
! pfree(literal_nullstr);
ExecDropTupleTable(tupleTable, true);
***************
*** 2070,2075 ****
--- 2281,2432 ----
return attribute_buf.data;
}
+
+ /*
+ * Read the value of a single attribute in CSV mode,
+ * performing de-escaping as needed. Escaping does not follow the normal
+ * PostgreSQL text mode, but instead "standard" (i.e. common) CSV usage.
+ *
+ * Quoted fields can span lines, in which case the line end is embedded
+ * in the returned string.
+ *
+ * null_print is the null marker string. Note that this is compared to
+ * the pre-de-escaped input string (thus if it is quoted it is not a NULL).
+ *
+ * *result is set to indicate what terminated the read:
+ * NORMAL_ATTR: column delimiter
+ * END_OF_LINE: end of line
+ * UNTERMINATED_FIELD no quote detected at end of a quoted field
+ *
+ * In any case, the string read up to the terminator (or end of file)
+ * is returned.
+ *
+ * *isnull is set true or false depending on whether the input matched
+ * the null marker. Note that the caller cannot check this since the
+ * returned string will be the post-de-escaping equivalent, which may
+ * look the same as some valid data string.
+ *----------
+ */
+
+ static char *
+ CopyReadAttributeCSV(const char *delim, const char *null_print, char *quote,
+ char *escape, CopyReadResult *result, bool *isnull)
+ {
+ char delimc = delim[0];
+ char quotec = quote[0];
+ char escapec = escape[0];
+ char c;
+ int start_cursor = line_buf.cursor;
+ int end_cursor = start_cursor;
+ int input_len;
+ bool in_quote = false;
+ bool saw_quote = false;
+
+ /* reset attribute_buf to empty */
+ attribute_buf.len = 0;
+ attribute_buf.data[0] = '\0';
+
+ /* set default status */
+ *result = END_OF_LINE;
+
+ for (;;)
+ {
+ /* handle multiline quoted fields */
+ if (in_quote && line_buf.cursor >= line_buf.len)
+ {
+ bool done;
+
+ switch(eol_type)
+ {
+ case EOL_NL:
+ appendStringInfoString(&attribute_buf,"\n");
+ break;
+ case EOL_CR:
+ appendStringInfoString(&attribute_buf,"\r");
+ break;
+ case EOL_CRNL:
+ appendStringInfoString(&attribute_buf,"\r\n");
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't happen - just keep going */
+ break;
+ }
+
+ copy_lineno++;
+ done = CopyReadLine();
+ if (done && line_buf.len == 0)
+ break;
+ start_cursor = line_buf.cursor;
+ }
+
+ end_cursor = line_buf.cursor;
+ if (line_buf.cursor >= line_buf.len)
+ break;
+ c = line_buf.data[line_buf.cursor++];
+ /*
+ * unquoted field delimiter
+ */
+ if (!in_quote && c == delimc)
+ {
+ *result = NORMAL_ATTR;
+ break;
+ }
+ /*
+ * start of quoted field (or part of field)
+ */
+ if (!in_quote && c == quotec)
+ {
+ saw_quote = true;
+ in_quote = true;
+ continue;
+ }
+ /*
+ * escape within a quoted field
+ */
+ if (in_quote && c == escapec)
+ {
+ /*
+ * peek at the next char if available, and escape it if it
+ * is an escape char or a quote char
+ */
+ if (line_buf.cursor <= line_buf.len)
+ {
+ char nextc = line_buf.data[line_buf.cursor];
+ if (nextc == escapec || nextc == quotec)
+ {
+ appendStringInfoCharMacro(&attribute_buf, nextc);
+ line_buf.cursor++;
+ continue;
+ }
+ }
+ }
+ /*
+ * end of quoted field.
+ * Must do this test after testing for escape in case quote char
+ * and escape char are the same (which is the common case).
+ */
+ if (in_quote && c == quotec)
+ {
+ in_quote = false;
+ continue;
+ }
+ appendStringInfoCharMacro(&attribute_buf, c);
+ }
+
+ if (in_quote)
+ *result = UNTERMINATED_FIELD;
+
+ /* check whether raw input matched null marker */
+ input_len = end_cursor - start_cursor;
+ if (!saw_quote && input_len == strlen(null_print) &&
+ strncmp(&line_buf.data[start_cursor], null_print, input_len) == 0)
+ *isnull = true;
+ else
+ *isnull = false;
+
+ return attribute_buf.data;
+ }
+
/*
* Read a binary attribute
*/
***************
*** 2193,2198 ****
--- 2550,2622 ----
break;
}
}
+ }
+
+ /*
+ * Send CSV representation of one attribute, with conversion and
+ * CSV type escaping
+ */
+ static void
+ CopyAttributeOutCSV(char *server_string, char *delim, char *quote,
+ char *escape, bool force_quote)
+ {
+ char *string;
+ char c;
+ char delimc = delim[0];
+ char quotec = quote[0];
+ char escapec = escape[0];
+ bool need_quote = force_quote;
+ char *test_string;
+ bool same_encoding;
+ int mblen;
+ int i;
+
+ same_encoding = (server_encoding == client_encoding);
+ if (!same_encoding)
+ string = (char *) pg_server_to_client((unsigned char *) server_string,
+ strlen(server_string));
+ else
+ string = server_string;
+
+ /* have to run through the string twice,
+ * first time to see if it needs quoting, second to actually send it
+ */
+
+ for(test_string = string;
+ !need_quote && (c = *test_string) != '\0';
+ test_string += mblen)
+ {
+ if (c == delimc || c == quotec || c == '\n' || c == '\r')
+ need_quote = true;
+ if (!same_encoding)
+ mblen = pg_encoding_mblen(client_encoding, test_string);
+ else
+ mblen = 1;
+ }
+
+ if (need_quote)
+ CopySendChar(quotec);
+
+ for (; (c = *string) != '\0'; string += mblen)
+ {
+ if (c == quotec || c == escapec)
+ CopySendChar(escapec);
+
+ CopySendChar(c);
+
+ if (!same_encoding)
+ {
+ /* send additional bytes of the char, if any */
+ mblen = pg_encoding_mblen(client_encoding, string);
+ for (i = 1; i < mblen; i++)
+ CopySendChar(string[i]);
+ }
+ else
+ mblen = 1;
+ }
+
+ if (need_quote)
+ CopySendChar(quotec);
}
/*
Index: src/backend/parser/gram.y
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/parser/gram.y,v
retrieving revision 2.450
diff -c -c -r2.450 gram.y
*** src/backend/parser/gram.y 5 Apr 2004 03:07:26 -0000 2.450
--- src/backend/parser/gram.y 19 Apr 2004 17:07:50 -0000
***************
*** 343,349 ****
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
COMMITTED CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
! CREATEUSER CROSS CURRENT_DATE CURRENT_TIME
CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
--- 343,349 ----
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
COMMITTED CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
! CREATEUSER CROSS CSV CURRENT_DATE CURRENT_TIME
CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
***************
*** 370,376 ****
KEY
LANCOMPILER LANGUAGE LARGE_P LAST_P LEADING LEFT LEVEL LIKE LIMIT
! LISTEN LOAD LOCAL LOCALTIME LOCALTIMESTAMP LOCATION
LOCK_P
MATCH MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
--- 370,376 ----
KEY
LANCOMPILER LANGUAGE LARGE_P LAST_P LEADING LEFT LEVEL LIKE LIMIT
! LISTEN LITERAL LOAD LOCAL LOCALTIME LOCALTIMESTAMP LOCATION
LOCK_P
MATCH MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
***************
*** 386,391 ****
--- 386,393 ----
PRECISION PRESERVE PREPARE PRIMARY
PRIOR PRIVILEGES PROCEDURAL PROCEDURE
+ QUOTE
+
READ REAL RECHECK REFERENCES REINDEX RELATIVE_P RENAME REPEATABLE REPLACE
RESET RESTART RESTRICT RETURNS REVOKE RIGHT ROLLBACK ROW ROWS
RULE
***************
*** 1360,1365 ****
--- 1362,1387 ----
{
$$ = makeDefElem("null", (Node *)makeString($3));
}
+ | CSV
+ {
+ $$ = makeDefElem("csv", (Node *)makeInteger(TRUE));
+ }
+ | QUOTE opt_as Sconst
+ {
+ $$ = makeDefElem("quote", (Node *)makeString($3));
+ }
+ | ESCAPE opt_as Sconst
+ {
+ $$ = makeDefElem("escape", (Node *)makeString($3));
+ }
+ | FORCE columnList
+ {
+ $$ = makeDefElem("force", (Node *)$2);
+ }
+ | LITERAL columnList
+ {
+ $$ = makeDefElem("literal", (Node *)$2);
+ }
;
/* The following exist for backward compatibility */
***************
*** 7420,7425 ****
--- 7442,7448 ----
| COPY
| CREATEDB
| CREATEUSER
+ | CSV
| CURSOR
| CYCLE
| DATABASE
***************
*** 7473,7478 ****
--- 7496,7502 ----
| LAST_P
| LEVEL
| LISTEN
+ | LITERAL
| LOAD
| LOCAL
| LOCATION
***************
*** 7507,7512 ****
--- 7531,7537 ----
| PRIVILEGES
| PROCEDURAL
| PROCEDURE
+ | QUOTE
| READ
| RECHECK
| REINDEX
Index: src/backend/parser/keywords.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/parser/keywords.c,v
retrieving revision 1.147
diff -c -c -r1.147 keywords.c
*** src/backend/parser/keywords.c 11 Mar 2004 01:47:40 -0000 1.147
--- src/backend/parser/keywords.c 19 Apr 2004 17:07:50 -0000
***************
*** 90,95 ****
--- 90,96 ----
{"createdb", CREATEDB},
{"createuser", CREATEUSER},
{"cross", CROSS},
+ {"csv", CSV},
{"current_date", CURRENT_DATE},
{"current_time", CURRENT_TIME},
{"current_timestamp", CURRENT_TIMESTAMP},
***************
*** 186,191 ****
--- 187,193 ----
{"like", LIKE},
{"limit", LIMIT},
{"listen", LISTEN},
+ {"literal", LITERAL},
{"load", LOAD},
{"local", LOCAL},
{"localtime", LOCALTIME},
***************
*** 248,253 ****
--- 250,256 ----
{"privileges", PRIVILEGES},
{"procedural", PROCEDURAL},
{"procedure", PROCEDURE},
+ {"quote", QUOTE},
{"read", READ},
{"real", REAL},
{"recheck", RECHECK},
Index: src/backend/tcop/fastpath.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/tcop/fastpath.c,v
retrieving revision 1.71
diff -c -c -r1.71 fastpath.c
*** src/backend/tcop/fastpath.c 7 Jan 2004 18:56:27 -0000 1.71
--- src/backend/tcop/fastpath.c 19 Apr 2004 17:07:51 -0000
***************
*** 154,161 ****
bool typisvarlena;
char *outputstr;
! getTypeOutputInfo(rettype,
! &typoutput, &typelem, &typisvarlena);
outputstr = DatumGetCString(OidFunctionCall3(typoutput,
retval,
ObjectIdGetDatum(typelem),
--- 154,160 ----
bool typisvarlena;
char *outputstr;
! getTypeOutputInfo(rettype, &typoutput, &typelem, &typisvarlena);
outputstr = DatumGetCString(OidFunctionCall3(typoutput,
retval,
ObjectIdGetDatum(typelem),
Index: src/bin/psql/copy.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/psql/copy.c,v
retrieving revision 1.43
diff -c -c -r1.43 copy.c
*** src/bin/psql/copy.c 12 Apr 2004 15:58:52 -0000 1.43
--- src/bin/psql/copy.c 19 Apr 2004 17:07:51 -0000
***************
*** 66,73 ****
--- 66,78 ----
bool from;
bool binary;
bool oids;
+ bool csv_mode;
char *delim;
char *null;
+ char *quote;
+ char *escape;
+ char *force_list;
+ char *literal_list;
};
***************
*** 81,86 ****
--- 86,95 ----
free(ptr->file);
free(ptr->delim);
free(ptr->null);
+ free(ptr->quote);
+ free(ptr->escape);
+ free(ptr->force_list);
+ free(ptr->literal_list);
free(ptr);
}
***************
*** 272,282 ****
--- 281,299 ----
while (token)
{
+ bool fetch_next;
+
+ fetch_next = true;
+
/* someday allow BINARY here */
if (strcasecmp(token, "oids") == 0)
{
result->oids = true;
}
+ else if (strcasecmp(token, "csv") == 0)
+ {
+ result->csv_mode = true;
+ }
else if (strcasecmp(token, "delimiter") == 0)
{
token = strtokx(NULL, whitespace, NULL, "'",
***************
*** 301,311 ****
else
goto error;
}
else
goto error;
! token = strtokx(NULL, whitespace, NULL, NULL,
! 0, false, pset.encoding);
}
}
--- 318,395 ----
else
goto error;
}
+ else if (strcasecmp(token, "quote") == 0)
+ {
+ token = strtokx(NULL, whitespace, NULL, "'",
+ '\\', false, pset.encoding);
+ if (token && strcasecmp(token, "as") == 0)
+ token = strtokx(NULL, whitespace, NULL, "'",
+ '\\', false, pset.encoding);
+ if (token)
+ result->quote = pg_strdup(token);
+ else
+ goto error;
+ }
+ else if (strcasecmp(token, "escape") == 0)
+ {
+ token = strtokx(NULL, whitespace, NULL, "'",
+ '\\', false, pset.encoding);
+ if (token && strcasecmp(token, "as") == 0)
+ token = strtokx(NULL, whitespace, NULL, "'",
+ '\\', false, pset.encoding);
+ if (token)
+ result->escape = pg_strdup(token);
+ else
+ goto error;
+ }
+ else if (strcasecmp(token, "force") == 0)
+ {
+ /* handle column list */
+ fetch_next = false;
+ for (;;)
+ {
+ token = strtokx(NULL, whitespace, ",", "\"",
+ 0, false, pset.encoding);
+ if (!token || strchr(",", token[0]))
+ goto error;
+ if (!result->force_list)
+ result->force_list = pg_strdup(token);
+ else
+ xstrcat(&result->force_list, token);
+ token = strtokx(NULL, whitespace, ",", "\"",
+ 0, false, pset.encoding);
+ if (!token || token[0] != ',')
+ break;
+ xstrcat(&result->force_list, token);
+ }
+ }
+ else if (strcasecmp(token, "literal") == 0)
+ {
+ /* handle column list */
+ fetch_next = false;
+ for (;;)
+ {
+ token = strtokx(NULL, whitespace, ",", "\"",
+ 0, false, pset.encoding);
+ if (!token || strchr(",", token[0]))
+ goto error;
+ if (!result->literal_list)
+ result->literal_list = pg_strdup(token);
+ else
+ xstrcat(&result->literal_list, token);
+ token = strtokx(NULL, whitespace, ",", "\"",
+ 0, false, pset.encoding);
+ if (!token || token[0] != ',')
+ break;
+ xstrcat(&result->literal_list, token);
+ }
+ }
else
goto error;
! if (fetch_next)
! token = strtokx(NULL, whitespace, NULL, NULL,
! 0, false, pset.encoding);
}
}
***************
*** 340,346 ****
PGresult *result;
bool success;
struct stat st;
!
/* parse options */
options = parse_slash_copy(args);
--- 424,430 ----
PGresult *result;
bool success;
struct stat st;
!
/* parse options */
options = parse_slash_copy(args);
***************
*** 379,390 ****
--- 463,506 ----
options->delim);
}
+ /* There is no backward-compatible CSV syntax */
if (options->null)
{
if (options->null[0] == '\'')
appendPQExpBuffer(&query, " WITH NULL AS %s", options->null);
else
appendPQExpBuffer(&query, " WITH NULL AS '%s'", options->null);
+ }
+
+ if (options->csv_mode)
+ {
+ appendPQExpBuffer(&query, " CSV");
+ }
+
+ if (options->quote)
+ {
+ if (options->quote[0] == '\'')
+ appendPQExpBuffer(&query, " QUOTE AS %s", options->quote);
+ else
+ appendPQExpBuffer(&query, " QUOTE AS '%s'", options->quote);
+ }
+
+ if (options->escape)
+ {
+ if (options->escape[0] == '\'')
+ appendPQExpBuffer(&query, " ESCAPE AS %s", options->escape);
+ else
+ appendPQExpBuffer(&query, " ESCAPE AS '%s'", options->escape);
+ }
+
+ if (options->force_list)
+ {
+ appendPQExpBuffer(&query, " FORCE %s", options->force_list);
+ }
+
+ if (options->literal_list)
+ {
+ appendPQExpBuffer(&query, " LITERAL %s", options->literal_list);
}
if (options->from)
Bruce Momjian wrote:
I have applied the attached patch that complete TODO item:
o -Allow dump/load of CSV format
This adds new keywords to COPY and \copy:
CSV - enable CSV mode
QUOTE - specify quote character
ESCAPE - specify escape character
FORCE - force quoting of specified columns
FORCE QUOTE
LITERAL - prevent NULL checks for specific columns
NO NULL CHECK
Show quoted text
If folks have better ideas for naming of those keywords, I am all
ears!
[ discussion moved to hackers.]
Peter Eisentraut wrote:
Bruce Momjian wrote:
I have applied the attached patch that complete TODO item:
o -Allow dump/load of CSV format
This adds new keywords to COPY and \copy:
CSV - enable CSV mode
QUOTE - specify quote character
ESCAPE - specify escape character
FORCE - force quoting of specified columnsFORCE QUOTE
LITERAL - prevent NULL checks for specific columns
NO NULL CHECK
I considered going this direction, but it broke the WITH clause style of
COPY. Previously it was "WITH keyword value". Now it is also "WITH
keyword value, value" too. This would add "WITH keyword keyword value,
value".
It would change:
COPY pg_language TO '/tmp/x' WITH CSV FORCE lanname
LITERAL lanacl
to:
COPY pg_language TO '/tmp/x' WITH CSV FORCE QUOTE lanname NO
NULL CHECK lanacl
If folks like it, I can make the change.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Peter Eisentraut wrote:
Bruce Momjian wrote:
I have applied the attached patch that complete TODO item:
o -Allow dump/load of CSV format
This adds new keywords to COPY and \copy:
CSV - enable CSV mode
QUOTE - specify quote character
ESCAPE - specify escape character
FORCE - force quoting of specified columnsFORCE QUOTE
LITERAL - prevent NULL checks for specific columns
NO NULL CHECK
If folks have better ideas for naming of those keywords, I am all
ears!
Bruce and I tossed this around quite a bit.
The problem is that using QUOTE or NULL in these phrases might look
confusing, e.g.
COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV NO NULL CHECK field1,field2;
I also don't think NO NULL CHECK actually matches the use case for this
very well (and I'm dubious about LITERAL too). They both describe the
actual behaviour, but not what you are trying to achieve. Essentially,
this would be used when you have a field with a NOT NULL constraint, but
the input CSV data stream has what would otherwise be considered nulls.
(COPY itself will never produce such a CSV, as non-null values that
resemble null are always quoted, but third party programs well might.)
So an alternative might be FORCE NOT NULL, but for the previous
consideration. Perhaps use of an optional preposition might make things
slightly clearer, e.g.:
COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE IN
field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV FORCE NOT NULL IN
field1,field2;
But it does start to look a little too much like COBOL :-).
So I'm interested to see if there are any other inspirations people have.
cheers
andrew
Bruce Momjian wrote:
[ discussion moved to hackers.]
Peter Eisentraut wrote:
Bruce Momjian wrote:
I have applied the attached patch that complete TODO item:
o -Allow dump/load of CSV format
This adds new keywords to COPY and \copy:
CSV - enable CSV mode
QUOTE - specify quote character
ESCAPE - specify escape character
FORCE - force quoting of specified columnsFORCE QUOTE
LITERAL - prevent NULL checks for specific columns
NO NULL CHECK
I considered going this direction, but it broke the WITH clause style of
COPY. Previously it was "WITH keyword value". Now it is also "WITH
keyword value, value" too. This would add "WITH keyword keyword value,
value".It would change:
COPY pg_language TO '/tmp/x' WITH CSV FORCE lanname
LITERAL lanacl
to:
COPY pg_language TO '/tmp/x' WITH CSV FORCE QUOTE lanname NO
NULL CHECK lanaclIf folks like it, I can make the change.
[reposting my comments from patches]
The problem is that using QUOTE or NULL in these phrases might look
confusing, e.g.
COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV NO NULL CHECK field1,field2;
I also don't think NO NULL CHECK actually matches the use case for this
very well (and I'm dubious about LITERAL too). They both describe the
actual behaviour, but not what you are trying to achieve. Essentially,
this would be used when you have a field with a NOT NULL constraint, but
the input CSV data stream has what would otherwise be considered nulls.
(COPY itself will never produce such a CSV, as non-null values that
resemble null are always quoted, but third party programs well might.)
So an alternative might be FORCE NOT NULL, but for the previous
consideration. Perhaps use of an optional preposition might make things
slightly clearer, e.g.:
COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE IN
field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV FORCE NOT NULL IN
field1,field2;
But it does start to look a little too much like COBOL .
So I'm interested to see if there are any other inspirations people have.
cheers
andrew
CSV - enable CSV mode
QUOTE - specify quote character
ESCAPE - specify escape character
FORCE - force quoting of specified columnsFORCE QUOTE
QUOTING col1,col2?
QUOTED col1,col2?
IN QUOTES col1,cold
LITERAL - prevent NULL checks for specific columns
NO NULL CHECK
QUOTED (meaning 'as quoted')?
From a language design point of view, I think it may be better to stick
to one word versions?
--
Fabien Coelho - coelho@cri.ensmp.fr
Andrew Dunstan wrote:
Bruce and I tossed this around quite a bit.
The problem is that using QUOTE or NULL in these phrases might look
confusing, e.g.
COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV NO NULL CHECK field1,field2;I also don't think NO NULL CHECK actually matches the use case for this
very well (and I'm dubious about LITERAL too). They both describe the
actual behaviour, but not what you are trying to achieve. Essentially,
this would be used when you have a field with a NOT NULL constraint, but
the input CSV data stream has what would otherwise be considered nulls.
(COPY itself will never produce such a CSV, as non-null values that
resemble null are always quoted, but third party programs well might.)
So an alternative might be FORCE NOT NULL, but for the previous
consideration. Perhaps use of an optional preposition might make things
slightly clearer, e.g.:COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE IN
field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV FORCE NOT NULL IN
field1,field2;But it does start to look a little too much like COBOL :-).
Yea, and as I remember, that COBOL word scared us off. :-)
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Peter Eisentraut wrote:
Bruce Momjian wrote:
LITERAL - prevent NULL checks for specific columns
NO NULL CHECK
I considered going this direction, but it broke the WITH clause style of
COPY.
Huh? Many of the other options are already potentially two words,
eg QUOTE [AS], so I don't see how you can object to that aspect.
I agree with Peter that "LITERAL" is a completely unhelpful name for the
functionality.
"NO NULL CHECK" seems a bit confusing as well --- at first glance it
looks like it might mean overriding the column's NOT NULL attribute
(which I trust is not what it means ;-)). Perhaps "NO NULLS" ?
Or just "NOT NULL", to avoid introducing another keyword.
I liked the "FORCE QUOTE" suggestion, too.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Peter Eisentraut wrote:
Bruce Momjian wrote:
LITERAL - prevent NULL checks for specific columns
NO NULL CHECK
I considered going this direction, but it broke the WITH clause style of
COPY.Huh? Many of the other options are already potentially two words,
eg QUOTE [AS], so I don't see how you can object to that aspect.
AS was a noise word, while the new keywords would not be.
I agree with Peter that "LITERAL" is a completely unhelpful name for the
functionality.
Yep.
"NO NULL CHECK" seems a bit confusing as well --- at first glance it
looks like it might mean overriding the column's NOT NULL attribute
(which I trust is not what it means ;-)). Perhaps "NO NULLS" ?
Right. It doesn't affect the constraint.
Or just "NOT NULL", to avoid introducing another keyword.
Yes, I liked that, but to me it looked like we were saying the column
was NOT NULL, which isn't the meaning. You could almost call it QUOTE
NULL because it is actually quoting the NULL strings in those columns.
I liked the "FORCE QUOTE" suggestion, too.
OK.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
I wrote:
The problem is that using QUOTE or NULL in these phrases might look
confusing, e.g.
COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV NO NULL CHECK field1,field2;I also don't think NO NULL CHECK actually matches the use case for
this very well (and I'm dubious about LITERAL too). They both
describe the actual behaviour, but not what you are trying to achieve.
Essentially, this would be used when you have a field with a NOT NULL
constraint, but the input CSV data stream has what would otherwise be
considered nulls. (COPY itself will never produce such a CSV, as
non-null values that resemble null are always quoted, but third party
programs well might.) So an alternative might be FORCE NOT NULL, but
for the previous consideration. Perhaps use of an optional preposition
might make things slightly clearer, e.g.:COPY mytable TO 'mytable.csv' CSV QUOTE '"' FORCE QUOTE IN
field1,field2;
COPY mytable FROM 'mytable.csv' NULL '' CSV FORCE NOT NULL IN
field1,field2;But it does start to look a little too much like COBOL .
So I'm interested to see if there are any other inspirations people have.
The other alternative for the NOT NULL side would be to abandon it and
tell users they would need to use a trigger. That requires a little more
work from them, but would work in the general case, whereas this one is
likely to fail on everything but a text-and-friends column. Example:
andrew=# create table b(a int not null default 0);
CREATE TABLE
andrew=# create function bnull() returns trigger language plpgsql as $$
andrew$# begin
andrew$# if NEW.a is null then NEW.a := 0; end if;
andrew$# return NEW;
andrew$# end;
andrew$# $$;
CREATE FUNCTION
andrew=# create trigger btrigger before insert on b FOR EACH ROW EXECUTE
PROCEDURE bnull();
CREATE TRIGGER
andrew=# copy b from stdin csv;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
\.
andrew=# select * from b
andrew-# ;
a
---
0
(1 row)
But there isn't a reasonable alternative to making the user make choices
on the output side (see previous discussion regarding zip codes).
cheers
andrew
Fabien COELHO wrote:
CSV - enable CSV mode
QUOTE - specify quote character
ESCAPE - specify escape character
FORCE - force quoting of specified columnsFORCE QUOTE
QUOTING col1,col2?
QUOTED col1,col2?
IN QUOTES col1,coldLITERAL - prevent NULL checks for specific columns
NO NULL CHECK
QUOTED (meaning 'as quoted')?
From a language design point of view, I think it may be better to stick
to one word versions?
I understand, and we tried that. The confusion is that FORCE forces
quotes on non-NULL values, while LITERAL forces quotes on NULL values,
so while both deal with quoting, their functionality is different for
input and output. They same single keyword just seemed confusing.
This is why the two-keyword idea sounds good to me --- we use the word
QUOTE (already a keyword), and then say FORCE for output, and NULL for
input:
FORCE QUOTE
QUOTE NULL
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073