Patch: FORCE_NULL option for copy COPY in CSV mode
Hi,
This patch implements the following TODO item:
Allow COPY in CSV mode to control whether a quoted zero-length
string is treated as NULL
Currently this is always treated as a zero-length string,
which generates an error when loading into an integer column
Re: [PATCHES] allow CSV quote in NULL
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php
http://wiki.postgresql.org/wiki/Todo#COPY
I had a very definite use-case for this functionality recently while importing
CSV files generated by Oracle, and was somewhat frustrated by the existence
of a FORCE_NOT_NULL option for specific columns, but not one for
FORCE_NULL.
I'll add this to the November commitfest.
Regards
Ian Barwick
Attachments:
copy-force-null-v1.patchapplication/octet-stream; name=copy-force-null-v1.patchDownload
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
new file mode 100644
index 1ecc939..0038d03
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 42,47 ****
--- 42,48 ----
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
+ FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 329,334 ****
--- 330,347 ----
</varlistentry>
<varlistentry>
+ <term><literal>FORCE_NULL</></term>
+ <listitem>
+ <para>
+ Force the specified columns' values to be converted to <literal>NULL</>
+ if the value contains an empty string.
+ This option is allowed only in <command>COPY FROM</>, and only when
+ using <literal>CSV</> format.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>ENCODING</></term>
<listitem>
<para>
*************** COPY <replaceable class="parameter">coun
*** 637,643 ****
string, while an empty string data value is written with double quotes
(<literal>""</>). Reading values follows similar rules. You can
use <literal>FORCE_NOT_NULL</> to prevent <literal>NULL</> input
! comparisons for specific columns.
</para>
<para>
--- 650,658 ----
string, while an empty string data value is written with double quotes
(<literal>""</>). Reading values follows similar rules. You can
use <literal>FORCE_NOT_NULL</> to prevent <literal>NULL</> input
! comparisons for specific columns. Alternatively you can use
! <literal>FORCE_NULL</> to convert empty string data values to
! <literal>NULL</>.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index 6b20144..40a37a8
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** typedef struct CopyStateData
*** 125,130 ****
--- 125,132 ----
bool *force_quote_flags; /* per-column CSV FQ flags */
List *force_notnull; /* list of column names */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ List *force_null; /* list of column names */
+ bool *force_null_flags; /* per-column CSV FN flags */
bool convert_selectively; /* do selective binary conversion? */
List *convert_select; /* list of column names (can be NIL) */
bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
*************** ProcessCopyOptions(CopyState cstate,
*** 1019,1024 ****
--- 1021,1040 ----
errmsg("argument to option \"%s\" must be a list of column names",
defel->defname)));
}
+ else if (strcmp(defel->defname, "force_null") == 0)
+ {
+ if (cstate->force_null)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+ if (defel->arg && IsA(defel->arg, List))
+ cstate->force_null = (List *) defel->arg;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a list of column names",
+ defel->defname)));
+ }
else if (strcmp(defel->defname, "convert_selectively") == 0)
{
/*
*************** ProcessCopyOptions(CopyState cstate,
*** 1178,1183 ****
--- 1194,1210 ----
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY force not null only available using COPY FROM")));
+ /* Check force_null */
+ if (!cstate->csv_mode && cstate->force_null != NIL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY force null available only in CSV mode")));
+
+ if (cstate->force_notnull != NIL && !is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY force null only available using COPY FROM")));
+
/* Don't allow the delimiter to appear in the null string. */
if (strchr(cstate->null_print, cstate->delim[0]) != NULL)
ereport(ERROR,
*************** BeginCopy(bool is_from,
*** 1385,1390 ****
--- 1412,1439 ----
}
}
+ /* Convert FORCE NULL name list to per-column flags, check validity */
+ cstate->force_null_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
+ if (cstate->force_null)
+ {
+ List *attnums;
+ ListCell *cur;
+
+ attnums = CopyGetAttnums(tupDesc, cstate->rel, cstate->force_null);
+
+ foreach(cur, attnums)
+ {
+ int attnum = lfirst_int(cur);
+
+ if (!list_member_int(cstate->attnumlist, attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("FORCE NULL column \"%s\" not referenced by COPY",
+ NameStr(tupDesc->attrs[attnum - 1]->attname))));
+ cstate->force_null_flags[attnum - 1] = true;
+ }
+ }
+
/* Convert convert_selectively name list to per-column flags */
if (cstate->convert_selectively)
{
*************** NextCopyFrom(CopyState cstate, ExprConte
*** 2795,2805 ****
continue;
}
! if (cstate->csv_mode && string == NULL &&
! cstate->force_notnull_flags[m])
{
! /* Go ahead and read the NULL string */
! string = cstate->null_print;
}
cstate->cur_attname = NameStr(attr[m]->attname);
--- 2844,2867 ----
continue;
}
! if (cstate->csv_mode)
{
! if(string == NULL &&
! cstate->force_notnull_flags[m])
! {
! /* FORCE_NOT_NULL option is set and column is NULL -
! convert it to an empty string
! */
! string = cstate->null_print;
! }
! else if(string != NULL && strlen(string) == 0 &&
! cstate->force_null_flags[m])
! {
! /* FORCE_NULL option is set and column is an empty string -
! convert it to NULL
! */
! string = NULL;
! }
}
cstate->cur_attname = NameStr(attr[m]->attname);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
new file mode 100644
index a9812af..ca57ab7
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 2493,2498 ****
--- 2493,2502 ----
{
$$ = makeDefElem("force_not_null", (Node *)$4);
}
+ | FORCE NULL_P columnList
+ {
+ $$ = makeDefElem("force_null", (Node *)$3);
+ }
| ENCODING Sconst
{
$$ = makeDefElem("encoding", (Node *)makeString($2));
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
new file mode 100644
index 34fa131..de99108
*** a/src/test/regress/expected/copy2.out
--- b/src/test/regress/expected/copy2.out
*************** SELECT * FROM vistest;
*** 382,387 ****
--- 382,435 ----
e
(2 rows)
+ -- Test FORCE_NOT_NULL and FORCE_NULL options
+ -- should succeed with "b" set to an empty string and "c" set to NULL
+ CREATE TEMP TABLE forcetest (
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT,
+ e TEXT
+ );
+ \pset null NULL
+ BEGIN;
+ COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+ COMMIT;
+ SELECT b, c FROM forcetest WHERE a = 1;
+ b | c
+ ---+------
+ | NULL
+ (1 row)
+
+ -- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
+ BEGIN;
+ COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+ COMMIT;
+ SELECT b, c FROM forcetest WHERE a = 2;
+ b | c
+ ---+------
+ | NULL
+ (1 row)
+
+ -- should fail with not-null constraint violiaton
+ BEGIN;
+ COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL(b), FORCE_NOT_NULL(c));
+ ERROR: null value in column "b" violates not-null constraint
+ DETAIL: Failing row contains (3, null, , null, null).
+ CONTEXT: COPY forcetest, line 1: "3,,"""
+ ROLLBACK;
+ -- should fail with "not referenced by COPY" error
+ BEGIN;
+ COPY forcetest (d, e) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b));
+ ERROR: FORCE NOT NULL column "b" not referenced by COPY
+ ROLLBACK;
+ -- should fail with "not referenced by COPY" error
+ BEGIN;
+ COPY forcetest (d, e) FROM STDIN WITH (FORMAT csv, FORCE_NULL(b));
+ ERROR: FORCE NULL column "b" not referenced by COPY
+ ROLLBACK;
+ \pset null ''
+ DROP TABLE forcetest;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
new file mode 100644
index c46128b..b417cf7
*** a/src/test/regress/sql/copy2.sql
--- b/src/test/regress/sql/copy2.sql
*************** e
*** 270,275 ****
--- 270,314 ----
SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
+ -- Test FORCE_NOT_NULL and FORCE_NULL options
+ -- should succeed with "b" set to an empty string and "c" set to NULL
+ CREATE TEMP TABLE forcetest (
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT,
+ e TEXT
+ );
+ \pset null NULL
+ BEGIN;
+ COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+ 1,,""
+ \.
+ COMMIT;
+ SELECT b, c FROM forcetest WHERE a = 1;
+ -- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
+ BEGIN;
+ COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+ 2,,""
+ \.
+ COMMIT;
+ SELECT b, c FROM forcetest WHERE a = 2;
+ -- should fail with not-null constraint violiaton
+ BEGIN;
+ COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL(b), FORCE_NOT_NULL(c));
+ 3,,""
+ \.
+ ROLLBACK;
+ -- should fail with "not referenced by COPY" error
+ BEGIN;
+ COPY forcetest (d, e) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b));
+ ROLLBACK;
+ -- should fail with "not referenced by COPY" error
+ BEGIN;
+ COPY forcetest (d, e) FROM STDIN WITH (FORMAT csv, FORCE_NULL(b));
+ ROLLBACK;
+ \pset null ''
+ DROP TABLE forcetest;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
On Sun, Sep 29, 2013 at 1:39 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:
Hi,
This patch implements the following TODO item:
Allow COPY in CSV mode to control whether a quoted zero-length
string is treated as NULLCurrently this is always treated as a zero-length string,
which generates an error when loading into an integer columnRe: [PATCHES] allow CSV quote in NULL
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.phphttp://wiki.postgresql.org/wiki/Todo#COPY
I had a very definite use-case for this functionality recently while importing
CSV files generated by Oracle, and was somewhat frustrated by the existence
of a FORCE_NOT_NULL option for specific columns, but not one for
FORCE_NULL.
While going through documentation of this patch to understand it's
usage, I found a small mistake.
+ Force the specified columns' values to be converted to <literal>NULL</>
+ if the value contains an empty string.
It seems quote after columns is wrong.
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:
postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
50,
\.
postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Oct 5, 2013 at 7:38 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Sep 29, 2013 at 1:39 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:
Hi,
This patch implements the following TODO item:
Allow COPY in CSV mode to control whether a quoted zero-length
string is treated as NULLCurrently this is always treated as a zero-length string,
which generates an error when loading into an integer columnRe: [PATCHES] allow CSV quote in NULL
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.phphttp://wiki.postgresql.org/wiki/Todo#COPY
I had a very definite use-case for this functionality recently while importing
CSV files generated by Oracle, and was somewhat frustrated by the existence
of a FORCE_NOT_NULL option for specific columns, but not one for
FORCE_NULL.While going through documentation of this patch to understand it's
usage, I found a small mistake.+ Force the specified columns' values to be converted to <literal>NULL</> + if the value contains an empty string.It seems quote after columns is wrong.
That's a correct plural possessive in English, but in might be better
worded as "Force any empty string encountered in the input for the
specified columns to be interpreted as a NULL value."
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)
Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.
Isn't the point of this option to allow a *quoted* empty string to be
forced to NULL? If so, this is not testing the same case - in fact the
COPY command above just makes explicit the default CSV NULL setting anyway.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be forced
to NULL? If so, this is not testing the same case - in fact the COPY command
above just makes explicit the default CSV NULL setting anyway.
I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.
By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be forced
to NULL? If so, this is not testing the same case - in fact the COPY command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.
Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch
allows you to specify the behaviour very explicitly on a per column
basis. I can well imagine wanting to be able to force a quoted empty
string to null for numeric fields but not for text.
The basic principal of our CSV processing is that we don't ever turn a
NULL into something quoted and we don't ever turn something quoted into
NULL. That's what lets us round-trip test just about every combination
of options. I'm only going to be happy violating that, as this patch
does, in a very explicit and controlled way.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 8, 2013 at 8:33 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.
Andrew, are you planning to review & commit this?
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/08/2013 12:30 PM, Robert Haas wrote:
On Tue, Oct 8, 2013 at 8:33 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.Andrew, are you planning to review & commit this?
Yes.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/08/2013 12:47 PM, Andrew Dunstan wrote:
On 10/08/2013 12:30 PM, Robert Haas wrote:
Andrew, are you planning to review & commit this?
Yes.
Incidentally, the patch doesn't seem to add the option to file_fdw,
which I really think it should.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/10/9 Andrew Dunstan <andrew@dunslane.net>:
On 10/08/2013 12:47 PM, Andrew Dunstan wrote:
On 10/08/2013 12:30 PM, Robert Haas wrote:
Andrew, are you planning to review & commit this?
Yes.
Incidentally, the patch doesn't seem to add the option to file_fdw, which I
really think it should.
Patch author here. I'll dig out the use-case I had for this patch and have a
look at the file_fdw option, which never occurred to me. (I'm
doing some FDW-related stuff at the moment so would be more than happy
to handle that too).
(Sorry for the delay in following up on this, I kind of assumed the patch
would be squirrelled away until the next commitfest ;) )
Regards
Ian Barwick
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.
The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.
Will this option allow only quoted empty string to be NULL or will
handle without quoted empty string as well?
It seems from documentation that current option FORCE_NOT_NULL handles
for both (Do not match the specified columns' values against the null
string. In the default case where the null string is empty, this means
that empty values will be read as zero-length strings rather than
nulls, "even when they are not quoted.").
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.
Okay, but can't it be done by extending current syntax such as
NULL FOR <col1, col2> AS ""- which would mean it will replace
corresponding column's values as NULL if they contain empty string.
The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.Will this option allow only quoted empty string to be NULL or will
handle without quoted empty string as well?
I had checked patch and it seems for both quoted and unquoted empty
string, it will replace it with NULL.
Now here it's bit different from FORCE_NOT_NULL, because already
for un-quoted empty strings we have a way to replace them with NULL.
I think having 2 different syntax for replacing empty strings (one
for quoted and another for un-quoted) as NULL's might not be best way
to accomplish
this feature.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 09:22 AM, Amit Kapila wrote:
On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.Okay, but can't it be done by extending current syntax such as
NULL FOR <col1, col2> AS ""- which would mean it will replace
corresponding column's values as NULL if they contain empty string.The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.Will this option allow only quoted empty string to be NULL or will
handle without quoted empty string as well?I had checked patch and it seems for both quoted and unquoted empty
string, it will replace it with NULL.
Now here it's bit different from FORCE_NOT_NULL, because already
for un-quoted empty strings we have a way to replace them with NULL.I think having 2 different syntax for replacing empty strings (one
for quoted and another for un-quoted) as NULL's might not be best way
to accomplish
this feature.
I really don't know what you're saying here.
Here is the situation we have today (assuming the default null marker of
empty-string):
default: empty-string -> null, quoted-empty-string ->
emptystring
with force_not_null: empty-string -> emptystring, quoted-empty-string
-> emptystring
and the proposal would add to that:
with force-null: empty-string -> null, quoted-empty-string -> null
So it appears to be quite on all fours with the way force_not_null works
now, it just does the reverse.
I don't see at all that your suggested alternative has any advantages
over what's been written. If you can say "NULL FOR (foo) as '""' how
will you specify the null for some other column(s)? Are we going to have
multiple such clauses? It looks like a real mess.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
I don't see at all that your suggested alternative has any advantages over
what's been written. If you can say "NULL FOR (foo) as '""' how will you
specify the null for some other column(s)? Are we going to have multiple
such clauses? It looks like a real mess.
Basically the CSV files don't have out-of-band NULLs and it's then a
real mess. In the new pgloader version I've been adding per-column NULL
processing, where NULL can be either an empty string, any number of
space characters or any constant string such as "\N" or "****".
I first added a global per-file NULL representation setting, but that's
not flexible enough to make any sense really. The files we have to
import are way to "creative" in their formats.
In my view, we can slowly deprecate pgloader by including such features
in the core code or make pgloader and the like non-optional parts of
external data loading tool chain.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 11:23 AM, Dimitri Fontaine wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I don't see at all that your suggested alternative has any advantages over
what's been written. If you can say "NULL FOR (foo) as '""' how will you
specify the null for some other column(s)? Are we going to have multiple
such clauses? It looks like a real mess.Basically the CSV files don't have out-of-band NULLs and it's then a
real mess. In the new pgloader version I've been adding per-column NULL
processing, where NULL can be either an empty string, any number of
space characters or any constant string such as "\N" or "****".I first added a global per-file NULL representation setting, but that's
not flexible enough to make any sense really. The files we have to
import are way to "creative" in their formats.In my view, we can slowly deprecate pgloader by including such features
in the core code or make pgloader and the like non-optional parts of
external data loading tool chain.
The CSV code was somewhat controversial when adopted, and was never
intended to cater for all cases. I think it was accepted because it gave
good coverage of a large number of common cases without huge additional
code complexity. I think we drew the line in about the right place for
what we support, although we've extended it modestly over the years. I
seriously doubt that it will ever fully replace a utility like pgloader,
and I'm not sure that's a desirable goal in the first place.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 09, 2013 at 09:52:29AM -0400, Andrew Dunstan wrote:
On 10/09/2013 09:22 AM, Amit Kapila wrote:
On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.Okay, but can't it be done by extending current syntax such as
NULL FOR <col1, col2> AS ""- which would mean it will replace
corresponding column's values as NULL if they contain empty string.The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.Will this option allow only quoted empty string to be NULL or will
handle without quoted empty string as well?I had checked patch and it seems for both quoted and unquoted empty
string, it will replace it with NULL.
Now here it's bit different from FORCE_NOT_NULL, because already
for un-quoted empty strings we have a way to replace them with NULL.I think having 2 different syntax for replacing empty strings (one
for quoted and another for un-quoted) as NULL's might not be best way
to accomplish
this feature.I really don't know what you're saying here.
Here is the situation we have today (assuming the default null
marker of empty-string):default: empty-string -> null, quoted-empty-string ->
emptystring
with force_not_null: empty-string -> emptystring,
quoted-empty-string -> emptystringand the proposal would add to that:
with force-null: empty-string -> null, quoted-empty-string -> null
So it appears to be quite on all fours with the way force_not_null
works now, it just does the reverse.I don't see at all that your suggested alternative has any
advantages over what's been written. If you can say "NULL FOR (foo)
as '""' how will you specify the null for some other column(s)?
Idea:
NULL FOR (foo,bar,baz,blurf) AS '""', NULL FOR (quux,fleeg) AS ...,
Are we going to have multiple such clauses? It looks like a real
mess.
Is that not part of what parsers ordinarily do?
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 01:25 PM, David Fetter wrote:
Idea:
NULL FOR (foo,bar,baz,blurf) AS '""', NULL FOR (quux,fleeg) AS ...,
What's the point of this? How is this superior to what is currently
proposed? Having arbitrary NULL markers for different fields will
significantly increase code complexity for a case I have a hard time
believing exists to any significant degree in the real world. The ONLY
case I know of outside some fervid imaginations where we need to
distinguish between different NULL treatment is quoted vs unquoted,
which this patch rounds out. Catering for anything else seems quite
unnecessary.
What is more, I think it's actually far more obscure than what is
proposed, which fits in nicely with our existing pattern of options.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 9, 2013 at 9:52 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
I really don't know what you're saying here.
Here is the situation we have today (assuming the default null marker of
empty-string):default: empty-string -> null, quoted-empty-string ->
emptystring
with force_not_null: empty-string -> emptystring, quoted-empty-string ->
emptystringand the proposal would add to that:
with force-null: empty-string -> null, quoted-empty-string -> null
So it appears to be quite on all fours with the way force_not_null works
now, it just does the reverse.
That principle seems sound as far as it goes, but somehow I feel we've
dug ourselves into a hole here naming-wise. Apparently an unquoted
empty string is normally null, but you can use force-not-null to make
it an empty string instead. And a quoted empty string is normally an
empty string, but you can use force-null to make it a null instead.
Therefore, a user who wants the opposite of the default behavior -
namely, unquoted empty strings as empty strings and quoted empty
strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.
An unsuspecting user confronted with a column marked with both of
those options at the same time might be a bit perplexed.
It seems to me that it might be better to have an option called
empty_input, and then you could have four values with names we can
bikeshed about - e.g. auto (the current default behavior), null
(proposed force null), empty_string (current force not null), reversed
(force null + force not null).
(In the interest of full disclosure, there is an EDB product that as
of recently offers something very much like this, with slightly
different naming and offering only the first three of those four
options, motivated by a customer complaint about the default behavior,
which was the same as COPY's default behavior. I don't know that the
solution we adopted there has any bearing on what ought to be done
here, and I can certainly live with it if people prefer to have FORCE
NULL and FORCE NOT NULL with not-quite-opposite meanings, but I do
think it's sort of hilariously awful, right up there with constraint
exclusion vs. exclusion constraints.)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 03:25 PM, Robert Haas wrote:
Therefore, a user who wants the opposite of the default behavior -
namely, unquoted empty strings as empty strings and quoted empty
strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.
Is there a real world example of this case? How common is it? And how
come I haven't heard of it in the nine or so years since we've been
supporting CSV import?
Frankly it strikes me as more than a little perverse to have a CSV file
where you want an unquoted empty string to map to emptystring but a
quoted one to map to null, especially if you want that for the same
field. The main two problems I have heard people people complain about are:
* some CSV producers quote everything, even NULL values. This is
particularly true where the producer doesn't treat NULLs the same
way we do. This patch will fix that, allowing you to get a null on a
column by column basis as desired.
* some tables have NOT NULL constraints that are violated by unquoted
empty strings being interpreted as NULL - that was fixed long ago by
FORCE NOT NULL.
It's faintly possible you might encounter both of these problems in some
form in the one file, but not very likely.
Regarding syntax suggestions - this is not a green field. If we were
designing the syntax of COPY from scratch today we might make other
decisions than those that were made back in 2004. But I don't think we
can add new options using a quite different style from what's already
been done. That would look more than odd.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 9, 2013 at 5:43 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/09/2013 03:25 PM, Robert Haas wrote:
Therefore, a user who wants the opposite of the default behavior -
namely, unquoted empty strings as empty strings and quoted empty
strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.Is there a real world example of this case? How common is it? And how come I
haven't heard of it in the nine or so years since we've been supporting CSV
import?
I doubt there is any real world use case. My point is just that FORCE
NULL and FORCE NOT NULL are not opposites, and that will probably
confuse some people.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 08:39 PM, Robert Haas wrote:
On Wed, Oct 9, 2013 at 5:43 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/09/2013 03:25 PM, Robert Haas wrote:
Therefore, a user who wants the opposite of the default behavior -
namely, unquoted empty strings as empty strings and quoted empty
strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.Is there a real world example of this case? How common is it? And how come I
haven't heard of it in the nine or so years since we've been supporting CSV
import?I doubt there is any real world use case. My point is just that FORCE
NULL and FORCE NOT NULL are not opposites, and that will probably
confuse some people.
I'm not wedded to the exact syntax. If you think this will lead to
confusion we could call it QUOTED NULL or some such. Doing it cleanly
without adding a new key word could stretch the imagination some.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 9, 2013 at 7:22 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/09/2013 09:22 AM, Amit Kapila wrote:
On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per
above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch
allows
you to specify the behaviour very explicitly on a per column basis. I
can
well imagine wanting to be able to force a quoted empty string to null
for
numeric fields but not for text.Okay, but can't it be done by extending current syntax such as
NULL FOR <col1, col2> AS ""- which would mean it will replace
corresponding column's values as NULL if they contain empty string.The basic principal of our CSV processing is that we don't ever turn a
NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of
options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.Will this option allow only quoted empty string to be NULL or will
handle without quoted empty string as well?I had checked patch and it seems for both quoted and unquoted empty
string, it will replace it with NULL.
Now here it's bit different from FORCE_NOT_NULL, because already
for un-quoted empty strings we have a way to replace them with NULL.I think having 2 different syntax for replacing empty strings (one
for quoted and another for un-quoted) as NULL's might not be best way
to accomplish
this feature.I really don't know what you're saying here.
Here is the situation we have today (assuming the default null marker of
empty-string):default: empty-string -> null, quoted-empty-string ->
emptystring
with force_not_null: empty-string -> emptystring, quoted-empty-string ->
emptystringand the proposal would add to that:
with force-null: empty-string -> null, quoted-empty-string -> null
So it appears to be quite on all fours with the way force_not_null works
now, it just does the reverse.I don't see at all that your suggested alternative has any advantages over
what's been written. If you can say "NULL FOR (foo) as '""' how will you
specify the null for some other column(s)? Are we going to have multiple
such clauses?
No, if user wants for particular string (ex. quoted empty string)
that few of it's columns becomes NULL, he can specify them together (
NULL for (col1, col2)).
It looks like a real mess.
One of the advantage, I could see using "NULL For .." syntax is
that already we have one syntax with which user can specify what
strings can be replaced with NULL, now just to handle quoted empty
string why to add different syntax.
"FORCE_NULL" has advantage that it can be used for some columns rather
than all columns, but I think for that existing syntax can also be
modified to support it.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 11:47 PM, Amit Kapila wrote:
One of the advantage, I could see using "NULL For .." syntax is
that already we have one syntax with which user can specify what
strings can be replaced with NULL, now just to handle quoted empty
string why to add different syntax."FORCE_NULL" has advantage that it can be used for some columns rather
than all columns, but I think for that existing syntax can also be
modified to support it.
I think it's badly designed on its face. We don't need and shouldn't
provide a facility for different NULL markers. A general facility for
that would be an ugly an quite pointless addition to code complexity.
What we need is simply a way of altering one specific behaviour, namely
the treatment of quoted NULL markers. We should not do that by allowing
munging the NULL marker per column, but by a syntactical mechanism that
directly addresses the change in behaviour. If you don't like "FORCE
NULL" then let's pick something else, but not this ugly and unnecessary
"NULL FOR" gadget.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 10, 2013 at 6:45 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/09/2013 11:47 PM, Amit Kapila wrote:
One of the advantage, I could see using "NULL For .." syntax is
that already we have one syntax with which user can specify what
strings can be replaced with NULL, now just to handle quoted empty
string why to add different syntax."FORCE_NULL" has advantage that it can be used for some columns rather
than all columns, but I think for that existing syntax can also be
modified to support it.I think it's badly designed on its face. We don't need and shouldn't
provide a facility for different NULL markers. A general facility for that
would be an ugly an quite pointless addition to code complexity. What we
need is simply a way of altering one specific behaviour, namely the
treatment of quoted NULL markers. We should not do that by allowing munging
the NULL marker per column,
I was thinking it to similar in some sense with "insert into tbl"
statement. For example
Create table tbl (c1 int, c2 int, c3 int, c4 int)
insert into tbl (col2) values(1);
Here after table name, user can specify column names for which he
wants to provide specific values.
but by a syntactical mechanism that directly
addresses the change in behaviour. If you don't like "FORCE NULL" then let's
pick something else, but not this ugly and unnecessary "NULL FOR" gadget.
If you don't like idea of one generic syntax for NULLs in COPY
command, then "FORCE_QUOTED_NULL" or "QUOTED_NULL" will make more
sense as compare to FORCE_NULL.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 11, 2013 at 12:49 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Oct 10, 2013 at 6:45 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/09/2013 11:47 PM, Amit Kapila wrote:
One of the advantage, I could see using "NULL For .." syntax is
that already we have one syntax with which user can specify what
strings can be replaced with NULL, now just to handle quoted empty
string why to add different syntax."FORCE_NULL" has advantage that it can be used for some columns rather
than all columns, but I think for that existing syntax can also be
modified to support it.I think it's badly designed on its face. We don't need and shouldn't
provide a facility for different NULL markers. A general facility for that
would be an ugly an quite pointless addition to code complexity. What we
need is simply a way of altering one specific behaviour, namely the
treatment of quoted NULL markers. We should not do that by allowing munging
the NULL marker per column,I was thinking it to similar in some sense with "insert into tbl"
statement. For example
Create table tbl (c1 int, c2 int, c3 int, c4 int)
insert into tbl (col2) values(1);Here after table name, user can specify column names for which he
wants to provide specific values.but by a syntactical mechanism that directly
addresses the change in behaviour. If you don't like "FORCE NULL" then let's
pick something else, but not this ugly and unnecessary "NULL FOR" gadget.If you don't like idea of one generic syntax for NULLs in COPY
command, then "FORCE_QUOTED_NULL" or "QUOTED_NULL" will make more
sense as compare to FORCE_NULL.
Meh. As amused as I am with our bad naming, I don't think there's
anything too terribly wrong with FORCE NULL. Symmetry has some value.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers