Patch: FORCE_NULL option for copy COPY in CSV mode
Hi,
This patch implements the following TODO item:
Allow COPY in CSV mode to control whether a quoted zero-length
string is treated as NULL
Currently this is always treated as a zero-length string,
which generates an error when loading into an integer column
Re: [PATCHES] allow CSV quote in NULL
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php
http://wiki.postgresql.org/wiki/Todo#COPY
I had a very definite use-case for this functionality recently while importing
CSV files generated by Oracle, and was somewhat frustrated by the existence
of a FORCE_NOT_NULL option for specific columns, but not one for
FORCE_NULL.
I'll add this to the November commitfest.
Regards
Ian Barwick
Attachments:
copy-force-null-v1.patchapplication/octet-stream; name=copy-force-null-v1.patchDownload+178-25
On Sun, Sep 29, 2013 at 1:39 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:
Hi,
This patch implements the following TODO item:
Allow COPY in CSV mode to control whether a quoted zero-length
string is treated as NULLCurrently this is always treated as a zero-length string,
which generates an error when loading into an integer columnRe: [PATCHES] allow CSV quote in NULL
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.phphttp://wiki.postgresql.org/wiki/Todo#COPY
I had a very definite use-case for this functionality recently while importing
CSV files generated by Oracle, and was somewhat frustrated by the existence
of a FORCE_NOT_NULL option for specific columns, but not one for
FORCE_NULL.
While going through documentation of this patch to understand it's
usage, I found a small mistake.
+ Force the specified columns' values to be converted to <literal>NULL</>
+ if the value contains an empty string.
It seems quote after columns is wrong.
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:
postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
50,
\.
postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Oct 5, 2013 at 7:38 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Sep 29, 2013 at 1:39 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:
Hi,
This patch implements the following TODO item:
Allow COPY in CSV mode to control whether a quoted zero-length
string is treated as NULLCurrently this is always treated as a zero-length string,
which generates an error when loading into an integer columnRe: [PATCHES] allow CSV quote in NULL
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.phphttp://wiki.postgresql.org/wiki/Todo#COPY
I had a very definite use-case for this functionality recently while importing
CSV files generated by Oracle, and was somewhat frustrated by the existence
of a FORCE_NOT_NULL option for specific columns, but not one for
FORCE_NULL.While going through documentation of this patch to understand it's
usage, I found a small mistake.+ Force the specified columns' values to be converted to <literal>NULL</> + if the value contains an empty string.It seems quote after columns is wrong.
That's a correct plural possessive in English, but in might be better
worded as "Force any empty string encountered in the input for the
specified columns to be interpreted as a NULL value."
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)
Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.
Isn't the point of this option to allow a *quoted* empty string to be
forced to NULL? If so, this is not testing the same case - in fact the
COPY command above just makes explicit the default CSV NULL setting anyway.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be forced
to NULL? If so, this is not testing the same case - in fact the COPY command
above just makes explicit the default CSV NULL setting anyway.
I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.
By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be forced
to NULL? If so, this is not testing the same case - in fact the COPY command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.
Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch
allows you to specify the behaviour very explicitly on a per column
basis. I can well imagine wanting to be able to force a quoted empty
string to null for numeric fields but not for text.
The basic principal of our CSV processing is that we don't ever turn a
NULL into something quoted and we don't ever turn something quoted into
NULL. That's what lets us round-trip test just about every combination
of options. I'm only going to be happy violating that, as this patch
does, in a very explicit and controlled way.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 8, 2013 at 8:33 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.
Andrew, are you planning to review & commit this?
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/08/2013 12:30 PM, Robert Haas wrote:
On Tue, Oct 8, 2013 at 8:33 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.Andrew, are you planning to review & commit this?
Yes.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/08/2013 12:47 PM, Andrew Dunstan wrote:
On 10/08/2013 12:30 PM, Robert Haas wrote:
Andrew, are you planning to review & commit this?
Yes.
Incidentally, the patch doesn't seem to add the option to file_fdw,
which I really think it should.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/10/9 Andrew Dunstan <andrew@dunslane.net>:
On 10/08/2013 12:47 PM, Andrew Dunstan wrote:
On 10/08/2013 12:30 PM, Robert Haas wrote:
Andrew, are you planning to review & commit this?
Yes.
Incidentally, the patch doesn't seem to add the option to file_fdw, which I
really think it should.
Patch author here. I'll dig out the use-case I had for this patch and have a
look at the file_fdw option, which never occurred to me. (I'm
doing some FDW-related stuff at the moment so would be more than happy
to handle that too).
(Sorry for the delay in following up on this, I kind of assumed the patch
would be squirrelled away until the next commitfest ;) )
Regards
Ian Barwick
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.
The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.
Will this option allow only quoted empty string to be NULL or will
handle without quoted empty string as well?
It seems from documentation that current option FORCE_NOT_NULL handles
for both (Do not match the specified columns' values against the null
string. In the default case where the null string is empty, this means
that empty values will be read as zero-length strings rather than
nulls, "even when they are not quoted.").
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.
Okay, but can't it be done by extending current syntax such as
NULL FOR <col1, col2> AS ""- which would mean it will replace
corresponding column's values as NULL if they contain empty string.
The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.Will this option allow only quoted empty string to be NULL or will
handle without quoted empty string as well?
I had checked patch and it seems for both quoted and unquoted empty
string, it will replace it with NULL.
Now here it's bit different from FORCE_NOT_NULL, because already
for un-quoted empty strings we have a way to replace them with NULL.
I think having 2 different syntax for replacing empty strings (one
for quoted and another for un-quoted) as NULL's might not be best way
to accomplish
this feature.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 09:22 AM, Amit Kapila wrote:
On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.Okay, but can't it be done by extending current syntax such as
NULL FOR <col1, col2> AS ""- which would mean it will replace
corresponding column's values as NULL if they contain empty string.The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.Will this option allow only quoted empty string to be NULL or will
handle without quoted empty string as well?I had checked patch and it seems for both quoted and unquoted empty
string, it will replace it with NULL.
Now here it's bit different from FORCE_NOT_NULL, because already
for un-quoted empty strings we have a way to replace them with NULL.I think having 2 different syntax for replacing empty strings (one
for quoted and another for un-quoted) as NULL's might not be best way
to accomplish
this feature.
I really don't know what you're saying here.
Here is the situation we have today (assuming the default null marker of
empty-string):
default: empty-string -> null, quoted-empty-string ->
emptystring
with force_not_null: empty-string -> emptystring, quoted-empty-string
-> emptystring
and the proposal would add to that:
with force-null: empty-string -> null, quoted-empty-string -> null
So it appears to be quite on all fours with the way force_not_null works
now, it just does the reverse.
I don't see at all that your suggested alternative has any advantages
over what's been written. If you can say "NULL FOR (foo) as '""' how
will you specify the null for some other column(s)? Are we going to have
multiple such clauses? It looks like a real mess.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
I don't see at all that your suggested alternative has any advantages over
what's been written. If you can say "NULL FOR (foo) as '""' how will you
specify the null for some other column(s)? Are we going to have multiple
such clauses? It looks like a real mess.
Basically the CSV files don't have out-of-band NULLs and it's then a
real mess. In the new pgloader version I've been adding per-column NULL
processing, where NULL can be either an empty string, any number of
space characters or any constant string such as "\N" or "****".
I first added a global per-file NULL representation setting, but that's
not flexible enough to make any sense really. The files we have to
import are way to "creative" in their formats.
In my view, we can slowly deprecate pgloader by including such features
in the core code or make pgloader and the like non-optional parts of
external data loading tool chain.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 11:23 AM, Dimitri Fontaine wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I don't see at all that your suggested alternative has any advantages over
what's been written. If you can say "NULL FOR (foo) as '""' how will you
specify the null for some other column(s)? Are we going to have multiple
such clauses? It looks like a real mess.Basically the CSV files don't have out-of-band NULLs and it's then a
real mess. In the new pgloader version I've been adding per-column NULL
processing, where NULL can be either an empty string, any number of
space characters or any constant string such as "\N" or "****".I first added a global per-file NULL representation setting, but that's
not flexible enough to make any sense really. The files we have to
import are way to "creative" in their formats.In my view, we can slowly deprecate pgloader by including such features
in the core code or make pgloader and the like non-optional parts of
external data loading tool chain.
The CSV code was somewhat controversial when adopted, and was never
intended to cater for all cases. I think it was accepted because it gave
good coverage of a large number of common cases without huge additional
code complexity. I think we drew the line in about the right place for
what we support, although we've extended it modestly over the years. I
seriously doubt that it will ever fully replace a utility like pgloader,
and I'm not sure that's a desirable goal in the first place.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 09, 2013 at 09:52:29AM -0400, Andrew Dunstan wrote:
On 10/09/2013 09:22 AM, Amit Kapila wrote:
On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/07/2013 11:34 PM, Amit Kapila wrote:
On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:On 10/07/2013 03:06 PM, Robert Haas wrote:
Also if your use case is to treat empty strings as NULL (as per above
documentation), can't it be handled with "WITH NULL AS" option.
For example, something like:postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.50,
\.postgres=# select * from testnull;
a | b
----+------
50 | NULL
(1 row)Good point. If this patch is just implementing something that can
already be done with another syntax, we don't need it.Isn't the point of this option to allow a *quoted* empty string to be
forced
to NULL? If so, this is not testing the same case - in fact the COPY
command
above just makes explicit the default CSV NULL setting anyway.I am really not sure if all the purpose of patch can be achieved by
existing syntax, neither it is explained clearly.
However the proposal hasn't discussed why it's not good idea to extend
some similar syntax "COPY .. NULL" which is used to replace string
with NULL's?
Description of NULL says: "Specifies the string that represents a null
value."
Now why can't this syntax be extended to support quoted empty string
if it's not supported currently?
I have not checked completely, If it's difficult or not possible to
support in existing syntax, then even it add's more value to introduce
new syntax.By asking above question, I doesn't mean that we should not go for the
new proposed syntax, rather it's to know and understand the benefit of
new syntax, also it helps during CF review for reviewer's if the
proposal involves new syntax and that's discussed previously.Quite apart from any other consideration, this suggestion is inferior to
what's proposed in that it's an all or nothing deal, while the patch allows
you to specify the behaviour very explicitly on a per column basis. I can
well imagine wanting to be able to force a quoted empty string to null for
numeric fields but not for text.Okay, but can't it be done by extending current syntax such as
NULL FOR <col1, col2> AS ""- which would mean it will replace
corresponding column's values as NULL if they contain empty string.The basic principal of our CSV processing is that we don't ever turn a NULL
into something quoted and we don't ever turn something quoted into NULL.
That's what lets us round-trip test just about every combination of options.
I'm only going to be happy violating that, as this patch does, in a very
explicit and controlled way.Will this option allow only quoted empty string to be NULL or will
handle without quoted empty string as well?I had checked patch and it seems for both quoted and unquoted empty
string, it will replace it with NULL.
Now here it's bit different from FORCE_NOT_NULL, because already
for un-quoted empty strings we have a way to replace them with NULL.I think having 2 different syntax for replacing empty strings (one
for quoted and another for un-quoted) as NULL's might not be best way
to accomplish
this feature.I really don't know what you're saying here.
Here is the situation we have today (assuming the default null
marker of empty-string):default: empty-string -> null, quoted-empty-string ->
emptystring
with force_not_null: empty-string -> emptystring,
quoted-empty-string -> emptystringand the proposal would add to that:
with force-null: empty-string -> null, quoted-empty-string -> null
So it appears to be quite on all fours with the way force_not_null
works now, it just does the reverse.I don't see at all that your suggested alternative has any
advantages over what's been written. If you can say "NULL FOR (foo)
as '""' how will you specify the null for some other column(s)?
Idea:
NULL FOR (foo,bar,baz,blurf) AS '""', NULL FOR (quux,fleeg) AS ...,
Are we going to have multiple such clauses? It looks like a real
mess.
Is that not part of what parsers ordinarily do?
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 01:25 PM, David Fetter wrote:
Idea:
NULL FOR (foo,bar,baz,blurf) AS '""', NULL FOR (quux,fleeg) AS ...,
What's the point of this? How is this superior to what is currently
proposed? Having arbitrary NULL markers for different fields will
significantly increase code complexity for a case I have a hard time
believing exists to any significant degree in the real world. The ONLY
case I know of outside some fervid imaginations where we need to
distinguish between different NULL treatment is quoted vs unquoted,
which this patch rounds out. Catering for anything else seems quite
unnecessary.
What is more, I think it's actually far more obscure than what is
proposed, which fits in nicely with our existing pattern of options.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 9, 2013 at 9:52 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
I really don't know what you're saying here.
Here is the situation we have today (assuming the default null marker of
empty-string):default: empty-string -> null, quoted-empty-string ->
emptystring
with force_not_null: empty-string -> emptystring, quoted-empty-string ->
emptystringand the proposal would add to that:
with force-null: empty-string -> null, quoted-empty-string -> null
So it appears to be quite on all fours with the way force_not_null works
now, it just does the reverse.
That principle seems sound as far as it goes, but somehow I feel we've
dug ourselves into a hole here naming-wise. Apparently an unquoted
empty string is normally null, but you can use force-not-null to make
it an empty string instead. And a quoted empty string is normally an
empty string, but you can use force-null to make it a null instead.
Therefore, a user who wants the opposite of the default behavior -
namely, unquoted empty strings as empty strings and quoted empty
strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.
An unsuspecting user confronted with a column marked with both of
those options at the same time might be a bit perplexed.
It seems to me that it might be better to have an option called
empty_input, and then you could have four values with names we can
bikeshed about - e.g. auto (the current default behavior), null
(proposed force null), empty_string (current force not null), reversed
(force null + force not null).
(In the interest of full disclosure, there is an EDB product that as
of recently offers something very much like this, with slightly
different naming and offering only the first three of those four
options, motivated by a customer complaint about the default behavior,
which was the same as COPY's default behavior. I don't know that the
solution we adopted there has any bearing on what ought to be done
here, and I can certainly live with it if people prefer to have FORCE
NULL and FORCE NOT NULL with not-quite-opposite meanings, but I do
think it's sort of hilariously awful, right up there with constraint
exclusion vs. exclusion constraints.)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 03:25 PM, Robert Haas wrote:
Therefore, a user who wants the opposite of the default behavior -
namely, unquoted empty strings as empty strings and quoted empty
strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.
Is there a real world example of this case? How common is it? And how
come I haven't heard of it in the nine or so years since we've been
supporting CSV import?
Frankly it strikes me as more than a little perverse to have a CSV file
where you want an unquoted empty string to map to emptystring but a
quoted one to map to null, especially if you want that for the same
field. The main two problems I have heard people people complain about are:
* some CSV producers quote everything, even NULL values. This is
particularly true where the producer doesn't treat NULLs the same
way we do. This patch will fix that, allowing you to get a null on a
column by column basis as desired.
* some tables have NOT NULL constraints that are violated by unquoted
empty strings being interpreted as NULL - that was fixed long ago by
FORCE NOT NULL.
It's faintly possible you might encounter both of these problems in some
form in the one file, but not very likely.
Regarding syntax suggestions - this is not a green field. If we were
designing the syntax of COPY from scratch today we might make other
decisions than those that were made back in 2004. But I don't think we
can add new options using a quite different style from what's already
been done. That would look more than odd.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Oct 9, 2013 at 5:43 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/09/2013 03:25 PM, Robert Haas wrote:
Therefore, a user who wants the opposite of the default behavior -
namely, unquoted empty strings as empty strings and quoted empty
strings as nulls - should specify both FORCE NULL and FORCE NOT NULL.Is there a real world example of this case? How common is it? And how come I
haven't heard of it in the nine or so years since we've been supporting CSV
import?
I doubt there is any real world use case. My point is just that FORCE
NULL and FORCE NOT NULL are not opposites, and that will probably
confuse some people.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers