Copy out wording
Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"
Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?
If not, then what really is the difference between a NULL and a NULL string?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?
sure, the document seems to has mistake.
and i parse csv format with same way in import function
-----------------------------------------------
Quan Zongliang
quanzongliang@gmail.com
Crap, I just realized I sent to pgadmin hackers by mystake. Meh.
Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"
Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?
If not, then what really is the difference between a NULL and a NULL string?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander wrote:
Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?If not, then what really is the difference between a NULL and a NULL string?
No, it shouldn't. Let's say NULL is represented as "foo". Then a null
between delimiters will be written as
delimiter foo delimiter
while the string "foo" will be
delimiter quotechar foo quotechar delimiter
and an empty non-null string will be
delimiter delimiter
unless you have FORCE QUOTE on for it, in which case it will be
delimiter quotechar quotechar delimiter
We had quite a bit of debate on the shape of CSV output at the time it
was done (during 8.0), and that's what we came up with. It has the
useful property that we can round-trip the data, i.e. we can read back
the data we output without losing information about nulls, no matter
what the NULL string is, something we have always been resistant to
changing.
If you think we could explain it better, by all means have a go at it.
But your proposed change isn't accurate. Here is an illustration of the
above:
andrew=# copy (values (1, 'foo', 2),(3,null,4),(5,'',6 ) ) to
stdout null 'foo' csv header;
column1,column2,column3
1,"foo",2
3,foo,4
5,,6
andrew=# copy (values (1, 'foo', 2),(3,null,4),(5,'',6 ) ) to
stdout null 'foo' csv header force quote column2;
column1,column2,column3
1,"foo",2
3,foo,4
5,"",6
HTH
cheers
andrew
On Thu, Sep 3, 2009 at 13:19, Andrew Dunstan<andrew@dunslane.net> wrote:
Magnus Hagander wrote:
Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?If not, then what really is the difference between a NULL and a NULL
string?No, it shouldn't. Let's say NULL is represented as "foo". Then a null
between delimiters will be written asdelimiter foo delimiter
while the string "foo" will be
delimiter quotechar foo quotechar delimiter
and an empty non-null string will be
delimiter delimiter
unless you have FORCE QUOTE on for it, in which case it will be
delimiter quotechar quotechar delimiter
We had quite a bit of debate on the shape of CSV output at the time it was
done (during 8.0), and that's what we came up with. It has the useful
property that we can round-trip the data, i.e. we can read back the data we
output without losing information about nulls, no matter what the NULL
string is, something we have always been resistant to changing.If you think we could explain it better, by all means have a go at it. But
your proposed change isn't accurate. Here is an illustration of the above:
Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)
(FWIW, I totally agree with the feature, I was just confused by the docs)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander wrote:
Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)
We could change:
A NULL is output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted.
to
A NULL is output as the NULL parameter and is not quoted, while a non-NULL data value whose text representation
matches the NULL parameter is quoted.
or something similar. Would that be better?
cheers
andrew
On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:
Magnus Hagander wrote:
Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)We could change:
A NULL is output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted.to
A NULL is output as the NULL parameter and is not quoted, while a non-NULL
data value whose text representation
matches the NULL parameter is quoted.or something similar. Would that be better?
Yes, much better IMO.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander wrote:
On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:
Magnus Hagander wrote:
Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)We could change:
? A NULL is output as the NULL string and is not quoted, while a data value
? matching the NULL string is quoted.to
? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
data value whose text representation
? matches the NULL parameter is quoted.or something similar. Would that be better?
Yes, much better IMO.
I have applied the attached documentation clarification patch, and
backpatched it to 8.4.X.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
/rtmp/difftext/x-diffDownload
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.87
diff -c -c -r1.87 copy.sgml
*** doc/src/sgml/ref/copy.sgml 5 Sep 2009 23:58:01 -0000 1.87
--- doc/src/sgml/ref/copy.sgml 17 Sep 2009 21:06:59 -0000
***************
*** 550,562 ****
<para>
The <literal>CSV</> format has no standard way to distinguish a
<literal>NULL</> value from an empty string.
! <productname>PostgreSQL</>'s <command>COPY</> handles this by
! quoting. A <literal>NULL</> is output as the <literal>NULL</>
! string and is not quoted, while a data value matching the
! <literal>NULL</> string is quoted. Therefore, using the default
! settings, a <literal>NULL</> is written as an unquoted empty
! string, while an empty string is written with double quotes
! (<literal>""</>). Reading values follows similar rules. You can
use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
comparisons for specific columns.
</para>
--- 550,559 ----
<para>
The <literal>CSV</> format has no standard way to distinguish a
<literal>NULL</> value from an empty string.
! <productname>PostgreSQL</>'s <command>COPY</> handles this using
! quoting. A <literal>NULL</> is output as an empty string without
! quotes, while an empty string data value is double-quoted
! (<literal>""</>). Reading values follows similar rules. You can
use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
comparisons for specific columns.
</para>
Bruce Momjian wrote:
Magnus Hagander wrote:
On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:
Magnus Hagander wrote:
Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)We could change:
? A NULL is output as the NULL string and is not quoted, while a data value
? matching the NULL string is quoted.to
? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
data value whose text representation
? matches the NULL parameter is quoted.or something similar. Would that be better?
Yes, much better IMO.
I have applied the attached documentation clarification patch, and
backpatched it to 8.4.X.
Why didn't you follow the wording I actually suggested, which had the
virtue of being accurate.
cheers
andrew
Andrew Dunstan wrote:
Bruce Momjian wrote:
Magnus Hagander wrote:
On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:
Magnus Hagander wrote:
Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)We could change:
? A NULL is output as the NULL string and is not quoted, while a data value
? matching the NULL string is quoted.to
? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
data value whose text representation
? matches the NULL parameter is quoted.or something similar. Would that be better?
Yes, much better IMO.
I have applied the attached documentation clarification patch, and
backpatched it to 8.4.X.Why didn't you follow the wording I actually suggested, which had the
virtue of being accurate.
I thought the problem was the use of the word "null string", which
clearly was confusing.
I have updated the docs to use your wording, with a little
clarification. The diff against yesterday's CVS is attached, which is
not smaller.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
/rtmp/diff3text/x-diffDownload
Index: copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.87
diff -c -r1.87 copy.sgml
*** copy.sgml 5 Sep 2009 23:58:01 -0000 1.87
--- copy.sgml 17 Sep 2009 21:47:11 -0000
***************
*** 1,5 ****
<!--
! $PostgreSQL: pgsql/doc/src/sgml/ref/copy.sgml,v 1.87 2009/09/05 23:58:01 tgl Exp $
PostgreSQL documentation
-->
--- 1,5 ----
<!--
! $PostgreSQL: pgsql/doc/src/sgml/ref/copy.sgml,v 1.88 2009/09/17 21:13:01 momjian Exp $
PostgreSQL documentation
-->
***************
*** 552,561 ****
<literal>NULL</> value from an empty string.
<productname>PostgreSQL</>'s <command>COPY</> handles this by
quoting. A <literal>NULL</> is output as the <literal>NULL</>
! string and is not quoted, while a data value matching the
! <literal>NULL</> string is quoted. Therefore, using the default
settings, a <literal>NULL</> is written as an unquoted empty
! string, while an empty string is written with double quotes
(<literal>""</>). Reading values follows similar rules. You can
use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
comparisons for specific columns.
--- 552,561 ----
<literal>NULL</> value from an empty string.
<productname>PostgreSQL</>'s <command>COPY</> handles this by
quoting. A <literal>NULL</> is output as the <literal>NULL</>
! parameter and is not quoted, while a non-NULL value matching the
! the <literal>NULL</> parameter string is quoted. Therefore, using the default
settings, a <literal>NULL</> is written as an unquoted empty
! string, while an empty string data value is written with double quotes
(<literal>""</>). Reading values follows similar rules. You can
use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
comparisons for specific columns.