Copy out wording

Started by Magnus Haganderover 16 years ago10 messages
#1Magnus Hagander
magnus@hagander.net

Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"

Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?

If not, then what really is the difference between a NULL and a NULL string?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#2Quan Zongliang
quanzongliang@gmail.com
In reply to: Magnus Hagander (#1)
Re: Copy out wording

Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?

sure, the document seems to has mistake.

and i parse csv format with same way in import function

-----------------------------------------------
Quan Zongliang
quanzongliang@gmail.com

#3Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#1)
Fwd: Copy out wording

Crap, I just realized I sent to pgadmin hackers by mystake. Meh.

Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
 The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"

Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?

If not, then what really is the difference between a NULL and a NULL string?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#3)
Re: Fwd: Copy out wording

Magnus Hagander wrote:

Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"

Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?

If not, then what really is the difference between a NULL and a NULL string?

No, it shouldn't. Let's say NULL is represented as "foo". Then a null
between delimiters will be written as

delimiter foo delimiter

while the string "foo" will be

delimiter quotechar foo quotechar delimiter

and an empty non-null string will be

delimiter delimiter

unless you have FORCE QUOTE on for it, in which case it will be

delimiter quotechar quotechar delimiter

We had quite a bit of debate on the shape of CSV output at the time it
was done (during 8.0), and that's what we came up with. It has the
useful property that we can round-trip the data, i.e. we can read back
the data we output without losing information about nulls, no matter
what the NULL string is, something we have always been resistant to
changing.

If you think we could explain it better, by all means have a go at it.
But your proposed change isn't accurate. Here is an illustration of the
above:

andrew=# copy (values (1, 'foo', 2),(3,null,4),(5,'',6 ) ) to
stdout null 'foo' csv header;
column1,column2,column3
1,"foo",2
3,foo,4
5,,6
andrew=# copy (values (1, 'foo', 2),(3,null,4),(5,'',6 ) ) to
stdout null 'foo' csv header force quote column2;
column1,column2,column3
1,"foo",2
3,foo,4
5,"",6

HTH

cheers

andrew

#5Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#4)
Re: Fwd: Copy out wording

On Thu, Sep 3, 2009 at 13:19, Andrew Dunstan<andrew@dunslane.net> wrote:

Magnus Hagander wrote:

Our documentation for COPY
(http://www.postgresql.org/docs/8.4/static/sql-copy.html) has the
following to say:
"
 The CSV format has no standard way to distinguish a NULL value from
an empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted. Therefore, using the default
settings, a NULL is written as an unquoted empty string, while an
empty string is written with double quotes (""). Reading values
follows similar rules. You can use FORCE NOT NULL to prevent NULL
input comparisons for specific columns.
"

Shouldn't that be:
"A NULL is output as the NULL string and is not quoted, while a data
value matching the empty string is quoted"?

If not, then what really is the difference between a NULL and a NULL
string?

No, it shouldn't. Let's say NULL is represented as "foo". Then a null
between delimiters will be written as

  delimiter foo delimiter

while the string "foo" will be

  delimiter quotechar foo quotechar delimiter

and an empty non-null string will be

  delimiter delimiter

unless you have FORCE QUOTE on for it, in which case it will be

  delimiter quotechar quotechar delimiter

We had quite a bit of debate on the shape of CSV output at the time it was
done (during 8.0), and that's what we came up with. It has the useful
property that we can round-trip the data, i.e. we can read back the data we
output without losing information about nulls, no matter what the NULL
string is, something we have always been resistant to changing.

If you think we could explain it better, by all means have a go at it. But
your proposed change isn't accurate. Here is an illustration of the above:

Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)

(FWIW, I totally agree with the feature, I was just confused by the docs)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#5)
Re: Fwd: Copy out wording

Magnus Hagander wrote:

Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)

We could change:

A NULL is output as the NULL string and is not quoted, while a data value
matching the NULL string is quoted.

to

A NULL is output as the NULL parameter and is not quoted, while a non-NULL data value whose text representation
matches the NULL parameter is quoted.

or something similar. Would that be better?

cheers

andrew

#7Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#6)
Re: Fwd: Copy out wording

On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:

Magnus Hagander wrote:

Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)

We could change:

  A NULL is output as the NULL string and is not quoted, while a data value
  matching the NULL string is quoted.

to

  A NULL is output as the NULL parameter and is not quoted, while a non-NULL
data value whose text representation
  matches the NULL parameter is quoted.

or something similar. Would that be better?

Yes, much better IMO.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#8Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#7)
1 attachment(s)
Re: Fwd: Copy out wording

Magnus Hagander wrote:

On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:

Magnus Hagander wrote:

Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)

We could change:

? A NULL is output as the NULL string and is not quoted, while a data value
? matching the NULL string is quoted.

to

? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
data value whose text representation
? matches the NULL parameter is quoted.

or something similar. Would that be better?

Yes, much better IMO.

I have applied the attached documentation clarification patch, and
backpatched it to 8.4.X.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/rtmp/difftext/x-diffDownload
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.87
diff -c -c -r1.87 copy.sgml
*** doc/src/sgml/ref/copy.sgml	5 Sep 2009 23:58:01 -0000	1.87
--- doc/src/sgml/ref/copy.sgml	17 Sep 2009 21:06:59 -0000
***************
*** 550,562 ****
     <para>
      The <literal>CSV</> format has no standard way to distinguish a
      <literal>NULL</> value from an empty string.
!     <productname>PostgreSQL</>'s <command>COPY</> handles this by
!     quoting. A <literal>NULL</> is output as the <literal>NULL</>
!     string and is not quoted, while a data value matching the
!     <literal>NULL</> string is quoted. Therefore, using the default
!     settings, a <literal>NULL</> is written as an unquoted empty
!     string, while an empty string is written with double quotes
!     (<literal>""</>). Reading values follows similar rules. You can
      use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
      comparisons for specific columns.
     </para>
--- 550,559 ----
     <para>
      The <literal>CSV</> format has no standard way to distinguish a
      <literal>NULL</> value from an empty string.
!     <productname>PostgreSQL</>'s <command>COPY</> handles this using
!     quoting. A <literal>NULL</> is output as an empty string without
!     quotes, while an empty string data value is double-quoted
!     (<literal>""</>).  Reading values follows similar rules. You can
      use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
      comparisons for specific columns.
     </para>
#9Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#8)
Re: Fwd: Copy out wording

Bruce Momjian wrote:

Magnus Hagander wrote:

On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:

Magnus Hagander wrote:

Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)

We could change:

? A NULL is output as the NULL string and is not quoted, while a data value
? matching the NULL string is quoted.

to

? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
data value whose text representation
? matches the NULL parameter is quoted.

or something similar. Would that be better?

Yes, much better IMO.

I have applied the attached documentation clarification patch, and
backpatched it to 8.4.X.

Why didn't you follow the wording I actually suggested, which had the
virtue of being accurate.

cheers

andrew

#10Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#9)
1 attachment(s)
Re: Fwd: Copy out wording

Andrew Dunstan wrote:

Bruce Momjian wrote:

Magnus Hagander wrote:

On Thu, Sep 3, 2009 at 13:31, Andrew Dunstan<andrew@dunslane.net> wrote:

Magnus Hagander wrote:

Oh, hang on, "the NULL string" refers to the copy parameter? Not a
part of the data? I read it as "a string being NULL". Maybe something
like "the value of the NULL string parameter" to be overly clear for
people like me? :-)

We could change:

? A NULL is output as the NULL string and is not quoted, while a data value
? matching the NULL string is quoted.

to

? A NULL is output as the NULL parameter and is not quoted, while a non-NULL
data value whose text representation
? matches the NULL parameter is quoted.

or something similar. Would that be better?

Yes, much better IMO.

I have applied the attached documentation clarification patch, and
backpatched it to 8.4.X.

Why didn't you follow the wording I actually suggested, which had the
virtue of being accurate.

I thought the problem was the use of the word "null string", which
clearly was confusing.

I have updated the docs to use your wording, with a little
clarification. The diff against yesterday's CVS is attached, which is
not smaller.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/rtmp/diff3text/x-diffDownload
Index: copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.87
diff -c -r1.87 copy.sgml
*** copy.sgml	5 Sep 2009 23:58:01 -0000	1.87
--- copy.sgml	17 Sep 2009 21:47:11 -0000
***************
*** 1,5 ****
  <!--
! $PostgreSQL: pgsql/doc/src/sgml/ref/copy.sgml,v 1.87 2009/09/05 23:58:01 tgl Exp $
  PostgreSQL documentation
  -->
  
--- 1,5 ----
  <!--
! $PostgreSQL: pgsql/doc/src/sgml/ref/copy.sgml,v 1.88 2009/09/17 21:13:01 momjian Exp $
  PostgreSQL documentation
  -->
  
***************
*** 552,561 ****
      <literal>NULL</> value from an empty string.
      <productname>PostgreSQL</>'s <command>COPY</> handles this by
      quoting. A <literal>NULL</> is output as the <literal>NULL</>
!     string and is not quoted, while a data value matching the
!     <literal>NULL</> string is quoted. Therefore, using the default
      settings, a <literal>NULL</> is written as an unquoted empty
!     string, while an empty string is written with double quotes
      (<literal>""</>). Reading values follows similar rules. You can
      use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
      comparisons for specific columns.
--- 552,561 ----
      <literal>NULL</> value from an empty string.
      <productname>PostgreSQL</>'s <command>COPY</> handles this by
      quoting. A <literal>NULL</> is output as the <literal>NULL</>
!     parameter and is not quoted, while a non-NULL value matching the
!     the <literal>NULL</> parameter string is quoted. Therefore, using the default
      settings, a <literal>NULL</> is written as an unquoted empty
!     string, while an empty string data value is written with double quotes
      (<literal>""</>). Reading values follows similar rules. You can
      use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
      comparisons for specific columns.