COPY WITH CSV FORCE QUOTE *

Started by Itagaki Takahiroover 16 years ago21 messages
#1Itagaki Takahiro
itagaki.takahiro@oss.ntt.co.jp
1 attachment(s)

Hi,

FORCE QUOTE option of COPY WITH CSV requires an explicit column list,
but '*' (all columns) would be also useful for typical usages.

I searched the ML archive and found one request before:
| COPY TO with FORCE QUOTE *
| http://archives.postgresql.org/pgsql-sql/2008-08/msg00084.php

The attached is a WIP patch add a support of '*' for FORCE QUOTE
and FORCE NOT NULL options. I'd like to submit it for the next
commit fest (8.5). Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachments:

force_quote_all-20090512.patchapplication/octet-stream; name=force_quote_all-20090512.patchDownload
diff -cpr head/src/backend/commands/copy.c force_quote_all/src/backend/commands/copy.c
*** head/src/backend/commands/copy.c	Mon Apr 20 09:35:39 2009
--- force_quote_all/src/backend/commands/copy.c	Tue May 12 13:46:23 2009
*************** DoCopy(const CopyStmt *stmt, const char 
*** 730,735 ****
--- 730,738 ----
  	int			num_phys_attrs;
  	uint64		processed;
  
+ 	/* a dummy list that represents 'all-columns' */
+ 	List		all_columns = { T_List };
+ 	
  	/* Allocate workspace and zero all fields */
  	cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
  
*************** DoCopy(const CopyStmt *stmt, const char 
*** 809,814 ****
--- 812,821 ----
  						(errcode(ERRCODE_SYNTAX_ERROR),
  						 errmsg("conflicting or redundant options")));
  			force_quote = (List *) defel->arg;
+ 
+ 			/* NIL means all-columns */
+ 			if (force_quote == NIL)
+ 				force_quote = &all_columns;
  		}
  		else if (strcmp(defel->defname, "force_notnull") == 0)
  		{
*************** DoCopy(const CopyStmt *stmt, const char 
*** 817,822 ****
--- 824,833 ----
  						(errcode(ERRCODE_SYNTAX_ERROR),
  						 errmsg("conflicting or redundant options")));
  			force_notnull = (List *) defel->arg;
+ 
+ 			/* NIL means all-columns */
+ 			if (force_notnull == NIL)
+ 				force_notnull = &all_columns;
  		}
  		else
  			elog(ERROR, "option \"%s\" not recognized",
*************** DoCopy(const CopyStmt *stmt, const char 
*** 1092,1098 ****
  
  	/* Convert FORCE QUOTE name list to per-column flags, check validity */
  	cstate->force_quote_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
! 	if (force_quote)
  	{
  		List	   *attnums;
  		ListCell   *cur;
--- 1103,1116 ----
  
  	/* Convert FORCE QUOTE name list to per-column flags, check validity */
  	cstate->force_quote_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
! 	if (force_quote == &all_columns)
! 	{
! 		int		i;
! 
! 		for (i = 0; i < num_phys_attrs; i++)
! 			cstate->force_quote_flags[i] = true;
! 	}
! 	else if (force_quote)
  	{
  		List	   *attnums;
  		ListCell   *cur;
*************** DoCopy(const CopyStmt *stmt, const char 
*** 1114,1120 ****
  
  	/* Convert FORCE NOT NULL name list to per-column flags, check validity */
  	cstate->force_notnull_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
! 	if (force_notnull)
  	{
  		List	   *attnums;
  		ListCell   *cur;
--- 1132,1145 ----
  
  	/* Convert FORCE NOT NULL name list to per-column flags, check validity */
  	cstate->force_notnull_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
! 	if (force_notnull == &all_columns)
! 	{
! 		int		i;
! 
! 		for (i = 0; i < num_phys_attrs; i++)
! 			cstate->force_notnull_flags[i] = true;
! 	}
! 	else if (force_notnull)
  	{
  		List	   *attnums;
  		ListCell   *cur;
diff -cpr head/src/backend/parser/gram.y force_quote_all/src/backend/parser/gram.y
*** head/src/backend/parser/gram.y	Thu Apr 30 09:43:10 2009
--- force_quote_all/src/backend/parser/gram.y	Tue May 12 13:46:23 2009
*************** copy_opt_item:
*** 1989,1997 ****
--- 1989,2005 ----
  				{
  					$$ = makeDefElem("force_quote", (Node *)$3);
  				}
+ 			| FORCE QUOTE '*'
+ 				{
+ 					$$ = makeDefElem("force_quote", NULL);
+ 				}
  			| FORCE NOT NULL_P columnList
  				{
  					$$ = makeDefElem("force_notnull", (Node *)$4);
+ 				}
+ 			| FORCE NOT NULL_P '*'
+ 				{
+ 					$$ = makeDefElem("force_notnull", NULL);
  				}
  		;
  
diff -cpr head/src/test/regress/expected/copy2.out force_quote_all/src/test/regress/expected/copy2.out
*** head/src/test/regress/expected/copy2.out	Tue Oct 10 16:16:01 2006
--- force_quote_all/src/test/regress/expected/copy2.out	Tue May 12 13:46:23 2009
*************** COPY y TO stdout WITH CSV FORCE QUOTE co
*** 191,196 ****
--- 191,200 ----
  "Jackson, Sam","\\h"
  "It is \"perfect\".","	"
  "",
+ COPY y TO stdout WITH CSV FORCE QUOTE *;
+ "Jackson, Sam","\h"
+ "It is ""perfect"".","	"
+ "",
  --test that we read consecutive LFs properly
  CREATE TEMP TABLE testnl (a int, b text, c int);
  COPY testnl FROM stdin CSV;
diff -cpr head/src/test/regress/sql/copy2.sql force_quote_all/src/test/regress/sql/copy2.sql
*** head/src/test/regress/sql/copy2.sql	Tue Oct 10 16:16:01 2006
--- force_quote_all/src/test/regress/sql/copy2.sql	Tue May 12 13:46:23 2009
*************** INSERT INTO y VALUES ('', NULL);
*** 128,133 ****
--- 128,134 ----
  COPY y TO stdout WITH CSV;
  COPY y TO stdout WITH CSV QUOTE '''' DELIMITER '|';
  COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\';
+ COPY y TO stdout WITH CSV FORCE QUOTE *;
  
  --test that we read consecutive LFs properly
  
#2Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Itagaki Takahiro (#1)
Re: COPY WITH CSV FORCE QUOTE *

On Mon, May 11, 2009 at 11:49 PM, Itagaki
Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote:

Hi,

The attached is a WIP patch add a support of '*' for FORCE QUOTE
and FORCE NOT NULL options. I'd like to submit it for the next
commit fest (8.5). Comments welcome.

this patch applies almost cleanly except for a hunk in gram.y
about the patch itself, i can find value for FORCE QUOTE * but what's
the use case for FORCE NOT NULL?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

#3Itagaki Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Jaime Casanova (#2)
Re: COPY WITH CSV FORCE QUOTE *

Jaime Casanova <jcasanov@systemguards.com.ec> wrote:

i can find value for FORCE QUOTE * but what's
the use case for FORCE NOT NULL?

NULLs are not quoted (to be ,, ) because empty strings are written as "".
It comes from original implementation and not from my patch.
I think we don't need to change the behavior.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

#4Itagaki Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Itagaki Takahiro (#3)
1 attachment(s)
Re: COPY WITH CSV FORCE QUOTE *

Jaime Casanova <jcasanov@systemguards.com.ec> wrote:

i can find value for FORCE QUOTE * but what's
the use case for FORCE NOT NULL?

Oh, sorry. I misread your mail.
The patch adds * options "FORCE QUOTE" and "FORCE NOT NULL", too.
Both of * mean all-columns for each options.

The attached is an updated version of patch; just add documenation
to copy.sgml.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachments:

force_quote_all-20090714.patchapplication/octet-stream; name=force_quote_all-20090714.patchDownload
diff -cpr HEAD/doc/src/sgml/ref/copy.sgml force_quote_all/doc/src/sgml/ref/copy.sgml
*** HEAD/doc/src/sgml/ref/copy.sgml	Mon Feb  9 09:25:51 2009
--- force_quote_all/doc/src/sgml/ref/copy.sgml	Tue Jul 14 16:23:12 2009
*************** COPY <replaceable class="parameter">tabl
*** 32,38 ****
            [ CSV [ HEADER ]
                  [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ] 
                  [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ]
!                 [ FORCE NOT NULL <replaceable class="parameter">column</replaceable> [, ...] ]
  
  COPY { <replaceable class="parameter">tablename</replaceable> [ ( <replaceable class="parameter">column</replaceable> [, ...] ) ] | ( <replaceable class="parameter">query</replaceable> ) }
      TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
--- 32,38 ----
            [ CSV [ HEADER ]
                  [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ] 
                  [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ]
!                 [ FORCE NOT NULL { <replaceable class="parameter">column</replaceable> [, ...] | * } ]
  
  COPY { <replaceable class="parameter">tablename</replaceable> [ ( <replaceable class="parameter">column</replaceable> [, ...] ) ] | ( <replaceable class="parameter">query</replaceable> ) }
      TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
*************** COPY { <replaceable class="parameter">ta
*** 44,50 ****
            [ CSV [ HEADER ]
                  [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ] 
                  [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ]
!                 [ FORCE QUOTE <replaceable class="parameter">column</replaceable> [, ...] ]
  </synopsis>
   </refsynopsisdiv>
   
--- 44,50 ----
            [ CSV [ HEADER ]
                  [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ] 
                  [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ]
!                 [ FORCE QUOTE { <replaceable class="parameter">column</replaceable> [, ...] | * } ]
  </synopsis>
   </refsynopsisdiv>
   
*************** COPY { <replaceable class="parameter">ta
*** 248,254 ****
       <para>
        In <literal>CSV</> <command>COPY TO</> mode, forces quoting to be
        used for all non-<literal>NULL</> values in each specified column.
!       <literal>NULL</> output is never quoted.
       </para>
      </listitem>
     </varlistentry>
--- 248,255 ----
       <para>
        In <literal>CSV</> <command>COPY TO</> mode, forces quoting to be
        used for all non-<literal>NULL</> values in each specified column.
!       <literal>NULL</> output is never quoted. If * is specified, all
!       columns of the table will be quoted.
       </para>
      </listitem>
     </varlistentry>
*************** COPY { <replaceable class="parameter">ta
*** 261,267 ****
        specified column as though it were quoted and hence not a
        <literal>NULL</> value. For the default null string in
        <literal>CSV</> mode (<literal>''</>), this causes missing
!       values to be input as zero-length strings.
       </para>
      </listitem>
     </varlistentry>
--- 262,269 ----
        specified column as though it were quoted and hence not a
        <literal>NULL</> value. For the default null string in
        <literal>CSV</> mode (<literal>''</>), this causes missing
!       values to be input as zero-length strings. If * is specified,
!       all columns of the table will be not nulls.
       </para>
      </listitem>
     </varlistentry>
diff -cpr HEAD/src/backend/commands/copy.c force_quote_all/src/backend/commands/copy.c
*** HEAD/src/backend/commands/copy.c	Fri Jun 12 09:52:43 2009
--- force_quote_all/src/backend/commands/copy.c	Tue Jul 14 16:23:12 2009
*************** DoCopy(const CopyStmt *stmt, const char 
*** 730,735 ****
--- 730,738 ----
  	int			num_phys_attrs;
  	uint64		processed;
  
+ 	/* a dummy list that represents 'all-columns' */
+ 	List		all_columns = { T_List };
+ 	
  	/* Allocate workspace and zero all fields */
  	cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
  
*************** DoCopy(const CopyStmt *stmt, const char 
*** 809,814 ****
--- 812,821 ----
  						(errcode(ERRCODE_SYNTAX_ERROR),
  						 errmsg("conflicting or redundant options")));
  			force_quote = (List *) defel->arg;
+ 
+ 			/* NIL means all-columns */
+ 			if (force_quote == NIL)
+ 				force_quote = &all_columns;
  		}
  		else if (strcmp(defel->defname, "force_notnull") == 0)
  		{
*************** DoCopy(const CopyStmt *stmt, const char 
*** 817,822 ****
--- 824,833 ----
  						(errcode(ERRCODE_SYNTAX_ERROR),
  						 errmsg("conflicting or redundant options")));
  			force_notnull = (List *) defel->arg;
+ 
+ 			/* NIL means all-columns */
+ 			if (force_notnull == NIL)
+ 				force_notnull = &all_columns;
  		}
  		else
  			elog(ERROR, "option \"%s\" not recognized",
*************** DoCopy(const CopyStmt *stmt, const char 
*** 1092,1098 ****
  
  	/* Convert FORCE QUOTE name list to per-column flags, check validity */
  	cstate->force_quote_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
! 	if (force_quote)
  	{
  		List	   *attnums;
  		ListCell   *cur;
--- 1103,1116 ----
  
  	/* Convert FORCE QUOTE name list to per-column flags, check validity */
  	cstate->force_quote_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
! 	if (force_quote == &all_columns)
! 	{
! 		int		i;
! 
! 		for (i = 0; i < num_phys_attrs; i++)
! 			cstate->force_quote_flags[i] = true;
! 	}
! 	else if (force_quote)
  	{
  		List	   *attnums;
  		ListCell   *cur;
*************** DoCopy(const CopyStmt *stmt, const char 
*** 1114,1120 ****
  
  	/* Convert FORCE NOT NULL name list to per-column flags, check validity */
  	cstate->force_notnull_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
! 	if (force_notnull)
  	{
  		List	   *attnums;
  		ListCell   *cur;
--- 1132,1145 ----
  
  	/* Convert FORCE NOT NULL name list to per-column flags, check validity */
  	cstate->force_notnull_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
! 	if (force_notnull == &all_columns)
! 	{
! 		int		i;
! 
! 		for (i = 0; i < num_phys_attrs; i++)
! 			cstate->force_notnull_flags[i] = true;
! 	}
! 	else if (force_notnull)
  	{
  		List	   *attnums;
  		ListCell   *cur;
diff -cpr HEAD/src/backend/parser/gram.y force_quote_all/src/backend/parser/gram.y
*** HEAD/src/backend/parser/gram.y	Tue Jul 14 09:36:59 2009
--- force_quote_all/src/backend/parser/gram.y	Tue Jul 14 16:23:12 2009
*************** copy_opt_item:
*** 1995,2003 ****
--- 1995,2011 ----
  				{
  					$$ = makeDefElem("force_quote", (Node *)$3);
  				}
+ 			| FORCE QUOTE '*'
+ 				{
+ 					$$ = makeDefElem("force_quote", NULL);
+ 				}
  			| FORCE NOT NULL_P columnList
  				{
  					$$ = makeDefElem("force_notnull", (Node *)$4);
+ 				}
+ 			| FORCE NOT NULL_P '*'
+ 				{
+ 					$$ = makeDefElem("force_notnull", NULL);
  				}
  		;
  
diff -cpr HEAD/src/test/regress/expected/copy2.out force_quote_all/src/test/regress/expected/copy2.out
*** HEAD/src/test/regress/expected/copy2.out	Mon Jun 15 09:50:34 2009
--- force_quote_all/src/test/regress/expected/copy2.out	Tue Jul 14 16:23:12 2009
*************** COPY y TO stdout WITH CSV FORCE QUOTE co
*** 191,196 ****
--- 191,200 ----
  "Jackson, Sam","\\h"
  "It is \"perfect\".","	"
  "",
+ COPY y TO stdout WITH CSV FORCE QUOTE *;
+ "Jackson, Sam","\h"
+ "It is ""perfect"".","	"
+ "",
  --test that we read consecutive LFs properly
  CREATE TEMP TABLE testnl (a int, b text, c int);
  COPY testnl FROM stdin CSV;
diff -cpr HEAD/src/test/regress/sql/copy2.sql force_quote_all/src/test/regress/sql/copy2.sql
*** HEAD/src/test/regress/sql/copy2.sql	Mon Jun 15 09:50:34 2009
--- force_quote_all/src/test/regress/sql/copy2.sql	Tue Jul 14 16:23:12 2009
*************** INSERT INTO y VALUES ('', NULL);
*** 128,133 ****
--- 128,134 ----
  COPY y TO stdout WITH CSV;
  COPY y TO stdout WITH CSV QUOTE '''' DELIMITER '|';
  COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\';
+ COPY y TO stdout WITH CSV FORCE QUOTE *;
  
  --test that we read consecutive LFs properly
  
#5Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Itagaki Takahiro (#4)
Re: COPY WITH CSV FORCE QUOTE *

On Tue, Jul 14, 2009 at 2:26 AM, Itagaki
Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote:

Jaime Casanova <jcasanov@systemguards.com.ec> wrote:

i can find value for FORCE QUOTE * but what's
the use case for FORCE NOT NULL?

Oh, sorry. I misread your mail.
The patch adds * options "FORCE QUOTE" and "FORCE NOT NULL", too.
Both of * mean all-columns for each options.

and i misunderstood your patch... is actually an extend of an existing
functionality, and both options are necesary for consistency...
do you hear an echo here? ;)

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Itagaki Takahiro (#4)
Re: COPY WITH CSV FORCE QUOTE *

Itagaki Takahiro wrote:

Jaime Casanova <jcasanov@systemguards.com.ec> wrote:

i can find value for FORCE QUOTE * but what's
the use case for FORCE NOT NULL?

Oh, sorry. I misread your mail.
The patch adds * options "FORCE QUOTE" and "FORCE NOT NULL", too.
Both of * mean all-columns for each options.

I still don't understand the use case for FORCE NOT NULL *.

FORCE QUOTE * is relatively benign. In particular, it doesn't affect the
null-preserving properties of our CSV implementation, since it still
won't (or shouldn't) quote null values.

FORCE NOT NULL is in any case a fairly blunt instrument - it doesn't
work for a column of any type that doesn't accept an empty string as
valid input, such as numeric types.

I note that this patch was (I think) originally submitted without prior
discussion. That's not following best practice.

cheers

andrew

#7Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#6)
Re: COPY WITH CSV FORCE QUOTE * -- REVIEW

All,

1) Patch applies cleanly against CVS head.

2) Patch compiles and builds cleanly.

3) Unable to check docs because of general doc build problems.

4) Tested the following commands, using a 10MB table of PostgreSQL log data:

postgres=# COPY marchlog TO '/tmp/marchlog1.csv' with csv header;
COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog2.csv' with csv header force
quote *;
COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog3.csv' with csv header force
quote process_id;
COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog4.csv' with csv force quote *;
COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog5.csv' with force quote *;
ERROR: COPY force quote available only in CSV mode
STATEMENT: COPY marchlog TO '/tmp/marchlog5.csv' with force quote *;
ERROR: COPY force quote available only in CSV mode

postgres=# COPY reloadlog FROM '/tmp/marchlog2.csv' with csv header;
COPY 81097

postgres-# \copy marchlog TO '/tmp/marchlog5.csv' with csv force quote *;
postgres-#

Per discussion, I did not test FORCE QUOTE NOT NULL *.

All output looked as expected. This patch did not seem to change
eariler functionality, and seems to quote as specified.

Unless there are other things we want to test (CLOBs?) I think the patch
is probably ready for code review of the FORCE QUOTE * portion.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#8Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#6)
Re: COPY WITH CSV FORCE QUOTE *

Andrew,

FORCE NOT NULL is in any case a fairly blunt instrument - it doesn't
work for a column of any type that doesn't accept an empty string as
valid input, such as numeric types.

Con: this allows COPY to produce output which cannot be reloaded into
PostgreSQL.

Pro: there is a lot of extremely broken external software which expects
"nulls" to be expressed as "". This improves compatiblity with them.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#8)
Re: COPY WITH CSV FORCE QUOTE *

Josh Berkus wrote:

Andrew,

FORCE NOT NULL is in any case a fairly blunt instrument - it doesn't
work for a column of any type that doesn't accept an empty string as
valid input, such as numeric types.

Con: this allows COPY to produce output which cannot be reloaded into
PostgreSQL.

Pro: there is a lot of extremely broken external software which
expects "nulls" to be expressed as "". This improves compatiblity
with them.

FORCE NOT NULL is only valid when we import data, not when we export
data, so what other programs expect to receive is irrelevant to any
argument about FORCE NOT NULL.

AFAICT on a brief look at the patch, it doesn't affect the quoting of
nulls on export, it just allows * as an alias for all columns for FORCE
QUOTE (as well as FORCE NOT NULL). But FORCE QUOTE has never forced
quoting of null values, only non-null values. We have never quoted null
values, and I'm fairly resistant to any suggestion that we should.

As for importing data from programs that produce all values in quotes
including null/missing values (your pro case above), arguably what we
need is another flag that would turn an empty string into a null.

cheers

andrew

#10Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#7)
Re: COPY WITH CSV FORCE QUOTE * -- REVIEW

On Thu, Jul 16, 2009 at 2:47 PM, Josh Berkus<josh@agliodbs.com> wrote:

Unless there are other things we want to test (CLOBs?) I think the patch is
probably ready for code review of the FORCE QUOTE * portion.

I think perhaps we should ask the patch author to remove the NOT NULL
stuff first?

...Robert

#11Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#9)
Re: COPY WITH CSV FORCE QUOTE *

Andrew,

AFAICT on a brief look at the patch, it doesn't affect the quoting of
nulls on export, it just allows * as an alias for all columns for FORCE
QUOTE (as well as FORCE NOT NULL). But FORCE QUOTE has never forced
quoting of null values, only non-null values. We have never quoted null
values, and I'm fairly resistant to any suggestion that we should.

See? That's what happens when I can't build the docs. ;-) (and
there's no previous discussion of the feature).

As for importing data from programs that produce all values in quotes
including null/missing values (your pro case above), arguably what we
need is another flag that would turn an empty string into a null.

ooooh, TODO, please? There's a lot of this out there, and I've had to
build sed into a lot of import routines.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#12Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#10)
Re: COPY WITH CSV FORCE QUOTE * -- REVIEW

On 7/16/09 12:53 PM, Robert Haas wrote:

On Thu, Jul 16, 2009 at 2:47 PM, Josh Berkus<josh@agliodbs.com> wrote:

Unless there are other things we want to test (CLOBs?) I think the patch is
probably ready for code review of the FORCE QUOTE * portion.

I think perhaps we should ask the patch author to remove the NOT NULL
stuff first?

Yes, current status is "Waiting on Author".

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#13Chris Spotts
rfusca@gmail.com
In reply to: Josh Berkus (#11)
Re: COPY WITH CSV FORCE QUOTE *

Josh Berkus wrote:

Andrew,

AFAICT on a brief look at the patch, it doesn't affect the quoting of
nulls on export, it just allows * as an alias for all columns for FORCE
QUOTE (as well as FORCE NOT NULL). But FORCE QUOTE has never forced
quoting of null values, only non-null values. We have never quoted null
values, and I'm fairly resistant to any suggestion that we should.

See? That's what happens when I can't build the docs. ;-) (and
there's no previous discussion of the feature).

As for importing data from programs that produce all values in quotes
including null/missing values (your pro case above), arguably what we
need is another flag that would turn an empty string into a null.

ooooh, TODO, please? There's a lot of this out there, and I've had to
build sed into a lot of import routines.

+1 For that on the TODO, happens all the time...

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Chris Spotts (#13)
Re: COPY WITH CSV FORCE QUOTE *

Chris Spotts wrote:

As for importing data from programs that produce all values in quotes
including null/missing values (your pro case above), arguably what we
need is another flag that would turn an empty string into a null.

ooooh, TODO, please? There's a lot of this out there, and I've had
to build sed into a lot of import routines.

+1 For that on the TODO, happens all the time...

Well, somebody had better suggest a syntax for it, preferably without
adding yet another keyword.

cheers

andrew

#15Itagaki Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Josh Berkus (#12)
1 attachment(s)
Re: COPY WITH CSV FORCE QUOTE * -- REVIEW

Josh Berkus <josh@agliodbs.com> wrote:

On 7/16/09 12:53 PM, Robert Haas wrote:

I think perhaps we should ask the patch author to remove the NOT NULL
stuff first?

Yes, current status is "Waiting on Author".

OK, I removed "FORCE NOT NULL" stuff from the patch.
The attached patch only adds "FORCE QUOTE *" feature.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachments:

force_quote_all-20090717.patchapplication/octet-stream; name=force_quote_all-20090717.patchDownload
diff -cpr HEAD/doc/src/sgml/ref/copy.sgml force_quote_all/doc/src/sgml/ref/copy.sgml
*** HEAD/doc/src/sgml/ref/copy.sgml	Fri Jul 17 12:49:09 2009
--- force_quote_all/doc/src/sgml/ref/copy.sgml	Fri Jul 17 12:49:46 2009
*************** COPY { <replaceable class="parameter">ta
*** 44,50 ****
            [ CSV [ HEADER ]
                  [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ] 
                  [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ]
!                 [ FORCE QUOTE <replaceable class="parameter">column</replaceable> [, ...] ]
  </synopsis>
   </refsynopsisdiv>
   
--- 44,50 ----
            [ CSV [ HEADER ]
                  [ QUOTE [ AS ] '<replaceable class="parameter">quote</replaceable>' ] 
                  [ ESCAPE [ AS ] '<replaceable class="parameter">escape</replaceable>' ]
!                 [ FORCE QUOTE { <replaceable class="parameter">column</replaceable> [, ...] | * } ]
  </synopsis>
   </refsynopsisdiv>
   
*************** COPY { <replaceable class="parameter">ta
*** 248,254 ****
       <para>
        In <literal>CSV</> <command>COPY TO</> mode, forces quoting to be
        used for all non-<literal>NULL</> values in each specified column.
!       <literal>NULL</> output is never quoted.
       </para>
      </listitem>
     </varlistentry>
--- 248,255 ----
       <para>
        In <literal>CSV</> <command>COPY TO</> mode, forces quoting to be
        used for all non-<literal>NULL</> values in each specified column.
!       <literal>NULL</> output is never quoted. If * is specified, all
!       columns of the table will be quoted.
       </para>
      </listitem>
     </varlistentry>
diff -cpr HEAD/src/backend/commands/copy.c force_quote_all/src/backend/commands/copy.c
*** HEAD/src/backend/commands/copy.c	Fri Jul 17 12:49:09 2009
--- force_quote_all/src/backend/commands/copy.c	Fri Jul 17 12:49:46 2009
*************** DoCopy(const CopyStmt *stmt, const char 
*** 730,735 ****
--- 730,738 ----
  	int			num_phys_attrs;
  	uint64		processed;
  
+ 	/* a dummy list that represents 'all-columns' */
+ 	List		all_columns = { T_List };
+ 	
  	/* Allocate workspace and zero all fields */
  	cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
  
*************** DoCopy(const CopyStmt *stmt, const char 
*** 809,814 ****
--- 812,821 ----
  						(errcode(ERRCODE_SYNTAX_ERROR),
  						 errmsg("conflicting or redundant options")));
  			force_quote = (List *) defel->arg;
+ 
+ 			/* NIL means all-columns */
+ 			if (force_quote == NIL)
+ 				force_quote = &all_columns;
  		}
  		else if (strcmp(defel->defname, "force_notnull") == 0)
  		{
*************** DoCopy(const CopyStmt *stmt, const char 
*** 1092,1098 ****
  
  	/* Convert FORCE QUOTE name list to per-column flags, check validity */
  	cstate->force_quote_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
! 	if (force_quote)
  	{
  		List	   *attnums;
  		ListCell   *cur;
--- 1099,1112 ----
  
  	/* Convert FORCE QUOTE name list to per-column flags, check validity */
  	cstate->force_quote_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
! 	if (force_quote == &all_columns)
! 	{
! 		int		i;
! 
! 		for (i = 0; i < num_phys_attrs; i++)
! 			cstate->force_quote_flags[i] = true;
! 	}
! 	else if (force_quote)
  	{
  		List	   *attnums;
  		ListCell   *cur;
diff -cpr HEAD/src/backend/parser/gram.y force_quote_all/src/backend/parser/gram.y
*** HEAD/src/backend/parser/gram.y	Fri Jul 17 12:49:09 2009
--- force_quote_all/src/backend/parser/gram.y	Fri Jul 17 12:49:46 2009
*************** copy_opt_item:
*** 2006,2011 ****
--- 2006,2015 ----
  				{
  					$$ = makeDefElem("force_quote", (Node *)$3);
  				}
+ 			| FORCE QUOTE '*'
+ 				{
+ 					$$ = makeDefElem("force_quote", NULL);
+ 				}
  			| FORCE NOT NULL_P columnList
  				{
  					$$ = makeDefElem("force_notnull", (Node *)$4);
diff -cpr HEAD/src/test/regress/expected/copy2.out force_quote_all/src/test/regress/expected/copy2.out
*** HEAD/src/test/regress/expected/copy2.out	Fri Jul 17 12:49:09 2009
--- force_quote_all/src/test/regress/expected/copy2.out	Fri Jul 17 12:49:46 2009
*************** COPY y TO stdout WITH CSV FORCE QUOTE co
*** 191,196 ****
--- 191,200 ----
  "Jackson, Sam","\\h"
  "It is \"perfect\".","	"
  "",
+ COPY y TO stdout WITH CSV FORCE QUOTE *;
+ "Jackson, Sam","\h"
+ "It is ""perfect"".","	"
+ "",
  --test that we read consecutive LFs properly
  CREATE TEMP TABLE testnl (a int, b text, c int);
  COPY testnl FROM stdin CSV;
diff -cpr HEAD/src/test/regress/sql/copy2.sql force_quote_all/src/test/regress/sql/copy2.sql
*** HEAD/src/test/regress/sql/copy2.sql	Fri Jul 17 12:49:09 2009
--- force_quote_all/src/test/regress/sql/copy2.sql	Fri Jul 17 12:49:46 2009
*************** INSERT INTO y VALUES ('', NULL);
*** 128,133 ****
--- 128,134 ----
  COPY y TO stdout WITH CSV;
  COPY y TO stdout WITH CSV QUOTE '''' DELIMITER '|';
  COPY y TO stdout WITH CSV FORCE QUOTE col2 ESCAPE E'\\';
+ COPY y TO stdout WITH CSV FORCE QUOTE *;
  
  --test that we read consecutive LFs properly
  
#16Josh Berkus
josh@agliodbs.com
In reply to: Itagaki Takahiro (#15)
Re: COPY WITH CSV FORCE QUOTE * -- REVIEW

Itagaki-san,

On Apple OS 10.5:

1. new patch applied cleanly
2. new patch built cleanly
3. regression tests pass
4. Tested following operations:

postgres=# COPY marchlog TO '/tmp/marchlog1.csv' with csv header;
COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog2.csv' with csv header force
quote *;
COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog3.csv' with csv header force
quote process_id;
COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog4.csv' with csv force quote *;
COPY 81097
postgres=# COPY marchlog TO '/tmp/marchlog5.csv' with force quote *;
ERROR: COPY force quote available only in CSV mode
STATEMENT: COPY marchlog TO '/tmp/marchlog5.csv' with force quote *;
ERROR: COPY force quote available only in CSV mode

postgres=# COPY reloadlog FROM '/tmp/marchlog2.csv' with csv header;
COPY 81097

postgres-# \copy marchlog TO '/tmp/marchlog5.csv' with csv force quote *;
postgres-#

5. Regression tests for FORCE QUOTE present.
6. Docs present; not sure how good they are, see prior discussion.

Stuff someone else should do:

a. review code
b. review code format

I am done with this review.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#17Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#14)
Re: COPY WITH CSV FORCE QUOTE *

On 7/16/09 1:55 PM, Andrew Dunstan wrote:

Well, somebody had better suggest a syntax for it, preferably without
adding yet another keyword.

Actually, all that needs to happen is for NULL AS to accept '""' as a
string. Right now that produces:

ERROR: CSV quote character must not appear in the NULL specification

What's the issue with that? I can see how NULL AS '"' would break
things, but if I wanted NULL AS '"Josh"' shouldn't I be able to have it?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#18Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#17)
Re: COPY WITH CSV FORCE QUOTE *

Josh Berkus wrote:

On 7/16/09 1:55 PM, Andrew Dunstan wrote:

Well, somebody had better suggest a syntax for it, preferably without
adding yet another keyword.

Actually, all that needs to happen is for NULL AS to accept '""' as a
string. Right now that produces:

ERROR: CSV quote character must not appear in the NULL specification

What's the issue with that? I can see how NULL AS '"' would break
things, but if I wanted NULL AS '"Josh"' shouldn't I be able to have it?

See previous thread on -patches/-hackers "allow CSV quote in NULL" in
July 2007.

Quite apart from any other objection, doing what you suggest will not be
column-by-column, like what I suggested. It's an all or nothing deal.

cheers

andrew

#19Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#16)
Re: COPY WITH CSV FORCE QUOTE * -- REVIEW

Josh Berkus wrote:

Stuff someone else should do:

a. review code
b. review code format

I am done with this review.

I have reviewed this and made a small tweak in the docco. I'm just about
ready to commit this, but I'm still slightly worried that passing NULL
to denote all columns in this piece of grammar:

| FORCE QUOTE '*'
{
$$ = makeDefElem("force_quote", NULL);
}

might be less than robust - it just feels slightly hacky, so I'd
appreciate others' thoughts. If nobody else is bothered I will commit
the patch.

cheers

andrew

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#19)
Re: COPY WITH CSV FORCE QUOTE * -- REVIEW

Andrew Dunstan <andrew@dunslane.net> writes:

I have reviewed this and made a small tweak in the docco. I'm just about
ready to commit this, but I'm still slightly worried that passing NULL
to denote all columns in this piece of grammar:

| FORCE QUOTE '*'
{
$$ = makeDefElem("force_quote", NULL);
}

might be less than robust - it just feels slightly hacky, so I'd
appreciate others' thoughts.

I agree, that's ugly. Why don't you use an A_Star node?

regards, tom lane

#21Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#20)
Re: COPY WITH CSV FORCE QUOTE * -- REVIEW

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

I have reviewed this and made a small tweak in the docco. I'm just about
ready to commit this, but I'm still slightly worried that passing NULL
to denote all columns in this piece of grammar:

| FORCE QUOTE '*'
{
$$ = makeDefElem("force_quote", NULL);
}

might be less than robust - it just feels slightly hacky, so I'd
appreciate others' thoughts.

I agree, that's ugly. Why don't you use an A_Star node?

OK, Done and committed. Nice little addition.

cheers

andrew