Allow COPY's 'text' format to output a header

michael@paquier.xyz

over 7 years ago

In reply to: David Steele (#2)

Re: Allow COPY's 'text' format to output a header

On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote:

This patch makes sense to me and looks reasonable.

One "potential" problem is if a relation has a full set of columns which
allow the input of text-like data: if the header has been added with
COPY TO, and the user forgets to add the header option again with
COPY FROM, then an extra row will be generated, but there is the same
problem with CSV format :)

One comment I have about the patch is that there is no test for
COPY FROM with an output file which has a header. In this case if
HEADER is true then the file can be loaded. If HEADER is wrong, an
error should normally be raised because of the format (well, let's
discard the case of the relation with text-only columns). So the tests
could be extended a bit even for CSV.

We're in the middle of a feature freeze that will last most of the
summer, so be sure to enter your patch into the next commitfest so it
can be considered when the freeze is over.

https://commitfest.postgresql.org/18/

Yes, you will need to be patient a couple of months here.
--
Michael

samullers@gmail.com

over 7 years ago

In reply to: Michael Paquier (#3)

Re: Allow COPY's 'text' format to output a header

Okay, I've added this to the next commitfest at
https://commitfest.postgresql.org/18/1629/.

Thanks both Michael and David for the feedback so far.

On 14 May 2018 at 02:37, Michael Paquier <michael@paquier.xyz> wrote:

On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote:

This patch makes sense to me and looks reasonable.

One "potential" problem is if a relation has a full set of columns which
allow the input of text-like data: if the header has been added with
COPY TO, and the user forgets to add the header option again with
COPY FROM, then an extra row will be generated, but there is the same
problem with CSV format :)

One comment I have about the patch is that there is no test for
COPY FROM with an output file which has a header. In this case if
HEADER is true then the file can be loaded. If HEADER is wrong, an
error should normally be raised because of the format (well, let's
discard the case of the relation with text-only columns). So the tests
could be extended a bit even for CSV.

We're in the middle of a feature freeze that will last most of the
summer, so be sure to enter your patch into the next commitfest so it
can be considered when the freeze is over.

https://commitfest.postgresql.org/18/

Yes, you will need to be patient a couple of months here.
--
Michael

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

over 7 years ago

In reply to: Simon Muller (#4)

Re: Allow COPY's 'text' format to output a header

On 05/14/2018 02:35 AM, Simon Muller wrote:

Okay, I've added this to the next commitfest at
https://commitfest.postgresql.org/18/1629/.

Thanks both Michael and David for the feedback so far.

(Please don't top-post on PostgreSQL lists.)

I'm not necessarily opposed to this, but I'm not certain about the use
case either. The original request seemed to stem from a false impression
that CSV mode can't produce or consume tab-delimited files. But it can,
and in fact it's saner for almost all uses than text format. Postgres'
text format is really intended for Postgres' use. CSV format is more
appropriate for dealing with external programs, whether the delimiter be
a tab or a comma.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Garick Hamlin

ghamlin@isc.upenn.edu

over 7 years ago

In reply to: Michael Paquier (#3)

Re: Allow COPY's 'text' format to output a header

On Mon, May 14, 2018 at 09:37:07AM +0900, Michael Paquier wrote:

On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote:

This patch makes sense to me and looks reasonable.

One "potential" problem is if a relation has a full set of columns which
allow the input of text-like data: if the header has been added with
COPY TO, and the user forgets to add the header option again with
COPY FROM, then an extra row will be generated, but there is the same
problem with CSV format :)

Yeah, I wonder if that can be addressed.

I wonder if there is a way to let COPY FROM detect or ignore headers
as appropriate, rather than silently adding the header
to the table as data.

Maybe a blank line after the header line could prevent this confusion?

Garick

David G. Johnston

david.g.johnston@gmail.com

over 7 years ago

In reply to: Garick Hamlin (#6)

Re: Allow COPY's 'text' format to output a header

On Mon, May 14, 2018 at 11:44 AM, Garick Hamlin <ghamlin@isc.upenn.edu>
wrote:

I wonder if there is a way to let COPY FROM detect or ignore headers
as appropriate, rather than silently adding the header
to the table as data.

Not reliably

Maybe a blank line after the header line could prevent this confusion?

No

+1 for allowing HEADER with FORMAT text. It doesn't interfere with COPY
and even if I were to agree that CSV format is the better one, this seems
like an unnecessary area to impose preferences. If TSV with a header meets
someone's need, providing a minimal (and consistent with expectations)
syntax to accomplish that goal seems reasonable, as does the patch.

David J.

Isaac Morland

isaac.morland@gmail.com

over 7 years ago

In reply to: David G. Johnston (#7)

Re: Allow COPY's 'text' format to output a header

While we're discussing COPY options, what do people think of an option for
COPY FROM with header to require that the headers match the target column
names? This would help to ensure that the file is actually the right one.

On 14 May 2018 at 14:55, David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Mon, May 14, 2018 at 11:44 AM, Garick Hamlin <ghamlin@isc.upenn.edu>
wrote:

I wonder if there is a way to let COPY FROM detect or ignore headers
as appropriate, rather than silently adding the header
to the table as data.

Not reliably

Maybe a blank line after the header line could prevent this confusion?

No

+1 for allowing HEADER with FORMAT text. It doesn't interfere with COPY
and even if I were to agree that CSV format is the better one, this seems
like an unnecessary area to impose preferences. If TSV with a header meets
someone's need, providing a minimal (and consistent with expectations)
syntax to accomplish that goal seems reasonable, as does the patch.

David J.

michael@paquier.xyz

over 7 years ago

In reply to: Isaac Morland (#8)

Re: Allow COPY's 'text' format to output a header

On Mon, May 14, 2018 at 04:08:47PM -0400, Isaac Morland wrote:

While we're discussing COPY options, what do people think of an option for
COPY FROM with header to require that the headers match the target column
names? This would help to ensure that the file is actually the right one.

FWIW, I am personally not much into such sanity-check logic in COPY if
we can live without it.
--
Michael

#10

daniel@manitou-mail.org

over 7 years ago

In reply to: Andrew Dunstan (#5)

Re: Allow COPY's 'text' format to output a header

Andrew Dunstan wrote:

I'm not necessarily opposed to this, but I'm not certain about the use
case either.

+1.
The downside is that it would create the need, when using COPY FROM,
to know whether an input file was generated with or without header,
and a hazard if you get it wrong.
If you say it was and it wasn't, you quietly lose the first row of data.
If you say it wasn't and in fact it was, either there's a
datatype mismatch or you quietly get a spurious row of data.

This complication should be balanced by some advantage.
What can we do with the header?
If you already have the table ready to COPY in, you don't
need that information. The only reason why COPY FROM
needs to know about the header is to throw it away.
And if you don't have the table created yet, a header
with just the column names is hardly sufficient to create it,
is it?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
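
The two failure modes described in this message can be illustrated with a small Python model. This is only a sketch of the behavior, not PostgreSQL's implementation: the function `load_text_rows`, its `header` flag, and the per-column converters are hypothetical names introduced for illustration, and the model ignores escapes and NULL markers.

```python
from typing import Callable, List, Sequence


def load_text_rows(lines: Sequence[str],
                   converters: Sequence[Callable[[str], object]],
                   header: bool) -> List[tuple]:
    """Model of loading tab-delimited lines, optionally skipping a header.

    Hypothetical helper for illustration only; not PostgreSQL code.
    """
    # With header=True the first line is discarded without being inspected.
    data_lines = lines[1:] if header else lines
    rows = []
    for line in data_lines:
        fields = line.split("\t")
        if len(fields) != len(converters):
            raise ValueError(
                f"expected {len(converters)} columns, got {len(fields)}")
        # A converter such as int() raises ValueError on a header cell,
        # which models the "datatype mismatch" case.
        rows.append(tuple(conv(f) for conv, f in zip(converters, fields)))
    return rows


with_header = ["id\tname", "1\talice", "2\tbob"]
without_header = ["1\talice", "2\tbob"]

# Claiming a header that is not there quietly drops the first data row.
lost = load_text_rows(without_header, [int, str], header=True)

# Claiming no header when there is one: text-only columns accept the
# header line as an ordinary (spurious) row.
spurious = load_text_rows(with_header, [str, str], header=False)
```

With a typed first column (`[int, str]`) and `header=False`, the same file with a header fails on the first line instead of loading a spurious row.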

#11

Isaac Morland

isaac.morland@gmail.com

over 7 years ago

In reply to: Daniel Verite (#10)

Re: Allow COPY's 'text' format to output a header

On 15 May 2018 at 10:26, Daniel Verite <daniel@manitou-mail.org> wrote:

Andrew Dunstan wrote:

I'm not necessarily opposed to this, but I'm not certain about the use
case either.

+1.
The downside is that it would create the need, when using COPY FROM,
to know whether an input file was generated with or without header,
and a hazard if you get it wrong.
If you say it was and it wasn't, you quietly lose the first row of data.
If you say it wasn't and in fact it was, either there's a
datatype mismatch or you quietly get a spurious row of data.

Just to be clear, we're talking about my "header match" feature, not the
basic idea of allowing a header in text format?

You already need to know whether or not there is a header, no matter what:
there is no way to avoid needing to know the format of the data to be
imported. And certainly if "header" is an option, one has to know whether
or not to set it in any given situation.

The "header match" helps ensure the file is the right one by requiring the
header contents to match the field names, rather than just being thrown
away.

I don't view it as a way to avoid pre-defining the table. It just increases
the chance that the wrong file won't load but will instead trigger an error
condition immediately.

Note that this advantage includes what happens if you specify header but
the file has no header: as long as you actually specified header match, the
error will be caught unless the first row of actual data happens to match
the field names, which is almost always highly unlikely and frequently
impossible (e.g., a person with firstname "firstname", surname "surname",
birthday "birthday" and so on).

One can imagine extensions of the idea: for example, the header could
actually be used to identify the columns, so the column order in the file
doesn't matter. There could also be an "AS" syntax to allow the target
field names to be different from the field names in the header. I have
occasionally found myself wanting to ignore certain columns of the file.
But these are all significantly more complicated than just looking at the
header and requiring it to match the target field names.

If one specifies no header but there actually is a header in the file, then
loading will fail in many cases but it depends on what the header in the
file looks like. This part is unaffected by my idea.
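
A minimal sketch of the "header match" check proposed here, written in Python as a model rather than as PostgreSQL code. The function name `check_header_match`, its parameters, and the error messages are hypothetical; the thread does not define them. It only compares names in order and does not handle escaped delimiters.

```python
from typing import Sequence


def check_header_match(header_line: str, columns: Sequence[str],
                       delimiter: str = "\t") -> None:
    """Raise ValueError unless the header fields equal the columns, in order.

    Hypothetical helper for illustration only.
    """
    fields = header_line.rstrip("\n").split(delimiter)
    if len(fields) != len(columns):
        raise ValueError(
            f"header has {len(fields)} fields, table has {len(columns)} columns")
    for position, (found, expected) in enumerate(zip(fields, columns), start=1):
        if found != expected:
            raise ValueError(
                f"column {position}: header has {found!r}, expected {expected!r}")


target_columns = ["firstname", "surname", "birthday"]

# The right file passes silently.
check_header_match("firstname\tsurname\tbirthday\n", target_columns)
```

A file without a header would only slip through if its first data row happened to equal the column names, which is the unlikely case mentioned above.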

This complication should be balanced by some advantage.
What can we do with the header?
If you already have the table ready to COPY in, you don't
need that information. The only reason why COPY FROM
needs to know about the header is to throw it away.
And if you don't have the table created yet, a header
with just the column names is hardly sufficient to create it,
is it?

#12

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Isaac Morland (#11)

Re: Allow COPY's 'text' format to output a header

Isaac Morland <isaac.morland@gmail.com> writes:

On 15 May 2018 at 10:26, Daniel Verite <daniel@manitou-mail.org> wrote:

Andrew Dunstan wrote:

I'm not necessarily opposed to this, but I'm not certain about the use
case either.

The downside is that it would create the need, when using COPY FROM,
to know whether an input file was generated with or without header,
and a hazard if you get it wrong.
If you say it was and it wasn't, you quietly lose the first row of data.
If you say it wasn't and in fact it was, either there's a
datatype mismatch or you quietly get a spurious row of data.

Just to be clear, we're talking about my "header match" feature, not the
basic idea of allowing a header in text format?

AFAICS, Daniel's just reacting to the basic idea of a header line.
I agree that by itself that's not worth much. However, if we added
your proposed option to insist that the column names match during COPY
IN, I think that that could have some value. It would allow
forestalling one common type of pilot error, ie copying the wrong file
entirely. (It'd also prevent copying in data that has the wrong column
order, but I think that's a less common scenario. I might be wrong
about that.)

One can imagine extensions of the idea: for example, the header could
actually be used to identify the columns, so the column order in the file
doesn't matter. There could also be an "AS" syntax to allow the target
field names to be different from the field names in the header. I have
occasionally found myself wanting to ignore certain columns of the file.
But these are all significantly more complicated than just looking at the
header and requiring it to match the target field names.

Yeah, and every bit of flexibility you add raises the chance of an
undetected error. COPY isn't intended as a general ETL facility,
so I'd mostly be -1 on adding such things. But I can see the value
of confirming that you're copying the right file, and a header match
check would go a long way towards doing that.

regards, tom lane

#13

David G. Johnston

david.g.johnston@gmail.com

over 7 years ago

In reply to: Tom Lane (#12)

Re: Allow COPY's 'text' format to output a header

On Tuesday, May 15, 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:

AFAICS, Daniel's just reacting to the basic idea of a header line.
I agree that by itself that's not worth much. However, if we added
your proposed option to insist that the column names match during COPY
IN, I think that that could have some value.

I'm fine for adding it without the added matching behavior, though turning
the boolean into an enum is appealing.

HEADER { true | false | match }

Though we'd need to accept all variants of Boolean for compatibility...

I'm of the opinion that text and csv should be the same excepting their
defaults for some of the options.

David J.
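
The three-valued option suggested in this message could be parsed along the following lines. This is a Python sketch under stated assumptions, not the eventual PostgreSQL grammar: `HeaderChoice` and `parse_header_option` are hypothetical names, and the set of accepted boolean spellings (`true`/`on`/`1`, `false`/`off`/`0`) is an assumption standing in for "all variants of Boolean".

```python
from enum import Enum


class HeaderChoice(Enum):
    """Hypothetical three-valued HEADER setting."""
    FALSE = "false"
    TRUE = "true"
    MATCH = "match"


# Assumed boolean spellings; the real accepted set may differ.
_TRUE_WORDS = {"true", "on", "1"}
_FALSE_WORDS = {"false", "off", "0"}


def parse_header_option(value: str) -> HeaderChoice:
    """Map an option string to a HeaderChoice, case-insensitively."""
    word = value.strip().lower()
    if word == "match":
        return HeaderChoice.MATCH
    if word in _TRUE_WORDS:
        return HeaderChoice.TRUE
    if word in _FALSE_WORDS:
        return HeaderChoice.FALSE
    raise ValueError(f"header requires a Boolean value or \"match\", got {value!r}")
```

Checking for `match` first keeps every existing boolean spelling working unchanged, which is the compatibility concern raised above.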

#14

daniel@manitou-mail.org

over 7 years ago

In reply to: Isaac Morland (#11)

Re: Allow COPY's 'text' format to output a header

Isaac Morland wrote:

Just to be clear, we're talking about my "header match" feature, not the
basic idea of allowing a header in text format?

My reply was only about merely allowing it, as the current
patch at https://commitfest.postgresql.org/18/1629 does.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

#15

Robert Haas

robertmhaas@gmail.com

over 7 years ago

In reply to: Tom Lane (#12)

Re: Allow COPY's 'text' format to output a header

On Tue, May 15, 2018 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

One can imagine extensions of the idea: for example, the header could
actually be used to identify the columns, so the column order in the file
doesn't matter. There could also be an "AS" syntax to allow the target
field names to be different from the field names in the header. I have
occasionally found myself wanting to ignore certain columns of the file.
But these are all significantly more complicated than just looking at the
header and requiring it to match the target field names.

Yeah, and every bit of flexibility you add raises the chance of an
undetected error. COPY isn't intended as a general ETL facility,
so I'd mostly be -1 on adding such things. But I can see the value
of confirming that you're copying the right file, and a header match
check would go a long way towards doing that.

True.

FWIW, I'm +1 on this idea. I think a header line is a pretty common
need, and if you're exporting a large amount of data, it could be
pretty annoying to have to first run COPY, and then do

(echo blah,blah1,blah2; cat copyoutput.txt)>whatireallywant.txt

There's a lot of value in being able to export from program A
*exactly* what program B wants to import, rather than something that
is close but has to be massaged.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16

samullers@gmail.com

over 7 years ago

In reply to: Simon Muller (#4)

1 attachment(s)

Re: Allow COPY's 'text' format to output a header

On 14 May 2018 at 08:35, Simon Muller <samullers@gmail.com> wrote:

Okay, I've added this to the next commitfest at
https://commitfest.postgresql.org/18/1629/.

Thanks both Michael and David for the feedback so far.

I noticed through the patch tester link at http://commitfest.cputube.org/
that my patch caused a file_fdw test to fail (since I previously tested
only with "make check" and not with "make check-world").

This v2 patch should fix that.

Attachments:

text_header_v2.patch (application/octet-stream)

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index a5e79a4549..4c6bc24913 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 853c9f9b28..adecd10d2b 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  cannot specify HEADER in BINARY mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68d95..4db97589fd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -279,7 +279,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is not allowed when using <literal>binary</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3a66cb5025..6992b0f058 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1293,10 +1293,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify HEADER in BINARY mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2033,8 +2033,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index cb13606d14..f9301b5e6b 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -133,3 +133,5 @@ this is just a line full of junk that would error out if parsed
 \.
 
 copy copytest3 to stdout csv header;
+
+copy copytest3 to stdout with (format text, header true);
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index b7e372d61b..686c61d71d 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,3 +95,7 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+copy copytest3 to stdout with (format text, header true);
+c1	col with , comma	col with " quote
+1	a	1
+2	b	2
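
The difference between the two header lines in the expected output above can be modeled in a few lines of Python. This is a simplified sketch, not a port of `CopyAttributeOutCSV` or `CopyAttributeOutText`: the function names are hypothetical, the delimiters are fixed (comma for CSV, tab for text), and only a subset of the escaping rules is modeled.

```python
from typing import Sequence

# Subset of backslash escapes assumed for text format in this model.
_TEXT_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def csv_header(columns: Sequence[str]) -> str:
    """CSV-style header: quote names containing a comma, quote or newline."""
    def quote(name: str) -> str:
        if any(ch in name for ch in ',"\n\r'):
            return '"' + name.replace('"', '""') + '"'
        return name
    return ",".join(quote(c) for c in columns)


def text_header(columns: Sequence[str]) -> str:
    """Text-style header: tab-delimited, backslash escapes, no quoting."""
    def escape(name: str) -> str:
        return "".join(_TEXT_ESCAPES.get(ch, ch) for ch in name)
    return "\t".join(escape(c) for c in columns)


# Column names from the copytest3 regression table.
columns = ["c1", "col with , comma", 'col with " quote']
print(csv_header(columns))
print(text_header(columns))
```

For these column names the model reproduces both header lines from the regression output: commas and double quotes force quoting in CSV but pass through untouched in text format.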

#17

samullers@gmail.com

over 7 years ago

In reply to: Simon Muller (#16)

1 attachment(s)

Re: Allow COPY's 'text' format to output a header

On 4 July 2018 at 22:44, Simon Muller <samullers@gmail.com> wrote:

I noticed through the patch tester link at http://commitfest.cputube.org/
that my patch caused a file_fdw test to fail (since I previously tested
only with "make check" and not with "make check-world").

This v2 patch should fix that.

This patch just fixes a newline issue introduced in my previous patch.

Attachments:

text_header_v3.patch (application/octet-stream)

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index a5e79a4549..4c6bc24913 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 853c9f9b28..adecd10d2b 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  cannot specify HEADER in BINARY mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68d95..4db97589fd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -279,7 +279,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is not allowed when using <literal>binary</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3a66cb5025..6992b0f058 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1293,10 +1293,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify HEADER in BINARY mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2033,8 +2033,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index cb13606d14..f9301b5e6b 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -133,3 +133,5 @@ this is just a line full of junk that would error out if parsed
 \.
 
 copy copytest3 to stdout csv header;
+
+copy copytest3 to stdout with (format text, header true);
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index b7e372d61b..686c61d71d 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,3 +95,7 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+copy copytest3 to stdout with (format text, header true);
+c1	col with , comma	col with " quote
+1	a	1
+2	b	2

#18

cynthia.shang@crunchydata.com

over 7 years ago

In reply to: Simon Muller (#17)

1 attachment(s)

Re: Allow COPY's 'text' format to output a header

On 4 July 2018 at 22:44, Simon Muller <samullers@gmail.com> wrote:

I noticed through the patch tester link at http://commitfest.cputube.org/ that my patch caused a file_fdw test to fail (since I previously tested only with "make check" and not with "make check-world").

This v2 patch should fix that.

This patch just fixes a newline issue introduced in my previous patch.

I've reviewed this patch and feel it addresses the original ask. I tested it manually, trying to break it, and, as mentioned previously, its behavior is the same as the CSV copy with regard to its shortcomings. However, I feel
1) a "copy from" test is needed and
2) the current "copy to" test is (along with a few others) in the wrong file.

With regard to #2, the copy.source tests are for things requiring replacement when running the tests. Given that these copy tests do not, I have moved the current last set of copy tests to the copy2.sql file and have provided an attached patch.

With regard to #1, the patch I have provided can then be used and the following added as the COPY TO/FROM tests (perhaps after line 426 of the attached copy2.sql file). Note that I moved the FROM test before the TO test and omitted the "(format text, header true)" in the FROM test since it is another way the command can be invoked.

copy copytest3 from stdin header;
this is just a line full of junk that would error out if parsed
11 a 1
22 b 2
\.

copy copytest3 to stdout with (format text, header true);

As for the matching check of the header in the discussion of this patch, I feel that is a separate patch that can be added later since it would affect the general functionality of the copy command, not just the ability to have a text header.

Best,
- Cynthia Shang

Attachments:

move-copy-tests-v1.patch (application/octet-stream)

diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index eb9e4b9774..f5f5c4c407 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -557,6 +557,16 @@ SELECT * FROM instead_of_insert_tbl;
   1 | test1
 (1 row)
 
+-- test header line feature
+create temp table copytest3 (
+	c1 int,
+	"col with , comma" text,
+	"col with "" quote"  int);
+copy copytest3 from stdin csv header;
+copy copytest3 to stdout csv header;
+c1,"col with , comma","col with "" quote"
+1,a,1
+2,b,2
 -- clean up
 DROP TABLE forcetest;
 DROP TABLE vistest;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index cb13606d14..20a140ab78 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -117,19 +117,3 @@ copy copytest to '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\
 copy copytest2 from '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';
 
 select * from copytest except select * from copytest2;
-
-
--- test header line feature
-
-create temp table copytest3 (
-	c1 int,
-	"col with , comma" text,
-	"col with "" quote"  int);
-
-copy copytest3 from stdin csv header;
-this is just a line full of junk that would error out if parsed
-1,a,1
-2,b,2
-\.
-
-copy copytest3 to stdout csv header;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index b7e372d61b..9314622768 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -85,13 +85,3 @@ select * from copytest except select * from copytest2;
 -------+------+--------
 (0 rows)
 
--- test header line feature
-create temp table copytest3 (
-	c1 int,
-	"col with , comma" text,
-	"col with "" quote"  int);
-copy copytest3 from stdin csv header;
-copy copytest3 to stdout csv header;
-c1,"col with , comma","col with "" quote"
-1,a,1
-2,b,2
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index f3a6d228fa..ce87f778a6 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -411,6 +411,19 @@ test1
 
 SELECT * FROM instead_of_insert_tbl;
 
+-- test header line feature
+create temp table copytest3 (
+	c1 int,
+	"col with , comma" text,
+	"col with "" quote"  int);
+
+copy copytest3 from stdin csv header;
+this is just a line full of junk that would error out if parsed
+1,a,1
+2,b,2
+\.
+
+copy copytest3 to stdout csv header;
 
 -- clean up
 DROP TABLE forcetest;

#19

cynthia.shang@crunchydata.com

over 7 years ago

In reply to: Cynthia Shang (#18)

1 attachment(s)

Re: Allow COPY's 'text' format to output a header

On Wed, Jul 25, 2018 at 1:24 PM, Cynthia Shang
<cynthia.shang@crunchydata.com> wrote:

With regards to #2, the copy.source tests are for things requiring
replacement when running the tests. Given that these copy tests do not, I
have moved the current last set of copy tests to the copy2.sql file and have
provided an attached patch.

The patch appears in the raw message and (hopefully) in your email, but it
does not appear in the thread archive, so I am reattaching it from a
different email client.

Attachments:

move-copy-tests-v1.patch (application/octet-stream)

diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index eb9e4b9774..f5f5c4c407 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -557,6 +557,16 @@ SELECT * FROM instead_of_insert_tbl;
   1 | test1
 (1 row)
 
+-- test header line feature
+create temp table copytest3 (
+	c1 int,
+	"col with , comma" text,
+	"col with "" quote"  int);
+copy copytest3 from stdin csv header;
+copy copytest3 to stdout csv header;
+c1,"col with , comma","col with "" quote"
+1,a,1
+2,b,2
 -- clean up
 DROP TABLE forcetest;
 DROP TABLE vistest;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index cb13606d14..20a140ab78 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -117,19 +117,3 @@ copy copytest to '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\
 copy copytest2 from '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';
 
 select * from copytest except select * from copytest2;
-
-
--- test header line feature
-
-create temp table copytest3 (
-	c1 int,
-	"col with , comma" text,
-	"col with "" quote"  int);
-
-copy copytest3 from stdin csv header;
-this is just a line full of junk that would error out if parsed
-1,a,1
-2,b,2
-\.
-
-copy copytest3 to stdout csv header;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index b7e372d61b..9314622768 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -85,13 +85,3 @@ select * from copytest except select * from copytest2;
 -------+------+--------
 (0 rows)
 
--- test header line feature
-create temp table copytest3 (
-	c1 int,
-	"col with , comma" text,
-	"col with "" quote"  int);
-copy copytest3 from stdin csv header;
-copy copytest3 to stdout csv header;
-c1,"col with , comma","col with "" quote"
-1,a,1
-2,b,2
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index f3a6d228fa..ce87f778a6 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -411,6 +411,19 @@ test1
 
 SELECT * FROM instead_of_insert_tbl;
 
+-- test header line feature
+create temp table copytest3 (
+	c1 int,
+	"col with , comma" text,
+	"col with "" quote"  int);
+
+copy copytest3 from stdin csv header;
+this is just a line full of junk that would error out if parsed
+1,a,1
+2,b,2
+\.
+
+copy copytest3 to stdout csv header;
 
 -- clean up
 DROP TABLE forcetest;

#20

samullers@gmail.com

over 7 years ago

In reply to: Cynthia Shang (#18)

1 attachment(s)

Re: Allow COPY's 'text' format to output a header

On 25 July 2018 at 19:24, Cynthia Shang <cynthia.shang@crunchydata.com>
wrote:

I've reviewed this patch and feel it addresses the original ask. I
tested it manually, trying to break it, and, as mentioned previously, its
behavior is the same as the CSV copy with regards to its shortcomings.
However, I feel
1) a "copy from" test is needed and
2) the current "copy to" test is (along with a few others) in the wrong
file.

With regards to #2, the copy.source tests are for things requiring
replacement when running the tests. Given that these copy tests do not, I
have moved the current last set of copy tests to the copy2.sql file and
have provided an attached patch.

Thanks for reviewing the patch.

I agree that moving those previous and these new tests out of the .source
files seems to make more sense as they don't make use of the
preprocessing/replacement feature.

With regards to #1, the patch I have provided can then be used and the
following added as the COPY TO/FROM tests (perhaps after line 426 of the
attached copy2.sql file). Note that I moved the FROM test before the TO
test and omitted the "(format text, header true)" in the FROM test, since
the short form is another way the command can be invoked.

copy copytest3 from stdin header;
this is just a line full of junk that would error out if parsed
11 a 1
22 b 2
\.

copy copytest3 to stdout with (format text, header true);

I've incorporated both your suggestions and included the patch you provided
in the attached patch. Hope it's as expected.
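
For readers comparing the two expected header lines in copy2.out, the difference between the CSV and text forms can be sketched in standalone C as below. This is only an illustration under simplifying assumptions: build_header and its helpers are hypothetical names, not the CopyAttributeOutCSV/CopyAttributeOutText functions in copy.c, and only a subset of the real escaping rules is covered.

```c
#include <stddef.h>
#include <string.h>

/* Append one character to a NUL-terminated buffer if room remains. */
static void put_char(char *out, size_t cap, char c)
{
    size_t len = strlen(out);

    if (len + 1 < cap)
    {
        out[len] = c;
        out[len + 1] = '\0';
    }
}

/* CSV style: quote a name that contains a comma, a quote or a line
 * break, and double any embedded quote characters. */
static void put_csv_name(char *out, size_t cap, const char *name)
{
    int need_quote = strpbrk(name, ",\"\r\n") != NULL;

    if (need_quote)
        put_char(out, cap, '"');
    for (const char *p = name; *p; p++)
    {
        if (*p == '"')
            put_char(out, cap, '"');
        put_char(out, cap, *p);
    }
    if (need_quote)
        put_char(out, cap, '"');
}

/* Text style: never quote; backslash-escape backslash, tab, newline
 * and carriage return (a simplified subset of the real rules). */
static void put_text_name(char *out, size_t cap, const char *name)
{
    for (const char *p = name; *p; p++)
    {
        switch (*p)
        {
            case '\\': put_char(out, cap, '\\'); put_char(out, cap, '\\'); break;
            case '\t': put_char(out, cap, '\\'); put_char(out, cap, 't'); break;
            case '\n': put_char(out, cap, '\\'); put_char(out, cap, 'n'); break;
            case '\r': put_char(out, cap, '\\'); put_char(out, cap, 'r'); break;
            default:   put_char(out, cap, *p); break;
        }
    }
}

/* Build a header line from column names; csv != 0 selects CSV style,
 * otherwise tab-delimited text style. Output is truncated to cap. */
void build_header(char *out, size_t cap,
                  const char *const *names, size_t n, int csv)
{
    if (cap == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
    {
        if (i > 0)
            put_char(out, cap, csv ? ',' : '\t');
        if (csv)
            put_csv_name(out, cap, names[i]);
        else
            put_text_name(out, cap, names[i]);
    }
}
```

With the three column names of copytest3, the CSV branch quotes the second and third names while the text branch emits them unquoted and separated by tabs, which is the shape of the two header lines in the expected output.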

As for the matching check of the header in the discussion of this patch, I
feel that is a separate patch that can be added later since it would affect
the general functionality of the copy command, not just the ability to have
a text header.

Best,
- Cynthia Shang

P.S. I did receive the first attached patch, but on my Ubuntu machine I had
to apply it using "git apply --ignore-space-change --ignore-whitespace",
probably due to line ending differences.

--
Simon Muller

Attachments:

text_header_v4.patch (application/octet-stream)

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index a5e79a4549..4c6bc24913 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 853c9f9b28..adecd10d2b 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  cannot specify HEADER in BINARY mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68d95..4db97589fd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -279,7 +279,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is not allowed when using <literal>binary</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3a66cb5025..6992b0f058 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1293,10 +1293,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify HEADER in BINARY mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2033,8 +2033,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index eb9e4b9774..f7cd364b63 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -557,6 +557,23 @@ SELECT * FROM instead_of_insert_tbl;
   1 | test1
 (1 row)
 
+-- test header line feature
+create temp table copytest3 (
+	c1 int,
+	"col with , comma" text,
+	"col with "" quote"  int);
+copy copytest3 from stdin csv header;
+copy copytest3 to stdout csv header;
+c1,"col with , comma","col with "" quote"
+1,a,1
+2,b,2
+copy copytest3 from stdin header;
+copy copytest3 to stdout with (format text, header true);
+c1	col with , comma	col with " quote
+1	a	1
+2	b	2
+3	c	3
+4	d	4
 -- clean up
 DROP TABLE forcetest;
 DROP TABLE vistest;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index cb13606d14..20a140ab78 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -117,19 +117,3 @@ copy copytest to '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\
 copy copytest2 from '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';
 
 select * from copytest except select * from copytest2;
-
-
--- test header line feature
-
-create temp table copytest3 (
-	c1 int,
-	"col with , comma" text,
-	"col with "" quote"  int);
-
-copy copytest3 from stdin csv header;
-this is just a line full of junk that would error out if parsed
-1,a,1
-2,b,2
-\.
-
-copy copytest3 to stdout csv header;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index b7e372d61b..9314622768 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -85,13 +85,3 @@ select * from copytest except select * from copytest2;
 -------+------+--------
 (0 rows)
 
--- test header line feature
-create temp table copytest3 (
-	c1 int,
-	"col with , comma" text,
-	"col with "" quote"  int);
-copy copytest3 from stdin csv header;
-copy copytest3 to stdout csv header;
-c1,"col with , comma","col with "" quote"
-1,a,1
-2,b,2
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index f3a6d228fa..c74a1e6469 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -411,6 +411,27 @@ test1
 
 SELECT * FROM instead_of_insert_tbl;
 
+-- test header line feature
+create temp table copytest3 (
+	c1 int,
+	"col with , comma" text,
+	"col with "" quote"  int);
+
+copy copytest3 from stdin csv header;
+this is just a line full of junk that would error out if parsed
+1,a,1
+2,b,2
+\.
+
+copy copytest3 to stdout csv header;
+
+copy copytest3 from stdin header;
+this is just a line full of junk that would error out if parsed
+3	c	3
+4	d	4
+\.
+
+copy copytest3 to stdout with (format text, header true);
 
 -- clean up
 DROP TABLE forcetest;

#21

cynthia.shang@crunchydata.com

over 7 years ago

In reply to: Simon Muller (#20)

Re: Allow COPY's 'text' format to output a header

On Jul 25, 2018, at 6:09 PM, Simon Muller <samullers@gmail.com> wrote:

I've incorporated both your suggestions and included the patch you provided in the attached patch. Hope it's as expected.
--
Simon Muller

<text_header_v4.patch>

Reviewed and retested. Changing status to Ready for Committer.

#22

daniel@manitou-mail.org

over 7 years ago

In reply to: Simon Muller (#20)

Re: Allow COPY's 'text' format to output a header

Simon Muller wrote:

I've incorporated both your suggestions and included the patch you provided
in the attached patch. Hope it's as expected.

Still unconvinced about the use case, since COPY's text format is only
meant to be consumed by Postgres, and the only way that Postgres will
consume this header is to discard it (at least as of the current
patch). But anyway...
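
To illustrate what "discard it" means on input, here is a minimal standalone sketch. It assumes the input is already in memory as a single string; skip_header is a hypothetical helper, not a function in copy.c. The first line is dropped without being parsed or compared against the column names, which is why the junk line in the regression test loads without error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the first data line of the input.
 * When header is true, everything up to and including the first
 * newline is skipped unexamined; when false, the input is returned
 * unchanged, so a stray header line would be parsed as data.
 */
const char *skip_header(const char *input, bool header)
{
    const char *nl;

    if (!header)
        return input;

    nl = strchr(input, '\n');
    /* No newline at all: the whole input was the header line. */
    return nl ? nl + 1 : input + strlen(input);
}
```

The header == false case corresponds to the situation raised earlier in the thread: if the option is forgotten on COPY FROM, the header line is handed to the row parser like any other line.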

   /* Check header */
-  if (!cstate->csv_mode && cstate->header_line)
+  if (cstate->binary && cstate->header_line)
     ereport(ERROR,
-  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-    errmsg("COPY HEADER available only in CSV mode")));
+	 (errcode(ERRCODE_SYNTAX_ERROR),
+	  errmsg("cannot specify HEADER in BINARY mode")));

Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

#23

cynthia.shang@crunchydata.com

over 7 years ago

In reply to: Daniel Verite (#22)

Re: Allow COPY's 'text' format to output a header

On Aug 1, 2018, at 10:20 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
/* Check header */
-  if (!cstate->csv_mode && cstate->header_line)
+  if (cstate->binary && cstate->header_line)
ereport(ERROR,
-  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-    errmsg("COPY HEADER available only in CSV mode")));
+	 (errcode(ERRCODE_SYNTAX_ERROR),
+	  errmsg("cannot specify HEADER in BINARY mode")));
Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR?

I agree; it should remain ERRCODE_FEATURE_NOT_SUPPORTED and I might also suggest the message read "COPY HEADER not available in BINARY mode", although I'm pretty agnostic on the latter.

Regards,
-Cynthia Shang

#24

samullers@gmail.com

over 7 years ago

In reply to: Cynthia Shang (#23)

Re: Allow COPY's 'text' format to output a header

On 1 August 2018 at 17:18, Cynthia Shang <cynthia.shang@crunchydata.com>
wrote:

On Aug 1, 2018, at 10:20 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
/* Check header */
-  if (!cstate->csv_mode && cstate->header_line)
+  if (cstate->binary && cstate->header_line)
ereport(ERROR,
-  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-    errmsg("COPY HEADER available only in CSV mode")));
+      (errcode(ERRCODE_SYNTAX_ERROR),
+       errmsg("cannot specify HEADER in BINARY mode")));
Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR?
I agree; it should remain ERRCODE_FEATURE_NOT_SUPPORTED and I might also
suggest the message read "COPY HEADER not available in BINARY mode",
although I'm pretty agnostic on the latter.

Regards,
-Cynthia Shang

I changed the error type and message for consistency with other similar
errors in that file. Whenever incompatible options are combined, the
convention appears to be to throw an ERRCODE_SYNTAX_ERROR.

For instance, if you specify a DELIMITER and also declare the format as
BINARY, then this code in the same file applies:

	if (cstate->binary && cstate->delim)
		ereport(ERROR,
				(errcode(ERRCODE_SYNTAX_ERROR),
				 errmsg("cannot specify DELIMITER in BINARY mode")));

HEADER seems very similar to me since, like DELIMITER, it makes sense for
the textual formats such as CSV and TEXT, but doesn't make sense with the
BINARY format.

ERRCODE_FEATURE_NOT_SUPPORTED previously made sense since the only reason
TEXT and HEADER weren't compatible options was because the feature was not
yet implemented, but now ERRCODE_SYNTAX_ERROR seems to make sense to me
since I can't foresee a use case where BINARY and HEADER would ever be
compatible options.
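
A minimal standalone sketch of the rule as argued here, for illustration only: the enums, struct and function names are hypothetical stand-ins, not copy.c's CopyState or ereport(), and the choice of a syntax-error code for HEADER reflects this message's proposal rather than a settled decision.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the COPY formats and error classes. */
typedef enum { FMT_TEXT, FMT_CSV, FMT_BINARY } CopyFormat;
typedef enum { ERR_NONE, ERR_SYNTAX, ERR_FEATURE_NOT_SUPPORTED } ErrClass;

typedef struct
{
    CopyFormat  format;
    bool        header_line;   /* HEADER option given */
    const char *delim;         /* NULL when DELIMITER not given */
} CopyOpts;

typedef struct
{
    ErrClass    code;
    const char *msg;           /* NULL when the options are compatible */
} CheckResult;

/* Reject options that can never make sense for the binary format;
 * HEADER is accepted for both textual formats (text and CSV). */
CheckResult check_copy_opts(const CopyOpts *opts)
{
    CheckResult ok = { ERR_NONE, NULL };

    if (opts->format == FMT_BINARY && opts->delim != NULL)
        return (CheckResult) { ERR_SYNTAX,
                               "cannot specify DELIMITER in BINARY mode" };

    if (opts->format == FMT_BINARY && opts->header_line)
        return (CheckResult) { ERR_SYNTAX,
                               "cannot specify HEADER in BINARY mode" };

    return ok;
}
```

Keeping the previous error class instead would only mean returning ERR_FEATURE_NOT_SUPPORTED from the second branch; the condition itself is the same either way.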

--
Simon Muller

#25

daniel@manitou-mail.org

over 7 years ago

In reply to: Simon Muller (#24)

Re: Allow COPY's 'text' format to output a header

Simon Muller wrote:

I changed the error type and message for consistency with other similar
errors in that file. Whenever incompatible options are combined, the
convention appears to be to throw an ERRCODE_SYNTAX_ERROR.

That makes sense, thanks for elaborating, although there are also
a fair number of ERRCODE_FEATURE_NOT_SUPPORTED errors in copy.c
that are raised on forbidden/nonsensical combinations of features,
so the consistency argument could work both ways.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

#26

cynthia.shang@crunchydata.com

over 7 years ago

In reply to: Daniel Verite (#25)

Re: Allow COPY's 'text' format to output a header

On Aug 2, 2018, at 8:11 AM, Daniel Verite <daniel@manitou-mail.org> wrote:

That makes sense, thanks for elaborating, although there are also
a fair number of ERRCODE_FEATURE_NOT_SUPPORTED errors in copy.c
that are raised on forbidden/nonsensical combinations of features,
so the consistency argument could work both ways.

If there is not a strong reason to change the error code, then I believe we should not. The error is the same as it was before, just narrower in scope.

Best,
-Cynthia

#27

samullers@gmail.com

over 7 years ago

In reply to: Cynthia Shang (#26)

1 attachment(s)

Re: Allow COPY's 'text' format to output a header

On 2 August 2018 at 17:07, Cynthia Shang <cynthia.shang@crunchydata.com>
wrote:

On Aug 2, 2018, at 8:11 AM, Daniel Verite <daniel@manitou-mail.org> wrote:

That makes sense, thanks for elaborating, although there are also
a fair number of ERRCODE_FEATURE_NOT_SUPPORTED errors in copy.c
that are raised on forbidden/nonsensical combinations of features,
so the consistency argument could work both ways.

If there is not a strong reason to change the error code, then I believe
we should not. The error is the same as it was before, just narrower in
scope.

Best,
-Cynthia

Sure, thanks both for the feedback. Attached is a patch with the error kept
as ERRCODE_FEATURE_NOT_SUPPORTED.

--
Simon Muller

Attachments:

text_header_v5.patch (application/octet-stream)

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index a5e79a4549..4c6bc24913 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 853c9f9b28..adecd10d2b 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  cannot specify HEADER in BINARY mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68d95..4db97589fd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -279,7 +279,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is not allowed when using <literal>binary</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3a66cb5025..fe9e25b988 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1293,10 +1293,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("cannot specify HEADER in BINARY mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2033,8 +2033,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index eb9e4b9774..f7cd364b63 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -557,6 +557,23 @@ SELECT * FROM instead_of_insert_tbl;
   1 | test1
 (1 row)
 
+-- test header line feature
+create temp table copytest3 (
+	c1 int,
+	"col with , comma" text,
+	"col with "" quote"  int);
+copy copytest3 from stdin csv header;
+copy copytest3 to stdout csv header;
+c1,"col with , comma","col with "" quote"
+1,a,1
+2,b,2
+copy copytest3 from stdin header;
+copy copytest3 to stdout with (format text, header true);
+c1	col with , comma	col with " quote
+1	a	1
+2	b	2
+3	c	3
+4	d	4
 -- clean up
 DROP TABLE forcetest;
 DROP TABLE vistest;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index cb13606d14..20a140ab78 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -117,19 +117,3 @@ copy copytest to '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\
 copy copytest2 from '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';
 
 select * from copytest except select * from copytest2;
-
-
--- test header line feature
-
-create temp table copytest3 (
-	c1 int,
-	"col with , comma" text,
-	"col with "" quote"  int);
-
-copy copytest3 from stdin csv header;
-this is just a line full of junk that would error out if parsed
-1,a,1
-2,b,2
-\.
-
-copy copytest3 to stdout csv header;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index b7e372d61b..9314622768 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -85,13 +85,3 @@ select * from copytest except select * from copytest2;
 -------+------+--------
 (0 rows)
 
--- test header line feature
-create temp table copytest3 (
-	c1 int,
-	"col with , comma" text,
-	"col with "" quote"  int);
-copy copytest3 from stdin csv header;
-copy copytest3 to stdout csv header;
-c1,"col with , comma","col with "" quote"
-1,a,1
-2,b,2
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index f3a6d228fa..c74a1e6469 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -411,6 +411,27 @@ test1
 
 SELECT * FROM instead_of_insert_tbl;
 
+-- test header line feature
+create temp table copytest3 (
+	c1 int,
+	"col with , comma" text,
+	"col with "" quote"  int);
+
+copy copytest3 from stdin csv header;
+this is just a line full of junk that would error out if parsed
+1,a,1
+2,b,2
+\.
+
+copy copytest3 to stdout csv header;
+
+copy copytest3 from stdin header;
+this is just a line full of junk that would error out if parsed
+3	c	3
+4	d	4
+\.
+
+copy copytest3 to stdout with (format text, header true);
 
 -- clean up
 DROP TABLE forcetest;

#28

cynthia.shang@crunchydata.com

over 7 years ago

In reply to: Simon Muller (#27)

Re: Allow COPY's 'text' format to output a header

On Aug 2, 2018, at 3:30 PM, Simon Muller <samullers@gmail.com> wrote:

Sure, thanks both for the feedback. Attached is a patch with the error kept as ERRCODE_FEATURE_NOT_SUPPORTED.

I was able to apply the patch (after resolving a merge conflict which was expected given an update in master). All looks good.

-Cynthia

#29

Stephen Frost

sfrost@snowman.net

over 7 years ago

In reply to: Cynthia Shang (#28)

Re: Allow COPY's 'text' format to output a header

Greetings,

* Cynthia Shang (cynthia.shang@crunchydata.com) wrote:

On Aug 2, 2018, at 3:30 PM, Simon Muller <samullers@gmail.com> wrote:

Sure, thanks both for the feedback. Attached is a patch with the error kept as ERRCODE_FEATURE_NOT_SUPPORTED.

I was able to apply the patch (after resolving a merge conflict which was expected given an update in master). All looks good.

If there's a merge conflict against master, then it'd be good for an
updated patch to be posted.

Thanks!

Stephen

#30

samullers@gmail.com

over 7 years ago

In reply to: Stephen Frost (#29)

1 attachment(s)

Re: Allow COPY's 'text' format to output a header

On 6 August 2018 at 16:34, Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Cynthia Shang (cynthia.shang@crunchydata.com) wrote:

I was able to apply the patch (after resolving a merge conflict which
was expected given an update in master). All looks good.

If there's a merge conflict against master, then it'd be good for an
updated patch to be posted.

Thanks!

Stephen

Attached is an updated patch that should directly apply against current
master.

--
Simon Muller

Attachments:

text_header_v6.patch (application/octet-stream)

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index a5e79a4549..4c6bc24913 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 853c9f9b28..adecd10d2b 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  cannot specify HEADER in BINARY mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68d95..4db97589fd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -279,7 +279,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is not allowed when using <literal>binary</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9bc67ce60f..165883165d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1301,10 +1301,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("cannot specify HEADER in BINARY mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2041,8 +2041,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index eb9e4b9774..f7cd364b63 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -557,6 +557,23 @@ SELECT * FROM instead_of_insert_tbl;
   1 | test1
 (1 row)
 
+-- test header line feature
+create temp table copytest3 (
+	c1 int,
+	"col with , comma" text,
+	"col with "" quote"  int);
+copy copytest3 from stdin csv header;
+copy copytest3 to stdout csv header;
+c1,"col with , comma","col with "" quote"
+1,a,1
+2,b,2
+copy copytest3 from stdin header;
+copy copytest3 to stdout with (format text, header true);
+c1	col with , comma	col with " quote
+1	a	1
+2	b	2
+3	c	3
+4	d	4
 -- clean up
 DROP TABLE forcetest;
 DROP TABLE vistest;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 014b1b5711..dae1b43f4b 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -118,22 +118,6 @@ copy copytest2 from '@abs_builddir@/results/copytest.csv' csv quote '''' escape
 
 select * from copytest except select * from copytest2;
 
-
--- test header line feature
-
-create temp table copytest3 (
-	c1 int,
-	"col with , comma" text,
-	"col with "" quote"  int);
-
-copy copytest3 from stdin csv header;
-this is just a line full of junk that would error out if parsed
-1,a,1
-2,b,2
-\.
-
-copy copytest3 to stdout csv header;
-
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index ab096153ad..1c58908ae5 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -85,16 +85,6 @@ select * from copytest except select * from copytest2;
 -------+------+--------
 (0 rows)
 
--- test header line feature
-create temp table copytest3 (
-	c1 int,
-	"col with , comma" text,
-	"col with "" quote"  int);
-copy copytest3 from stdin csv header;
-copy copytest3 to stdout csv header;
-c1,"col with , comma","col with "" quote"
-1,a,1
-2,b,2
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index f3a6d228fa..c74a1e6469 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -411,6 +411,27 @@ test1
 
 SELECT * FROM instead_of_insert_tbl;
 
+-- test header line feature
+create temp table copytest3 (
+	c1 int,
+	"col with , comma" text,
+	"col with "" quote"  int);
+
+copy copytest3 from stdin csv header;
+this is just a line full of junk that would error out if parsed
+1,a,1
+2,b,2
+\.
+
+copy copytest3 to stdout csv header;
+
+copy copytest3 from stdin header;
+this is just a line full of junk that would error out if parsed
+3	c	3
+4	d	4
+\.
+
+copy copytest3 to stdout with (format text, header true);
 
 -- clean up
 DROP TABLE forcetest;

#31

cynthia.shang@crunchydata.com

over 7 years ago

In reply to: Simon Muller (#30)

Re: Allow COPY's 'text' format to output a header

On Aug 8, 2018, at 2:57 PM, Simon Muller <samullers@gmail.com> wrote:

If there's a merge conflict against master, then it'd be good for an
updated patch to be posted.

Thanks!

Stephen

Attached is an updated patch that should directly apply against current master.

--
Simon Muller

<text_header_v6.patch>

This patch looks good. I realized I should have changed the status back while we were discussing all this. It is now (and still is) ready for committer.

Thanks,
-Cynthia

#32

michael@paquier.xyz

over 7 years ago

In reply to: Cynthia Shang (#31)

Re: Allow COPY's 'text' format to output a header

On Thu, Aug 09, 2018 at 10:37:28AM -0400, Cynthia Shang wrote:

This patch looks good. I realized I should have changed the status
back while we were discussing all this. It is now (and still is) ready
for committer.

I have some comments.

-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  cannot specify HEADER in BINARY mode
This should read "COPY HEADER not available in BINARY mode" perhaps?

+copy copytest3 from stdin csv header;
+copy copytest3 to stdout csv header;
It would be more interesting to first export the data into a file with
a header, truncate the relation, and import it back, again with the header
specified. The original data should match the reimported data, for both
text and csv formats.

CopyStateData defines header_line, which still assumes that only CSV is
supported.

Why are there no additional tests for file_fdw?

The point about the header matching mentioned upthread is quite
interesting as it could make the proposed feature way more useful, and
it has not really been discussed. As far as I can see this adds more
sanity checks in NextCopyFromRawFields(). I'd like to think that this
should be a completely different option, say CHECK_HEADER, as CSV simply
skips the header in COPY FROM if specified on HEAD.
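
To make the distinction concrete, here is a minimal standalone sketch of the behaviors in question: no header, a header that is skipped unread (what CSV does today), and a header that is compared against the column names. Every name in it (HeaderMode, HeaderResult, check_header) is hypothetical and chosen for illustration; none of this is taken from copy.c, and the real check would live in NextCopyFromRawFields() and report through ereport().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; they do not exist in PostgreSQL. */
typedef enum HeaderMode
{
	HEADER_NONE,				/* input has no header line */
	HEADER_SKIP,				/* header line is read and thrown away */
	HEADER_CHECK				/* header line must match the column names */
} HeaderMode;

typedef enum HeaderResult
{
	HEADER_OK,
	HEADER_TOO_FEW_FIELDS,
	HEADER_TOO_MANY_FIELDS,
	HEADER_NAME_MISMATCH
} HeaderResult;

/*
 * Compare the fields parsed from the first input line with the expected
 * column names.  Only HEADER_CHECK looks at the content; the other modes
 * accept anything, which is why a forgotten header can go unnoticed.
 */
static HeaderResult
check_header(HeaderMode mode,
			 const char *const *fields, int nfields,
			 const char *const *columns, int ncolumns)
{
	int			i;

	if (mode != HEADER_CHECK)
		return HEADER_OK;

	assert(fields != NULL && columns != NULL);

	if (nfields < ncolumns)
		return HEADER_TOO_FEW_FIELDS;
	if (nfields > ncolumns)
		return HEADER_TOO_MANY_FIELDS;

	for (i = 0; i < ncolumns; i++)
	{
		if (strcmp(fields[i], columns[i]) != 0)
			return HEADER_NAME_MISMATCH;
	}
	return HEADER_OK;
}
```

With HEADER_SKIP a junk first line is accepted silently, while HEADER_CHECK rejects it, which is the argument for exposing the check as its own option.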
--
Michael

#33

michael@paquier.xyz

over 7 years ago

In reply to: Michael Paquier (#32)

Re: Allow COPY's 'text' format to output a header

On Fri, Aug 17, 2018 at 01:39:11PM +0900, Michael Paquier wrote:

The point about the header matching mentioned upthread is quite
interesting as it could make the proposed feature way more useful, and
it has not really been discussed. As far as I can see this adds more
sanity checks in NextCopyFromRawFields(). I'd like to think that this
should be a completely different option, say CHECK_HEADER, as CSV simply
skips the header in COPY FROM if specified on HEAD.

It has been a couple of weeks since the last review, which has not been
addressed, so I am marking the patch as returned with feedback.
--
Michael

#34

remi.lapeyre@henki.fr

almost 6 years ago

In reply to: Michael Paquier (#33)

1 attachment(s)

[PATCH v1] Allow COPY "text" to output a header and add header matching mode to COPY FROM

Hi, here's a new version of the patch with the header matching feature.
It should apply cleanly on master; let me know if anything's wrong.
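
For readers skimming the patch, the option handling can be pictured with the standalone sketch below. It is an illustration under assumptions, not the code in the attachment: the names HeaderChoice, equals_ignore_case and parse_header_choice are hypothetical, and the list of accepted boolean spellings is assumed for the example rather than copied from defGetBoolean(). The one behavior it does mirror from the patch is that "match" is only accepted for COPY FROM.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state result for the HEADER option. */
typedef enum HeaderChoice
{
	HEADER_CHOICE_FALSE,
	HEADER_CHOICE_TRUE,
	HEADER_CHOICE_MATCH
} HeaderChoice;

/* Case-insensitive ASCII string equality. */
static bool
equals_ignore_case(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0')
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return false;
		a++;
		b++;
	}
	return *a == *b;			/* equal only if both strings ended */
}

/*
 * Parse a HEADER option value.  Returns true and sets *choice on success,
 * false if the value is not recognized.  "match" is rejected unless
 * is_copy_from is set, since there is nothing to match on output.
 */
static bool
parse_header_choice(const char *value, bool is_copy_from,
					HeaderChoice *choice)
{
	/* Assumed spellings for this sketch only. */
	static const char *const truthy[] = {"true", "on", "1"};
	static const char *const falsy[] = {"false", "off", "0"};
	size_t		i;

	assert(value != NULL && choice != NULL);

	for (i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++)
	{
		if (equals_ignore_case(value, truthy[i]))
		{
			*choice = HEADER_CHOICE_TRUE;
			return true;
		}
	}
	for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++)
	{
		if (equals_ignore_case(value, falsy[i]))
		{
			*choice = HEADER_CHOICE_FALSE;
			return true;
		}
	}
	if (is_copy_from && equals_ignore_case(value, "match"))
	{
		*choice = HEADER_CHOICE_MATCH;
		return true;
	}
	return false;
}
```

Returning a status instead of catching an error keeps the two cases (boolean, then "match") in one straight-line function.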

---
contrib/file_fdw/input/file_fdw.source | 7 +-
contrib/file_fdw/output/file_fdw.source | 13 ++--
doc/src/sgml/ref/copy.sgml | 9 ++-
src/backend/commands/copy.c | 93 ++++++++++++++++++++++---
src/test/regress/input/copy.source | 71 ++++++++++++++-----
src/test/regress/output/copy.source | 58 ++++++++++-----
6 files changed, 202 insertions(+), 49 deletions(-)

Attachments:

v1-0001-Allow-COPY-test-to-output-a-header-and-add-header.patchtext/x-patch; name=v1-0001-Allow-COPY-test-to-output-a-header-and-add-header.patchDownload

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..7a3983c785 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;

 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
@@ -80,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);

+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..d76a3dc6f8 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
@@ -95,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -441,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_dont_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index a99f8155e4..36bdd87726 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,7 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER { <literal>match</literal> | <literal>true</literal> | <literal>false</literal> }
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -269,8 +269,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      names from the table, and on input, the first line is required to match
+      the column names if set to <literal>match</literal> or discarded when set
+      to <literal>true</literal>.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e79ede4cb8..f1bbbf841b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -94,6 +94,16 @@ typedef enum CopyInsertMethod
 	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
 } CopyInsertMethod;

+/*
+ * Represents whether the header must be absent, present or present and match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -135,7 +145,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	CopyHeader  header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1184,7 +1194,28 @@ ProcessCopyOptions(ParseState *pstate,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
-			cstate->header_line = defGetBoolean(defel);
+
+			PG_TRY();
+			{
+				if (defGetBoolean(defel))
+					cstate->header_line = COPY_HEADER_PRESENT;
+				else
+					cstate->header_line = COPY_HEADER_ABSENT;
+			}
+			PG_CATCH();
+			{
+				if (!cstate->is_copy_from)
+					PG_RE_THROW();
+
+				char	   *sval = defGetString(defel);
+				if (pg_strcasecmp(sval, "match") == 0)
+					cstate->header_line = COPY_HEADER_MATCH;
+				else
+					ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("header requires a boolean or \"match\"")));
+			}
+			PG_END_TRY();
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -1365,10 +1396,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));

 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));

 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2100,8 +2131,11 @@ CopyTo(CopyState cstate)

 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);

-				CopyAttributeOutCSV(cstate, colname, false,
-									list_length(cstate->attnumlist) == 1);
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
+										list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}

 			CopySendEndOfRow(cstate);
@@ -3639,12 +3673,53 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->binary);

-	/* on input just throw the header line away */
+	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->header_line)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						errmsg("missing header")));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					errmsg("extra data after last expected header")));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int				attnum = lfirst_int(cur);
+				char		*colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						errmsg("wrong header for column \"%s\": got \"%s\"",
+								NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}

 	cstate->cur_lineno++;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..dc7341529f 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -87,52 +87,66 @@ ANALYZE bt_f8_heap;
 ANALYZE array_op_test;
 ANALYZE array_index_op_test;

+-- test header line feature
+
+create temp table copytest (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest to stdout (header);
+
 --- test copying in CSV mode with various styles
 --- of embedded line ending characters

-create temp table copytest (
+create temp table copytest2 (
 	style	text,
 	test 	text,
 	filler	int);

-insert into copytest values('DOS',E'abc\r\ndef',1);
-insert into copytest values('Unix',E'abc\ndef',2);
-insert into copytest values('Mac',E'abc\rdef',3);
-insert into copytest values(E'esc\\ape',E'a\\r\\\r\\\n\\nb',4);
+insert into copytest2 values('DOS',E'abc\r\ndef',1);
+insert into copytest2 values('Unix',E'abc\ndef',2);
+insert into copytest2 values('Mac',E'abc\rdef',3);
+insert into copytest2 values(E'esc\\ape',E'a\\r\\\r\\\n\\nb',4);

-copy copytest to '@abs_builddir@/results/copytest.csv' csv;
+copy copytest2 to '@abs_builddir@/results/copytest.csv' csv;

-create temp table copytest2 (like copytest);
+create temp table copytest3 (like copytest2);

-copy copytest2 from '@abs_builddir@/results/copytest.csv' csv;
+copy copytest3 from '@abs_builddir@/results/copytest.csv' csv;

-select * from copytest except select * from copytest2;
+select * from copytest2 except select * from copytest3;

-truncate copytest2;
+truncate copytest3;

 --- same test but with an escape char different from quote char

-copy copytest to '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';
+copy copytest2 to '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';

-copy copytest2 from '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';
+copy copytest3 from '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';

-select * from copytest except select * from copytest2;
+select * from copytest2 except select * from copytest3;


 -- test header line feature

-create temp table copytest3 (
+create temp table copytest4 (
 	c1 int,
 	"col with , comma" text,
 	"col with "" quote"  int);

-copy copytest3 from stdin csv header;
+copy copytest4 from stdin csv header;
 this is just a line full of junk that would error out if parsed
 1,a,1
 2,b,2
 \.

-copy copytest3 to stdout csv header;
+copy copytest4 to stdout csv header;

 -- test copy from with a partitioned table
 create table parted_copytest (
@@ -201,3 +215,28 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;

 drop table parted_copytest;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d3551da..c50a2f092c 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -58,40 +58,49 @@ ANALYZE bt_txt_heap;
 ANALYZE bt_f8_heap;
 ANALYZE array_op_test;
 ANALYZE array_index_op_test;
+-- test header line feature
+create temp table copytest (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest from stdin (header);
+copy copytest to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 --- test copying in CSV mode with various styles
 --- of embedded line ending characters
-create temp table copytest (
+create temp table copytest2 (
 	style	text,
 	test 	text,
 	filler	int);
-insert into copytest values('DOS',E'abc\r\ndef',1);
-insert into copytest values('Unix',E'abc\ndef',2);
-insert into copytest values('Mac',E'abc\rdef',3);
-insert into copytest values(E'esc\\ape',E'a\\r\\\r\\\n\\nb',4);
-copy copytest to '@abs_builddir@/results/copytest.csv' csv;
-create temp table copytest2 (like copytest);
-copy copytest2 from '@abs_builddir@/results/copytest.csv' csv;
-select * from copytest except select * from copytest2;
+insert into copytest2 values('DOS',E'abc\r\ndef',1);
+insert into copytest2 values('Unix',E'abc\ndef',2);
+insert into copytest2 values('Mac',E'abc\rdef',3);
+insert into copytest2 values(E'esc\\ape',E'a\\r\\\r\\\n\\nb',4);
+copy copytest2 to '@abs_builddir@/results/copytest.csv' csv;
+create temp table copytest3 (like copytest2);
+copy copytest3 from '@abs_builddir@/results/copytest.csv' csv;
+select * from copytest2 except select * from copytest3;
  style | test | filler
 -------+------+--------
 (0 rows)

-truncate copytest2;
+truncate copytest3;
 --- same test but with an escape char different from quote char
-copy copytest to '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';
-copy copytest2 from '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';
-select * from copytest except select * from copytest2;
+copy copytest2 to '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';
+copy copytest3 from '@abs_builddir@/results/copytest.csv' csv quote '''' escape E'\\';
+select * from copytest2 except select * from copytest3;
  style | test | filler
 -------+------+--------
 (0 rows)

 -- test header line feature
-create temp table copytest3 (
+create temp table copytest4 (
 	c1 int,
 	"col with , comma" text,
 	"col with "" quote"  int);
-copy copytest3 from stdin csv header;
-copy copytest3 to stdout csv header;
+copy copytest4 from stdin csv header;
+copy copytest4 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
@@ -165,3 +174,20 @@ select * from parted_copytest where b = 2;
 (1 row)

 drop table parted_copytest;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  missing header
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;

#35

remi.lapeyre@henki.fr

almost 6 years ago

In reply to: Rémi Lapeyre (#34)

Re: [PATCH v1] Allow COPY "text" to output a header and add header matching mode to COPY FROM

I created an entry for this patch in the new CommitFest, but it seems that it is not finding the patch. Is there anything that I need to do?

#36

Surafel Temesgen

surafel3000@gmail.com

almost 6 years ago

In reply to: Rémi Lapeyre (#35)

Re: [PATCH v1] Allow COPY "text" to output a header and add header matching mode to COPY FROM

On Mon, Mar 2, 2020 at 2:45 AM Rémi Lapeyre <remi.lapeyre@henki.fr> wrote:

I created an entry for this patch in the new CommitFest, but it seems that
it is not finding the patch. Is there anything that I need to do?

It is added to the next open commitfest, which is now
https://commitfest.postgresql.org/28/

regards
Surafel

#37

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Rémi Lapeyre (#35)

Re: [PATCH v1] Allow COPY "text" to output a header and add header matching mode to COPY FROM

On 2 Mar 2020, at 00:45, Rémi Lapeyre <remi.lapeyre@henki.fr> wrote:

I created an entry for this patch in the new CommitFest, but it seems that it is not finding the patch. Is there anything that I need to do?

This patch no longer applies cleanly on HEAD, due to changes in the regress
tests. Please submit a rebased version, I've marked this entry as Waiting on
Author for now.

cheers ./daniel

#38

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Daniel Gustafsson (#37)

Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

On 8 Jul 2020, at 13:45, Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:

Hi, here's a new version of the patch that should apply cleanly. I'll monitor
the status on http://cfbot.cputube.org/

Please reply to the old thread about this, as that's the one connected to the
Commitfest entry and that's where all the discussion has happened. While
additional threads can be attached to a CF entry, that is for when multiple
discussions are relevant to a patch; a single discussion should not be broken
into multiple threads.

cheers ./daniel


#39

remi.lapeyre@lenstra.fr

over 5 years ago

In reply to: Daniel Gustafsson (#38)

Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

Please reply to the old thread about this, as that's the one connected to the
Commitfest entry and that's where all the discussion has happened. While
additional threads can be attached to a CF entry, that is for when multiple
discussions are relevant to a patch; a single discussion should not be broken
into multiple threads.

Sorry about this, I thought setting the In-Reply-To like so would be enough:

git send-email -v2 --to=pgsql-hackers@postgresql.org --in-reply-to=4E31E7AA-BFC6-47ED-90E1-3838E4D1F4FF@yesql.se HEAD^

There is some nice information about how to write a good commit, but I could not find exactly how to send it, so I probably did something wrong.

cheers ./daniel

#40

Justin Pryzby

pryzby@telsasoft.com

over 5 years ago

In reply to: Daniel Gustafsson (#38)

Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

On Wed, Jul 08, 2020 at 03:21:48PM +0200, Daniel Gustafsson wrote:

On 8 Jul 2020, at 13:45, Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:

Hi, here's a new version of the patch that should apply cleanly. I'll monitor
the status on http://cfbot.cputube.org/

Please reply to the old thread about this, as that's the one connected to the

Actually, it seems like the subject was changed, but it was correctly
associated with the existing thread, as defined by:
|In-Reply-To: <4E31E7AA-BFC6-47ED-90E1-3838E4D1F4FF@yesql.se>

And ML/pglister and cfbot see it as such
"In response to Re: [PATCH v1] Allow COPY "text" to output a header and add header matching mode to COPY FROM at 2020-07-01 09:04:21 from Daniel Gustafsson"
https://commitfest.postgresql.org/28/2504/

--
Justin

#41

peter.eisentraut@2ndquadrant.com

over 5 years ago

In reply to: Daniel Gustafsson (#37)

Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

On 2020-07-08 13:45, Rémi Lapeyre wrote:

Hi, here's a new version of the patch that should apply cleanly. I'll monitor
the status on http://cfbot.cputube.org/

It's hard to find an explanation of what this patch actually does. I don't
want to have to go through threads dating back 4 months to determine
what was discussed and what was actually implemented. Since you're
already using git format-patch, just add something to the commit message.

It appears that these are really two separate features, so perhaps they
should be two patches.

Also, the new header matching mode could probably use more than one line
of documentation.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#42

remi.lapeyre@lenstra.fr

over 5 years ago

In reply to: Peter Eisentraut (#41)

Add header support to text format and matching feature

It's hard to find an explanation of what this patch actually does. I don't
want to have to go through threads dating back 4 months to determine
what was discussed and what was actually implemented. Since you're
already using git format-patch, just add something to the commit message.

It appears that these are really two separate features, so perhaps they
should be two patches.

Thanks for the feedback. I've cleanly split the two patches, simplified the
tests and tried to explain the changes in the commit message.

Also, the new header matching mode could probably use more than one line
of documentation.

I've improved the documentation; let me know if it's better.

It seems like cfbot is not happy with the way I'm sending my patches. The wiki
has some good advice on how to write a patch, but I couldn't find anything on
how to send it. I've used

git send-email -v3 --compose --to=... --in-reply-to=... HEAD^^

here, but I'm not sure if it's correct. I will see if it works and will try to fix
it if it does not, but since cfbot runs once a day it may take some time.

#43

remi.lapeyre@lenstra.fr

over 5 years ago

In reply to: Rémi Lapeyre (#42)

1 attachment(s)

[PATCH v3 1/2] Add header support to "COPY TO" text format

The CSV format supports the HEADER option to emit a header line in the output,
which is convenient when other programs need to consume it. This
patch adds the same option to the default text format.
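
As an illustration of what a text-format header line looks like, the standalone sketch below joins column names with tabs and backslash-escapes the characters that would otherwise be read back as structure. It is a simplified stand-in written for this note, not the patch itself: the patch reuses the existing CopyAttributeOutText(), which handles more cases (including a configurable delimiter), and the names append_escaped and format_text_header are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Append one column name to out, starting at pos.  Backslash, tab, newline
 * and carriage return are written as two-character escapes.  Returns the
 * new write position, or -1 if the buffer is too small.
 */
static int
append_escaped(char *out, size_t outsz, int pos, const char *name)
{
	const char *p;

	for (p = name; *p != '\0'; p++)
	{
		char		esc = '\0';

		switch (*p)
		{
			case '\\':
				esc = '\\';
				break;
			case '\t':
				esc = 't';
				break;
			case '\n':
				esc = 'n';
				break;
			case '\r':
				esc = 'r';
				break;
			default:
				break;
		}

		/* keep room for up to two characters plus the final NUL */
		if ((size_t) pos + 3 > outsz)
			return -1;

		if (esc != '\0')
		{
			out[pos++] = '\\';
			out[pos++] = esc;
		}
		else
			out[pos++] = *p;
	}
	return pos;
}

/*
 * Build a tab-separated, NUL-terminated header line from the column names.
 * Returns the length of the line, or -1 if out is too small.
 */
static int
format_text_header(const char *const *columns, int ncolumns,
				   char *out, size_t outsz)
{
	int			pos = 0;
	int			i;

	assert(columns != NULL && out != NULL);

	for (i = 0; i < ncolumns; i++)
	{
		if (i > 0)
		{
			if ((size_t) pos + 2 > outsz)
				return -1;
			out[pos++] = '\t';
		}
		pos = append_escaped(out, outsz, pos, columns[i]);
		if (pos < 0)
			return -1;
	}
	if ((size_t) pos + 1 > outsz)
		return -1;
	out[pos] = '\0';
	return pos;
}
```

A column name containing a literal tab therefore comes out with a trailing `\t` escape, so a reader splitting on tabs still sees one field per column.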

Discussion: /messages/by-id/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
contrib/file_fdw/input/file_fdw.source | 1 -
contrib/file_fdw/output/file_fdw.source | 4 +---
doc/src/sgml/ref/copy.sgml | 3 ++-
src/backend/commands/copy.c | 11 +++++++----
src/test/regress/input/copy.source | 12 ++++++++++++
src/test/regress/output/copy.source | 8 ++++++++
6 files changed, 30 insertions(+), 9 deletions(-)

Attachments:

v3-0001-Add-header-support-to-COPY-TO-text-format.patchtext/x-patch; name=v3-0001-Add-header-support-to-COPY-TO-text-format.patchDownload

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..83edb71077 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..547b81fd16 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189abc6c..c628a69c57 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -269,7 +269,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 44da71c4cb..a21508a974 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -136,7 +136,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1363,10 +1363,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2099,8 +2099,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..2368649111 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -134,6 +134,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d3551da..c1f7f99747 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,6 +95,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,

#44

remi.lapeyre@lenstra.fr

over 5 years ago

In reply to: Rémi Lapeyre (#42)

1 attachment(s)

[PATCH v3 2/2] Add header matching mode to "COPY FROM"

COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.
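
The check described above can be sketched in isolation. This is a minimal
illustration and not the code in the patch: `HeaderCheck` and `check_header`
are hypothetical names, the input is assumed to be a tab-delimited header
line, and no backslash unescaping is done. As in the patch, the field count
is checked first and the names are then compared in order, case-sensitively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Possible outcomes of the header check (hypothetical names). */
typedef enum HeaderCheck
{
    HEADER_OK,
    HEADER_MISSING_FIELD,
    HEADER_EXTRA_FIELD,
    HEADER_NAME_MISMATCH
} HeaderCheck;

/*
 * Compare a tab-delimited header line against the expected column names.
 * The number of fields is validated before any name is compared.
 */
static HeaderCheck
check_header(const char *line, const char *const *cols, int ncols)
{
    int         nfields = 1;
    const char *start;
    const char *p;
    int         i;

    assert(line != NULL && cols != NULL);

    /* every tab separates two fields */
    for (p = line; *p != '\0'; p++)
    {
        if (*p == '\t')
            nfields++;
    }

    if (nfields < ncols)
        return HEADER_MISSING_FIELD;
    if (nfields > ncols)
        return HEADER_EXTRA_FIELD;

    /* compare each field with the expected name, in order */
    start = line;
    for (i = 0; i < ncols; i++)
    {
        const char *end = strchr(start, '\t');
        size_t      len = (end != NULL) ? (size_t) (end - start) : strlen(start);

        if (len != strlen(cols[i]) || strncmp(start, cols[i], len) != 0)
            return HEADER_NAME_MISMATCH;

        start = (end != NULL) ? end + 1 : start + len;
    }

    return HEADER_OK;
}
```

With expected columns `a`, `b`, `c`, the line `a<TAB>b` is reported as a
missing field and `b<TAB>a<TAB>c` as a name mismatch, which is the
swapped-column case the commit message is concerned with.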

Discussion: /messages/by-id/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
contrib/file_fdw/input/file_fdw.source | 6 ++
contrib/file_fdw/output/file_fdw.source | 9 ++-
doc/src/sgml/ref/copy.sgml | 8 ++-
src/backend/commands/copy.c | 84 +++++++++++++++++++++++--
src/test/regress/input/copy.source | 25 ++++++++
src/test/regress/output/copy.source | 17 +++++
6 files changed, 140 insertions(+), 9 deletions(-)

Attachments:

v3-0002-Add-header-matching-mode-to-COPY-FROM.patch (text/x-patch)

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 83edb71077..7a3983c785 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -79,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 547b81fd16..d76a3dc6f8 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -93,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -439,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_dont_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c628a69c57..c35914511f 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER { <literal>match</literal> | <literal>true</literal> | <literal>false</literal> }
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -268,7 +268,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
+      names from the table. On input, the first line is discarded when set
+      to <literal>true</literal> or required to match the column names if set
+      to <literal>match</literal>. If the number of columns in the header is
+      not correct, their order differs from the one expected, or the name or
+      case do not match, the copy will be aborted with an error.
       This option is allowed only when using <literal>CSV</literal> or
       <literal>text</literal> format.
      </para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a21508a974..cde6582f1a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,16 @@ typedef enum CopyInsertMethod
 	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
 } CopyInsertMethod;
 
+/*
+ * Represents whether the header must be absent, present or present and match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader  header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1182,7 +1192,28 @@ ProcessCopyOptions(ParseState *pstate,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
-			cstate->header_line = defGetBoolean(defel);
+
+			PG_TRY();
+			{
+				if (defGetBoolean(defel))
+					cstate->header_line = COPY_HEADER_PRESENT;
+				else
+					cstate->header_line = COPY_HEADER_ABSENT;
+			}
+			PG_CATCH();
+			{
+				if (!cstate->is_copy_from)
+					PG_RE_THROW();
+
+				char	   *sval = defGetString(defel);
+				if (pg_strcasecmp(sval, "match") == 0)
+					cstate->header_line = COPY_HEADER_MATCH;
+				else
+					ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("header requires a boolean or \"match\"")));
+			}
+			PG_END_TRY();
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -2101,7 +2132,7 @@ CopyTo(CopyState cstate)
 
 				if (cstate->csv_mode)
 					CopyAttributeOutCSV(cstate, colname, false,
-									list_length(cstate->attnumlist) == 1);
+										list_length(cstate->attnumlist) == 1);
 				else
 					CopyAttributeOutText(cstate, colname);
 			}
@@ -3599,12 +3630,53 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->binary);
 
-	/* on input just throw the header line away */
+	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->header_line)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+				     	 errmsg("missing header")));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("extra data after last expected header")));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int				attnum = lfirst_int(cur);
+				char		  *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("wrong header for column \"%s\": got \"%s\"",
+								NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 2368649111..4d21c7d524 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -213,3 +213,28 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index c1f7f99747..b792181fe3 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -173,3 +173,20 @@ select * from parted_copytest where b = 2;
 (1 row)
 
 drop table parted_copytest;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  missing header
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;

#45

Magnus Hagander

magnus@hagander.net

over 5 years ago

In reply to: Rémi Lapeyre (#42)

Re: Add header support to text format and matching feature

On Fri, Jul 17, 2020 at 5:11 PM Rémi Lapeyre <remi.lapeyre@lenstra.fr>
wrote:

It's hard to find an explanation of what this patch actually does. I don't
want to have to go through threads dating back 4 months to determine
what was discussed and what was actually implemented. Since you're
already using git format-patch, just add something to the commit message.

It appears that these are really two separate features, so perhaps they
should be two patches.

Thanks for the feedback. I've cleanly split the two patches, simplified the
tests, and tried to explain the changes in the commit message.

Also, the new header matching mode could probably use more than one line
of documentation.

I've improved the documentation; let me know if it's better.

It seems like cfbot is not happy with the way I'm sending my patches. The
wiki has some good advice on how to write a patch, but I couldn't find
anything on how to send one. I've used

git send-email -v3 --compose --to=... --in-reply-to=... HEAD^^

here, but I'm not sure if it's correct. I will see if it works and will try
to fix it if it doesn't, but since cfbot runs once a day it may take some
time.

If you have two patches that depend on each other, you should send them as
two attachments to the same email. You now sent them as two separate emails,
and cfbot will then pick up the latest one of them, which is only patch 0002
(at least I'm fairly sure that's how it works).

I don't know how to do that with git-send-email, but you can certainly do
it easily with git-format-patch and just attach them using your regular MUA.

(and while the cfbot and the archives have no problems dealing with the
change in subject, it does break threading in some other MUAs, so I would
recommend not doing that and sticking to the existing subject of the thread)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#46

remi.lapeyre@lenstra.fr

over 5 years ago

In reply to: Magnus Hagander (#45)

2 attachment(s)

Re: Add header support to text format and matching feature

I don't know how to do that with git-send-email, but you can certainly do it easily with git-format-patch and just attach them using your regular MUA.

(and while the cfbot and the archives have no problems dealing with the change in subject, it does break threading in some other MUAs, so I would recommend not doing that and sticking to the existing subject of the thread)

Thanks, here are both patches attached so cfbot can read them.

Attachments:

v3-0001-Add-header-support-to-COPY-TO-text-format.patch (application/octet-stream)

From 99fa3b6d623105c5eea6fe14b2dc7287663fe8fb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 01:50:06 +0200
Subject: [PATCH v3 1/2] Add header support to "COPY TO" text format
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.27.0"

This is a multi-part message in MIME format.
--------------2.27.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


CSV format supports the HEADER option to emit a header line, which is
convenient when other programs need to consume the output. This
patch adds the same option to the default text format.
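
As a rough illustration of what a text-format header line looks like, the
sketch below joins column names with tabs and backslash-escapes special
characters, so a column name containing a tab comes out as `\t`, as in the
copytest4 regression output. It is a simplified stand-in and not the patch's
code: `write_text_header` is a hypothetical helper, and it handles only
backslash, tab, newline and carriage return, which is assumed to be a subset
of what the real text-format output routine escapes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write a tab-delimited, newline-terminated header line into out.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the buffer is too small.
 */
static int
write_text_header(char *out, size_t outsz, const char *const *names, int n)
{
    size_t      pos = 0;
    int         i;

    assert(out != NULL && names != NULL);

    for (i = 0; i < n; i++)
    {
        const char *p;

        /* delimiter between fields */
        if (i > 0)
        {
            if (pos + 1 >= outsz)
                return -1;
            out[pos++] = '\t';
        }

        for (p = names[i]; *p != '\0'; p++)
        {
            char        esc = '\0';

            switch (*p)
            {
                case '\\': esc = '\\'; break;
                case '\t': esc = 't'; break;
                case '\n': esc = 'n'; break;
                case '\r': esc = 'r'; break;
                default: break;
            }

            /* keep room for up to two output bytes plus the NUL */
            if (pos + 2 >= outsz)
                return -1;

            if (esc != '\0')
            {
                out[pos++] = '\\';
                out[pos++] = esc;
            }
            else
                out[pos++] = *p;
        }
    }

    /* end of row */
    if (pos + 1 >= outsz)
        return -1;
    out[pos++] = '\n';
    out[pos] = '\0';

    return (int) pos;
}
```

Escaping the delimiter inside a name is what keeps the header parseable by a
consumer that splits on tabs.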

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  1 -
 contrib/file_fdw/output/file_fdw.source |  4 +---
 doc/src/sgml/ref/copy.sgml              |  3 ++-
 src/backend/commands/copy.c             | 11 +++++++----
 src/test/regress/input/copy.source      | 12 ++++++++++++
 src/test/regress/output/copy.source     |  8 ++++++++
 6 files changed, 30 insertions(+), 9 deletions(-)


--------------2.27.0
Content-Type: text/x-patch; name="v3-0001-Add-header-support-to-COPY-TO-text-format.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v3-0001-Add-header-support-to-COPY-TO-text-format.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..83edb71077 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..547b81fd16 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189abc6c..c628a69c57 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -269,7 +269,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 44da71c4cb..a21508a974 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -136,7 +136,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1363,10 +1363,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2099,8 +2099,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..2368649111 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -134,6 +134,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d3551da..c1f7f99747 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,6 +95,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,

--------------2.27.0--

v3-0002-Add-header-matching-mode-to-COPY-FROM.patch (application/octet-stream)

From 3e803c01593b6283a9deb7e4ddedca3d28650259 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 02:04:55 +0200
Subject: [PATCH v3 2/2] Add header matching mode to "COPY FROM"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.27.0"

This is a multi-part message in MIME format.
--------------2.27.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  6 ++
 contrib/file_fdw/output/file_fdw.source |  9 ++-
 doc/src/sgml/ref/copy.sgml              |  8 ++-
 src/backend/commands/copy.c             | 84 +++++++++++++++++++++++--
 src/test/regress/input/copy.source      | 25 ++++++++
 src/test/regress/output/copy.source     | 17 +++++
 6 files changed, 140 insertions(+), 9 deletions(-)


--------------2.27.0
Content-Type: text/x-patch; name="v3-0002-Add-header-matching-mode-to-COPY-FROM.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v3-0002-Add-header-matching-mode-to-COPY-FROM.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 83edb71077..7a3983c785 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -79,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 547b81fd16..d76a3dc6f8 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -93,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -439,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_dont_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c628a69c57..c35914511f 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER { <literal>match</literal> | <literal>true</literal> | <literal>false</literal> }
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -268,7 +268,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
+      names from the table. On input, the first line is discarded when set
+      to <literal>true</literal> or required to match the column names if set
+      to <literal>match</literal>. If the number of columns in the header is
+      not correct, their order differs from the one expected, or the name or
+      case do not match, the copy will be aborted with an error.
       This option is allowed only when using <literal>CSV</literal> or
       <literal>text</literal> format.
      </para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a21508a974..cde6582f1a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,16 @@ typedef enum CopyInsertMethod
 	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
 } CopyInsertMethod;
 
+/*
+ * Represents whether the header must be absent, present or present and match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader  header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1182,7 +1192,28 @@ ProcessCopyOptions(ParseState *pstate,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
-			cstate->header_line = defGetBoolean(defel);
+
+			PG_TRY();
+			{
+				if (defGetBoolean(defel))
+					cstate->header_line = COPY_HEADER_PRESENT;
+				else
+					cstate->header_line = COPY_HEADER_ABSENT;
+			}
+			PG_CATCH();
+			{
+				if (!cstate->is_copy_from)
+					PG_RE_THROW();
+
+				char	   *sval = defGetString(defel);
+				if (pg_strcasecmp(sval, "match") == 0)
+					cstate->header_line = COPY_HEADER_MATCH;
+				else
+					ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("header requires a boolean or \"match\"")));
+			}
+			PG_END_TRY();
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -2101,7 +2132,7 @@ CopyTo(CopyState cstate)
 
 				if (cstate->csv_mode)
 					CopyAttributeOutCSV(cstate, colname, false,
-									list_length(cstate->attnumlist) == 1);
+										list_length(cstate->attnumlist) == 1);
 				else
 					CopyAttributeOutText(cstate, colname);
 			}
@@ -3599,12 +3630,53 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->binary);
 
-	/* on input just throw the header line away */
+	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->header_line)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+				     	 errmsg("missing header")));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("extra data after last expected header")));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int				attnum = lfirst_int(cur);
+				char		  *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("wrong header for column \"%s\": got \"%s\"",
+								NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 2368649111..4d21c7d524 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -213,3 +213,28 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index c1f7f99747..b792181fe3 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -173,3 +173,20 @@ select * from parted_copytest where b = 2;
 (1 row)
 
 drop table parted_copytest;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  missing header
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;

--------------2.27.0--

#47

vignesh C

vignesh21@gmail.com

over 5 years ago

In reply to: Rémi Lapeyre (#46)

Re: Add header support to text format and matching feature

On Fri, Jul 17, 2020 at 10:18 PM Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:

I don't know how to do that with git-send-email, but you can certainly do it easily with git-format-patch and just attach them using your regular MUA.

(and while the cfbot and the archives have no problems dealing with the change in subject, it does break threading in some other MUAs, so I would recommend not doing that and sticking to the existing subject of the thread)

Thanks, here are both patches attached so cfbot can read them.

A few comments:
A few tests are failing because of a hardcoded path:
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename
'/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv',
delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename
'/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv',
delimiter ',', header 'match');  -- ERROR
Paths are generally not hardcoded like this. The file_fdw test of make
check-world is failing because of this.

There is one warning present in the changes:
copy.c: In function ‘ProcessCopyOptions’:
copy.c:1208:5: warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
char *sval = defGetString(defel);

There is a space before a tab in the indentation of the code below; check with git
diff --check.
+                       if (fldct < list_length(cstate->attnumlist))
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                                        errmsg("missing header")));

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

#48

remi.lapeyre@lenstra.fr

over 5 years ago

In reply to: vignesh C (#47)

2 attachment(s)

Re: Add header support to text format and matching feature

Thanks for the feedback,

There is one warning present in the changes:
copy.c: In function ‘ProcessCopyOptions’:
copy.c:1208:5: warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
char *sval = defGetString(defel);

Weirdly this is not caught by clang, even with -Wdeclaration-after-statement.

Here’s a new version that fixes all the issues.

Rémi

Attachments:

v4-0001-Add-header-support-to-COPY-TO-text-format.patch (application/octet-stream)

From 99fa3b6d623105c5eea6fe14b2dc7287663fe8fb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 01:50:06 +0200
Subject: [PATCH v4 1/2] Add header support to "COPY TO" text format
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.27.0"

This is a multi-part message in MIME format.
--------------2.27.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


The CSV format supports the HEADER option to emit a header line in the output,
which is convenient when other programs need to consume that output. This
patch adds the same option to the default text format.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  1 -
 contrib/file_fdw/output/file_fdw.source |  4 +---
 doc/src/sgml/ref/copy.sgml              |  3 ++-
 src/backend/commands/copy.c             | 11 +++++++----
 src/test/regress/input/copy.source      | 12 ++++++++++++
 src/test/regress/output/copy.source     |  8 ++++++++
 6 files changed, 30 insertions(+), 9 deletions(-)


--------------2.27.0
Content-Type: text/x-patch; name="v4-0001-Add-header-support-to-COPY-TO-text-format.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v4-0001-Add-header-support-to-COPY-TO-text-format.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..83edb71077 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..547b81fd16 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189abc6c..c628a69c57 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -269,7 +269,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 44da71c4cb..a21508a974 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -136,7 +136,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1363,10 +1363,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2099,8 +2099,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..2368649111 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -134,6 +134,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d3551da..c1f7f99747 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,6 +95,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,

--------------2.27.0--

v4-0002-Add-header-matching-mode-to-COPY-FROM.patch (application/octet-stream)

From 9f931f133e4d92906c15ff5dc257384c673d9159 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 02:04:55 +0200
Subject: [PATCH v4 2/2] Add header matching mode to "COPY FROM"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.27.0"

This is a multi-part message in MIME format.
--------------2.27.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  6 ++
 contrib/file_fdw/output/file_fdw.source |  9 ++-
 doc/src/sgml/ref/copy.sgml              |  8 ++-
 src/backend/commands/copy.c             | 85 +++++++++++++++++++++++--
 src/test/regress/input/copy.source      | 25 ++++++++
 src/test/regress/output/copy.source     | 17 +++++
 6 files changed, 141 insertions(+), 9 deletions(-)


--------------2.27.0
Content-Type: text/x-patch; name="v4-0002-Add-header-matching-mode-to-COPY-FROM.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v4-0002-Add-header-matching-mode-to-COPY-FROM.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 83edb71077..7a3983c785 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -79,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 547b81fd16..ebe826b9f4 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -93,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -439,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_dont_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c628a69c57..c35914511f 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER { <literal>match</literal> | <literal>true</literal> | <literal>false</literal> }
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -268,7 +268,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
+      names from the table. On input, the first line is discarded when set
+      to <literal>true</literal> or required to match the column names if set
+      to <literal>match</literal>. If the number of columns in the header is
+      not correct, their order differs from the one expected, or the name or
+      case do not match, the copy will be aborted with an error.
       This option is allowed only when using <literal>CSV</literal> or
       <literal>text</literal> format.
      </para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a21508a974..d58fa0cdd8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,16 @@ typedef enum CopyInsertMethod
 	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
 } CopyInsertMethod;
 
+/*
+ * Represents whether the header must be absent, present or present and match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader  header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1182,7 +1192,29 @@ ProcessCopyOptions(ParseState *pstate,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
-			cstate->header_line = defGetBoolean(defel);
+
+			PG_TRY();
+			{
+				if (defGetBoolean(defel))
+					cstate->header_line = COPY_HEADER_PRESENT;
+				else
+					cstate->header_line = COPY_HEADER_ABSENT;
+			}
+			PG_CATCH();
+			{
+				char	*sval = defGetString(defel);
+
+				if (!cstate->is_copy_from)
+					PG_RE_THROW();
+
+				if (pg_strcasecmp(sval, "match") == 0)
+					cstate->header_line = COPY_HEADER_MATCH;
+				else
+					ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("header requires a boolean or \"match\"")));
+			}
+			PG_END_TRY();
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -2101,7 +2133,7 @@ CopyTo(CopyState cstate)
 
 				if (cstate->csv_mode)
 					CopyAttributeOutCSV(cstate, colname, false,
-									list_length(cstate->attnumlist) == 1);
+										list_length(cstate->attnumlist) == 1);
 				else
 					CopyAttributeOutText(cstate, colname);
 			}
@@ -3599,12 +3631,53 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->binary);
 
-	/* on input just throw the header line away */
+	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->header_line)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("missing header")));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("extra data after last expected header")));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int				attnum = lfirst_int(cur);
+				char		  *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("wrong header for column \"%s\": got \"%s\"",
+								NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 2368649111..4d21c7d524 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -213,3 +213,28 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index c1f7f99747..b792181fe3 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -173,3 +173,20 @@ select * from parted_copytest where b = 2;
 (1 row)
 
 drop table parted_copytest;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  missing header
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;

--------------2.27.0--

#49

daniel@manitou-mail.org

over 5 years ago

In reply to: Rémi Lapeyre (#48)

Re: Add header support to text format and matching feature

Rémi Lapeyre wrote:

Here’s a new version that fixes all the issues.

Here's a review of v4.

The patch improves COPY in two ways:

- COPY TO and COPY FROM now accept "HEADER ON" for the TEXT format
(previously it was only for CSV)

- COPY FROM also accepts "HEADER match" to indicate that there's a header
and that its contents must match the columns of the destination table.
This works for both the CSV and TEXT formats. The syntax for the
columns is the same as for the data and the match is case-sensitive.
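
To make the "match" semantics above concrete, here is a minimal standalone sketch of the comparison. It is not the patch's code: the HeaderCheck enum and check_header() are hypothetical names for illustration, and the patch itself reports mismatches through ereport() rather than return codes. It only assumes what is described above: the number of fields must equal the number of columns, and names are compared in order and case-sensitively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch; the patch raises errors instead. */
typedef enum HeaderCheck
{
	HEADER_OK,
	HEADER_TOO_FEW,			/* fewer header fields than columns */
	HEADER_TOO_MANY,		/* more header fields than columns */
	HEADER_WRONG_NAME		/* a field does not equal the column name */
} HeaderCheck;

/*
 * Compare the fields of a header line against the expected column names.
 * On a name mismatch, *bad_col receives the index of the first offending
 * column; otherwise *bad_col is left untouched.
 */
static HeaderCheck
check_header(const char *const *fields, size_t nfields,
			 const char *const *columns, size_t ncolumns,
			 size_t *bad_col)
{
	size_t		i;

	assert(fields != NULL && columns != NULL && bad_col != NULL);

	if (nfields < ncolumns)
		return HEADER_TOO_FEW;
	if (nfields > ncolumns)
		return HEADER_TOO_MANY;

	for (i = 0; i < ncolumns; i++)
	{
		/* strcmp() is case-sensitive, so "A" does not match column "a" */
		if (strcmp(fields[i], columns[i]) != 0)
		{
			*bad_col = i;
			return HEADER_WRONG_NAME;
		}
	}
	return HEADER_OK;
}
```

With columns (a, b, c), a header of "b a c" fails at index 0, which is the swapped-columns case the commit message is worried about.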

The first improvement, when submitted alone (in 2018 by Simon Muller),
was judged not useful enough or even hazardous without any
"match" feature. It was returned with feedback in 2018-10 and
resubmitted by Rémi in 2020-02 with the match feature.

The patches apply cleanly, "make check" and "make check-world" pass.

In my tests it works fine except for one crash that I can reproduce
on a fresh build and default configuration with:

$ cat >file.txt
i
1

$ psql
postgres=# create table x(i int);
CREATE TABLE
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
PANIC: ERRORDATA_STACK_SIZE exceeded
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I suspect the reason is the way that PG_TRY/PG_CATCH is used, see below.

Code comments:

+/*
+ * Represents whether the header must be absent, present or present and match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
	bool		binary; 		/* binary format? */
	bool		freeze; 		/* freeze rows on loading? */
	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader  header_line;	/* CSV or text header line? */

After the redefinition into this enum type, there are still a
bunch of references to header_line that treat it like a boolean:

1190: if (cstate->header_line)
1398: if (cstate->binary && cstate->header_line)
2119: if (cstate->header_line)
3635: if (cstate->cur_lineno == 0 && cstate->header_line)

It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
but maybe it's not good style to count on that.

+			PG_TRY();
+			{
+				if (defGetBoolean(defel))
+					cstate->header_line = COPY_HEADER_PRESENT;
+				else
+					cstate->header_line = COPY_HEADER_ABSENT;
+			}
+			PG_CATCH();
+			{
+				char	*sval = defGetString(defel);
+
+				if (!cstate->is_copy_from)
+					PG_RE_THROW();
+
+				if (pg_strcasecmp(sval, "match") == 0)
+					cstate->header_line = COPY_HEADER_MATCH;
+				else
+					ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("header requires a boolean or \"match\"")));
+			}
+			PG_END_TRY();

It seems wrong to use a PG_CATCH block for this. I understand that
it's because defGetBoolean() calls ereport() on non-booleans, but then
it should be split into an error-throwing function and a
non-error-throwing lexical analysis of the boolean, the above code
calling the latter.
Besides, the comments in elog.h above PG_TRY say that
"the error recovery code
can either do PG_RE_THROW to propagate the error outwards, or do a
(sub)transaction abort. Failure to do so may leave the system in an
inconsistent state for further processing."
Maybe this is what happens with the repeated uses of "match"
eventually failing with ERRORDATA_STACK_SIZE exceeded.
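
As a rough illustration of the split suggested above, here is a standalone sketch in which the boolean is recognized without raising an error, so no PG_TRY/PG_CATCH is needed. None of these helpers come from the patch or from PostgreSQL: equals_ignore_case(), try_parse_bool() and parse_header_option() are hypothetical names, and the set of accepted boolean spellings is an assumption made for this example. A false return is where the caller would raise "header requires a boolean or \"match\"".

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum CopyHeader
{
	COPY_HEADER_ABSENT,
	COPY_HEADER_PRESENT,
	COPY_HEADER_MATCH
} CopyHeader;

/* Hypothetical helper: case-insensitive string equality. */
static bool
equals_ignore_case(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0')
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return false;
		a++;
		b++;
	}
	return *a == *b;		/* equal only if both strings ended together */
}

/*
 * Hypothetical helper: recognize a boolean spelling without raising an
 * error.  The accepted spellings are an assumption of this sketch.
 */
static bool
try_parse_bool(const char *value, bool *result)
{
	static const char *const truthy[] = {"true", "on", "1"};
	static const char *const falsy[] = {"false", "off", "0"};
	size_t		i;

	for (i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++)
	{
		if (equals_ignore_case(value, truthy[i]))
		{
			*result = true;
			return true;
		}
	}
	for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++)
	{
		if (equals_ignore_case(value, falsy[i]))
		{
			*result = false;
			return true;
		}
	}
	return false;
}

/*
 * Map the HEADER argument to a CopyHeader value.  Returns false when the
 * value is neither a boolean nor an acceptable "match"; the caller is then
 * expected to report the error itself.
 */
static bool
parse_header_option(const char *value, bool is_copy_from, CopyHeader *header)
{
	bool		flag = false;

	assert(header != NULL);

	if (value == NULL)		/* bare HEADER with no argument means "true" */
	{
		*header = COPY_HEADER_PRESENT;
		return true;
	}
	if (try_parse_bool(value, &flag))
	{
		*header = flag ? COPY_HEADER_PRESENT : COPY_HEADER_ABSENT;
		return true;
	}
	if (is_copy_from && equals_ignore_case(value, "match"))
	{
		*header = COPY_HEADER_MATCH;	/* only meaningful for COPY FROM */
		return true;
	}
	return false;
}
```

Because nothing is thrown and swallowed on the "match" path, there is no pending error state left behind for the next command to trip over.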

-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER { <literal>match</literal> | <literal>true</literal> | <literal>false</literal> }

This should be enclosed in square brackets because HEADER
with no argument is still accepted.

+      names from the table. On input, the first line is discarded when set
+      to <literal>true</literal> or required to match the column names if set

The elision of "header" as the subject might be misinterpreted as if
it's the first line that is true. I'd suggest
"when <literal>header</literal> is set to ..." to avoid any confusion.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

#50

vignesh C

vignesh21@gmail.com

over 5 years ago

In reply to: Daniel Verite (#49)

2 attachment(s)

Re: Add header support to text format and matching feature

Thanks for your comments. Please find my thoughts inline.

In my tests it works fine except for one crash that I can reproduce
on a fresh build and default configuration with:

$ cat >file.txt
i
1

$ psql
postgres=# create table x(i int);
CREATE TABLE
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
PANIC: ERRORDATA_STACK_SIZE exceeded
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.
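One plausible reading of the transcript above is that every swallowed error leaves one entry behind on a fixed-size error stack, so the sixth command overflows it. The toy model below only illustrates that reading; it is not the code from elog.c. The limit of 5 is an assumption chosen to be consistent with the five successful runs followed by a PANIC, and all toy_* names are hypothetical.

```c
#include <stdbool.h>

#define TOY_STACK_SIZE 5		/* assumed limit, see the note above */

static int	toy_stack_depth = -1;	/* -1 means the stack is empty */

/* Push one error entry; returns false on overflow (the "PANIC" case). */
static bool
toy_raise_error(void)
{
	if (++toy_stack_depth >= TOY_STACK_SIZE)
	{
		toy_stack_depth = -1;
		return false;
	}
	return true;
}

/* What a handler would have to do when it swallows an error. */
static void
toy_flush_error_state(void)
{
	toy_stack_depth = -1;
}

/*
 * Run n commands that each raise one error and catch it without
 * re-throwing.  Returns how many commands completed before overflow.
 */
static int
toy_run_commands(int n, bool flush_after_catch)
{
	int			completed = 0;

	toy_stack_depth = -1;
	for (int i = 0; i < n; i++)
	{
		if (!toy_raise_error())
			break;				/* stack exhausted */
		if (flush_after_catch)
			toy_flush_error_state();
		completed++;
	}
	return completed;
}
```

In this model, six commands without a flush complete only five times, while flushing after each catch lets all six complete. Avoiding the catch block altogether, as the fix does, sidesteps the question.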

Code comments:
+/*
+ * Represents whether the header must be absent, present or present and
match.
+ */
+typedef enum CopyHeader
+{
+       COPY_HEADER_ABSENT,
+       COPY_HEADER_PRESENT,
+       COPY_HEADER_MATCH
+} CopyHeader;
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of
COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
bool            binary;                 /* binary format? */
bool            freeze;                 /* freeze rows on loading? */
bool            csv_mode;               /* Comma Separated Value
format? */
-       bool            header_line;    /* CSV or text header line? */
+       CopyHeader  header_line;        /* CSV or text header line? */
After the redefinition into this enum type, there are still a
bunch of references to header_line that treat it like a boolean:

1190: if (cstate->header_line)
1398: if (cstate->binary && cstate->header_line)
2119: if (cstate->header_line)
3635: if (cstate->cur_lineno == 0 && cstate->header_line)

It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
but maybe it's not good style to count on that.

Fixed. Changed it to cstate->header_line != COPY_HEADER_ABSENT.
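As a small illustration of the style point, the checks can be written so that they do not depend on COPY_HEADER_ABSENT happening to be the first enumerator. The helper names below are hypothetical; the patch compares inline instead.

```c
#include <stdbool.h>

/* Stand-in for the CopyHeader enum introduced by the patch. */
typedef enum CopyHeader
{
	COPY_HEADER_ABSENT,
	COPY_HEADER_PRESENT,
	COPY_HEADER_MATCH
} CopyHeader;

/* True when a header line is written (COPY TO) or consumed (COPY FROM). */
static bool
copy_header_expected(CopyHeader header_line)
{
	return header_line != COPY_HEADER_ABSENT;
}

/* True only when the header must also be checked against column names. */
static bool
copy_header_must_match(CopyHeader header_line)
{
	return header_line == COPY_HEADER_MATCH;
}
```

Both helpers keep working if the enumerators are ever reordered, which a bare `if (cstate->header_line)` would not.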

+                       PG_TRY();
+                       {
+                               if (defGetBoolean(defel))
+                                       cstate->header_line =
COPY_HEADER_PRESENT;
+                               else
+                                       cstate->header_line =
COPY_HEADER_ABSENT;
+                       }
+                       PG_CATCH();
+                       {
+                               char    *sval = defGetString(defel);
+
+                               if (!cstate->is_copy_from)
+                                       PG_RE_THROW();
+
+                               if (pg_strcasecmp(sval, "match") == 0)
+                                       cstate->header_line =
COPY_HEADER_MATCH;
+                               else
+                                       ereport(ERROR,
+
(errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("header requires a
boolean or \"match\"")));
+                       }
+                       PG_END_TRY();
It seems wrong to use a PG_CATCH block for this. I understand that
it's because defGetBoolean() calls ereport() on non-booleans, but then
it should be split into an error-throwing function and a
non-error-throwing lexical analysis of the boolean, the above code
calling the latter.
Besides the comments in elog.h above PG_TRY say that
"the error recovery code
can either do PG_RE_THROW to propagate the error outwards, or do a
(sub)transaction abort. Failure to do so may leave the system in an
inconsistent state for further processing."
Maybe this is what happens with the repeated uses of "match"
eventually failing with ERRORDATA_STACK_SIZE exceeded.

Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.

-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER { <literal>match</literal> | <literal>true</literal> |
<literal>false</literal> }
This should be enclosed in square brackets because HEADER
with no argument is still accepted.

Fixed.

+      names from the table. On input, the first line is discarded when set
+      to <literal>true</literal> or required to match the column names if
set
The elision of "header" as the subject might be misinterpreted as if
it's the first line that is true. I'd suggest
"when <literal>header</literal> is set to ..." to avoid any confusion.

Fixed.

Attached is the v5 patch with fixes for the above comments.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v5-0001-Add-header-support-to-COPY-TO-text-format.patch (text/x-patch; charset=US-ASCII)

From 847c0d64dfd4991d52b6c6e47abcd10e23d4bf8b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 01:50:06 +0200
Subject: [PATCH v5 1/2] Add header support to "COPY TO" text format

CSV format supports the HEADER option to output a header in the output,
it is convenient when other programs need to consume the output. This
patch adds the same option to the default text format.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  1 -
 contrib/file_fdw/output/file_fdw.source |  4 +---
 doc/src/sgml/ref/copy.sgml              |  3 ++-
 src/backend/commands/copy.c             | 11 +++++++----
 src/test/regress/input/copy.source      | 12 ++++++++++++
 src/test/regress/output/copy.source     |  8 ++++++++
 6 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728e..83edb71 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f..547b81f 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..c628a69 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -269,7 +269,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a..5d5ad43 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -136,7 +136,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1411,10 +1411,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2147,8 +2147,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..2368649 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -134,6 +134,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c1f7f99 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,6 +95,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
-- 
1.8.3.1
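The expected output in the regression test of the patch above shows the tab inside the second column name written as a backslash followed by "t", while the separator between names stays a real tab. The standalone sketch below reproduces that for a simplified set of escapes (backslash, tab, newline, carriage return); the real CopyAttributeOutText() handles more cases. All function names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Append one character, keeping room for the terminating NUL. */
static bool
append_char(char *out, size_t outsize, size_t *len, char c)
{
	if (*len + 1 >= outsize)
		return false;
	out[(*len)++] = c;
	out[*len] = '\0';
	return true;
}

/* Append a column name, escaping characters that would break the line. */
static bool
append_escaped(char *out, size_t outsize, size_t *len, const char *s)
{
	for (; *s != '\0'; s++)
	{
		char		esc = 0;

		switch (*s)
		{
			case '\\': esc = '\\'; break;
			case '\t': esc = 't'; break;
			case '\n': esc = 'n'; break;
			case '\r': esc = 'r'; break;
			default: break;
		}
		if (esc != 0)
		{
			if (!append_char(out, outsize, len, '\\') ||
				!append_char(out, outsize, len, esc))
				return false;
		}
		else if (!append_char(out, outsize, len, *s))
			return false;
	}
	return true;
}

/* Build a tab-delimited header line; returns false if out is too small. */
static bool
build_text_header(char *out, size_t outsize,
				  const char *const *names, int nnames)
{
	size_t		len = 0;

	if (outsize == 0)
		return false;
	out[0] = '\0';
	for (int i = 0; i < nnames; i++)
	{
		if (i > 0 && !append_char(out, outsize, &len, '\t'))
			return false;
		if (!append_escaped(out, outsize, &len, names[i]))
			return false;
	}
	return true;
}
```

Called with the two column names of copytest4, the function produces the same header line as the expected test output.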

v5-0002-Add-header-matching-mode-to-COPY-FROM.patch (text/x-patch; charset=US-ASCII)

From 00321a601847ad1c45f6e9fba4ae47bf64e1deb0 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Mon, 17 Aug 2020 17:48:09 +0530
Subject: [PATCH v5 2/2] Add header matching mode to "COPY FROM"

COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |   6 ++
 contrib/file_fdw/output/file_fdw.source |   9 ++-
 doc/src/sgml/ref/copy.sgml              |  12 ++-
 src/backend/commands/copy.c             | 128 +++++++++++++++++++++++++++++---
 src/test/regress/input/copy.source      |  25 +++++++
 src/test/regress/output/copy.source     |  17 +++++
 6 files changed, 183 insertions(+), 14 deletions(-)

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 83edb71..7a3983c 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -79,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 547b81f..ebe826b 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -93,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -439,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_dont_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c628a69..cb8232d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER [ <literal>match</literal> | <literal>true</literal> | <literal>false</literal> ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -268,9 +268,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> or
-      <literal>text</literal> format.
+      names from the table. On input, the first line is discarded when
+      <literal>header</literal> is set to <literal>true</literal> or required
+      to match the column names if set to <literal>match</literal>. If the
+      number of columns in the header is not correct, their order differs
+      from the one expected, or the name or case do not match, the copy will
+      be aborted with an error.  This option is allowed only when using
+      <literal>CSV</literal> or <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5d5ad43..0625090 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,6 +96,16 @@ typedef enum CopyInsertMethod
 } CopyInsertMethod;
 
 /*
+ * Represents whether the header must be absent, present or present and match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
+/*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
  * even though some fields are used in only some cases.
@@ -136,7 +146,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader  header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1136,6 +1146,64 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 }
 
 /*
+ * Extract a CopyHeader value from a DefElem.
+ */
+static CopyHeader
+DefGetCopyHeader(DefElem *def)
+{
+	/*
+	 * If no parameter given, assume "true" is meant.
+	 */
+	if (def->arg == NULL)
+		return COPY_HEADER_PRESENT;
+
+	/*
+	 * Allow 0, 1, "true", "false", "on", "off" or "match".
+	 */
+	switch (nodeTag(def->arg))
+	{
+		case T_Integer:
+			switch (intVal(def->arg))
+			{
+				case 0:
+					return COPY_HEADER_ABSENT;
+				case 1:
+					return COPY_HEADER_PRESENT;
+				default:
+					/* otherwise, error out below */
+					break;
+			}
+			break;
+		default:
+			{
+				char	   *sval = defGetString(def);
+
+				/*
+				 * The set of strings accepted here should match up with the
+				 * grammar's opt_boolean_or_string production.
+				 */
+				if (pg_strcasecmp(sval, "true") == 0)
+					return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "false") == 0)
+					return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "on") == 0)
+					return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "off") == 0)
+					return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "match") == 0)
+					return COPY_HEADER_MATCH;
+
+			}
+			break;
+	}
+	ereport(ERROR,
+			(errcode(ERRCODE_SYNTAX_ERROR),
+			 errmsg("%s requires a boolean or \"match\"",
+					def->defname)));
+	return COPY_HEADER_ABSENT;				/* keep compiler quiet */
+}
+
+/*
  * Process the statement option list for COPY.
  *
  * Scan the options list (a list of DefElem) and transpose the information
@@ -1230,7 +1298,8 @@ ProcessCopyOptions(ParseState *pstate,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
-			cstate->header_line = defGetBoolean(defel);
+
+			cstate->header_line = DefGetCopyHeader(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -1411,7 +1480,7 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (cstate->binary && cstate->header_line)
+	if (cstate->binary && cstate->header_line != COPY_HEADER_ABSENT)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY HEADER available only in CSV and text mode")));
@@ -2132,7 +2201,7 @@ CopyTo(CopyState cstate)
 														 cstate->file_encoding);
 
 		/* if a header has been requested send the line */
-		if (cstate->header_line)
+		if (cstate->header_line != COPY_HEADER_ABSENT)
 		{
 			bool		hdr_delim = false;
 
@@ -2149,7 +2218,7 @@ CopyTo(CopyState cstate)
 
 				if (cstate->csv_mode)
 					CopyAttributeOutCSV(cstate, colname, false,
-									list_length(cstate->attnumlist) == 1);
+										list_length(cstate->attnumlist) == 1);
 				else
 					CopyAttributeOutText(cstate, colname);
 			}
@@ -3647,12 +3716,53 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->binary);
 
-	/* on input just throw the header line away */
-	if (cstate->cur_lineno == 0 && cstate->header_line)
+	/* on input check that the header line is correct if needed */
+	if (cstate->cur_lineno == 0 && cstate->header_line != COPY_HEADER_ABSENT)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("missing header")));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("extra data after last expected header")));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int				attnum = lfirst_int(cur);
+				char		  *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("wrong header for column \"%s\": got \"%s\"",
+								NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 2368649..4d21c7d 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -213,3 +213,28 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index c1f7f99..b792181 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -173,3 +173,20 @@ select * from parted_copytest where b = 2;
 (1 row)
 
 drop table parted_copytest;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  missing header
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;
-- 
1.8.3.1
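The header check that the patch above adds to NextCopyFromRawFields() can be sketched in isolation as follows. This is a simplified model, not the patch code: it compares the i-th header field with the i-th requested column (the patch indexes raw_fields by attnum - 1), it uses strcmp() where the patch uses namestrcmp(), and the HeaderCheck enum and check_header() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef enum HeaderCheck
{
	HEADER_OK,
	HEADER_MISSING_FIELDS,		/* fewer header fields than columns */
	HEADER_EXTRA_FIELDS,		/* more header fields than columns */
	HEADER_NAME_MISMATCH		/* same count, but a name differs */
} HeaderCheck;

/*
 * Compare the fields read from the header line against the expected
 * column names.  The comparison is case-sensitive and order-sensitive.
 * On a name mismatch, *bad_index (if not NULL) receives the position of
 * the first offending field.
 */
static HeaderCheck
check_header(const char *const *fields, int nfields,
			 const char *const *columns, int ncolumns,
			 int *bad_index)
{
	if (nfields < ncolumns)
		return HEADER_MISSING_FIELDS;
	if (nfields > ncolumns)
		return HEADER_EXTRA_FIELDS;

	for (int i = 0; i < ncolumns; i++)
	{
		if (strcmp(fields[i], columns[i]) != 0)
		{
			if (bad_index != NULL)
				*bad_index = i;
			return HEADER_NAME_MISMATCH;
		}
	}
	return HEADER_OK;
}
```

The three outcomes other than HEADER_OK correspond to the "missing header", "extra data after last expected header" and "wrong header for column" errors in the patch.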

#51

remi.lapeyre@lenstra.fr

over 5 years ago

In reply to: vignesh C (#50)

Re: Add header support to text format and matching feature

Thanks Daniel for the review and Vignesh for addressing the comments.

I have two remarks on the current state of the patches:
- DefGetCopyHeader() duplicates a lot of code from defGetBoolean(). Should we refactor them so that they share more of their internals? With the current implementation, any change to defGetBoolean() has to be mirrored in DefGetCopyHeader(), or their behaviour will subtly differ.
- It is possible to set the header option multiple times:
\copy x(i) from file.txt with (format csv, header off, header on);
In that case the last one is kept. I think this is a bug and should be fixed, but it is already the behaviour of the current implementation, so fixing it would not be backward compatible. Do you think users should not be doing this and I can fix it, or is keeping the current behaviour better for backward compatibility?

Regards,
Rémi

On 17 August 2020 at 14:49, vignesh C <vignesh21@gmail.com> wrote:

Thanks for your comments. Please find my thoughts inline.

In my tests it works fine except for one crash that I can reproduce
on a fresh build and default configuration with:

$ cat >file.txt
i
1

$ psql
postgres=# create table x(i int);
CREATE TABLE
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
PANIC: ERRORDATA_STACK_SIZE exceeded
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.
Code comments:
+/*
+ * Represents whether the header must be absent, present or present and
match.
+ */
+typedef enum CopyHeader
+{
+       COPY_HEADER_ABSENT,
+       COPY_HEADER_PRESENT,
+       COPY_HEADER_MATCH
+} CopyHeader;
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of
COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
bool            binary;                 /* binary format? */
bool            freeze;                 /* freeze rows on loading? */
bool            csv_mode;               /* Comma Separated Value
format? */
-       bool            header_line;    /* CSV or text header line? */
+       CopyHeader  header_line;        /* CSV or text header line? */
After the redefinition into this enum type, there are still a
bunch of references to header_line that treat it like a boolean:

1190: if (cstate->header_line)
1398: if (cstate->binary && cstate->header_line)
2119: if (cstate->header_line)
3635: if (cstate->cur_lineno == 0 && cstate->header_line)

It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
but maybe it's not good style to count on that.
Fixed. Changed it to cstate->header_line != COPY_HEADER_ABSENT.
+                       PG_TRY();
+                       {
+                               if (defGetBoolean(defel))
+                                       cstate->header_line =
COPY_HEADER_PRESENT;
+                               else
+                                       cstate->header_line =
COPY_HEADER_ABSENT;
+                       }
+                       PG_CATCH();
+                       {
+                               char    *sval = defGetString(defel);
+
+                               if (!cstate->is_copy_from)
+                                       PG_RE_THROW();
+
+                               if (pg_strcasecmp(sval, "match") == 0)
+                                       cstate->header_line =
COPY_HEADER_MATCH;
+                               else
+                                       ereport(ERROR,
+
(errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("header requires a
boolean or \"match\"")));
+                       }
+                       PG_END_TRY();
It seems wrong to use a PG_CATCH block for this. I understand that
it's because defGetBoolean() calls ereport() on non-booleans, but then
it should be split into an error-throwing function and a
non-error-throwing lexical analysis of the boolean, the above code
calling the latter.
Besides the comments in elog.h above PG_TRY say that
"the error recovery code
can either do PG_RE_THROW to propagate the error outwards, or do a
(sub)transaction abort. Failure to do so may leave the system in an
inconsistent state for further processing."
Maybe this is what happens with the repeated uses of "match"
eventually failing with ERRORDATA_STACK_SIZE exceeded.
Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER { <literal>match</literal> | <literal>true</literal> |
<literal>false</literal> }
This should be enclosed in square brackets because HEADER
with no argument is still accepted.
Fixed.
+      names from the table. On input, the first line is discarded when set
+      to <literal>true</literal> or required to match the column names if
set
The elision of "header" as the subject might be misinterpreted as if
it's the first line that is true. I'd suggest
"when <literal>header</literal> is set to ..." to avoid any confusion.
Fixed.

Attached is the v5 patch with fixes for the above comments.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
<v5-0001-Add-header-support-to-COPY-TO-text-format.patch><v5-0002-Add-header-matching-mode-to-COPY-FROM.patch>

#52

michael@paquier.xyz

over 5 years ago

In reply to: Rémi Lapeyre (#51)

Re: Add header support to text format and matching feature

On Thu, Aug 27, 2020 at 04:53:11PM +0200, Rémi Lapeyre wrote:

I have two remarks with the state of the current patches:
- DefGetCopyHeader() duplicates a lot of code from defGetBoolean(),
should we refactor this so that they can share more of their
internals? In the current implementation any change to
defGetBoolean() should be made to DefGetCopyHeader() too or their
behaviour will subtly differ.

The difference comes from the use of "match", and my take would be
here that it is wrong to assume that header can be a boolean-like
parameter with only one exception. It seems to me that we may
actually be looking at having this stuff as an option different than
"header" at the end to have clear semantics.

- It is possible to set the header option multiple times:
\copy x(i) from file.txt with (format csv, header off, header on);
In which case the last one is the one kept. I think this is a bug
and it should be fixed, but this is already the behaviour in the
current implementation so fixing it would not be backward
compatible. Do you think users should not do this and I can fix it
or that keeping the current behaviour is better for backward
compatibility?

I would agree that this is a bug because we are failing to detect
what's actually a redundant option here as the first option still
causes the flag to be set to false, but that's not something worth a
back-patch IMO. What we are looking at here is something similar
to what is done with "format", where we track if the option has been
specified with format_specified. The same is actually true with the
"freeze" option here, and it is true that we tend to prefer error-ing
in such cases while there are exceptions like EXPLAIN. I think that
it would be nicer to be at least consistent with the behavior that
each command has chosen, and COPY is now a mixed bag.
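
As a rough illustration of the difference described above, the following standalone sketch contrasts the two ways of detecting a repeated option. It is not PostgreSQL code: the function names are hypothetical, and each array element stands for one occurrence of the same boolean option in the option list (so {false, true} models "header off, header on").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Detect repetition by looking at the option's current value. Returns the
 * index of the occurrence reported as redundant, or -1 if none is reported.
 * The last value applied is stored in *result.
 */
int
apply_checking_value(const bool *settings, size_t n, bool *result)
{
	size_t		i;

	*result = false;			/* option default */
	for (i = 0; i < n; i++)
	{
		if (*result)			/* only trips if an earlier value was true */
			return (int) i;
		*result = settings[i];
	}
	return -1;
}

/*
 * Detect repetition with a separate "specified" flag, in the spirit of
 * format_specified. Same return convention as above.
 */
int
apply_checking_flag(const bool *settings, size_t n, bool *result)
{
	bool		specified = false;
	size_t		i;

	*result = false;			/* option default */
	for (i = 0; i < n; i++)
	{
		if (specified)			/* trips on any second occurrence */
			return (int) i;
		specified = true;
		*result = settings[i];
	}
	return -1;
}

/* Small usage example. */
void
redundant_option_selfcheck(void)
{
	const bool	off_on[] = {false, true};
	bool		value;

	/* "off, on" slips through the value-based check... */
	assert(apply_checking_value(off_on, 2, &value) == -1);
	/* ...but the flag-based check reports the second occurrence. */
	assert(apply_checking_flag(off_on, 2, &value) == 1);
}
```

The value-based check is the pattern that lets an "off, on" sequence go undetected; the flag-based one is what tracking a dedicated *_specified variable provides.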

I have marked the patch as returned with feedback for now.
--
Michael

#53

remi.lapeyre@lenstra.fr

over 5 years ago

In reply to: Michael Paquier (#52)

3 attachment(s)

Re: Add header support to text format and matching feature

I would agree that this is a bug because we are failing to detect
what's actually a redundant option here as the first option still
causes the flag to be set to false, but that's not something worth a
back-patch IMO. What we are looking at here is something similar
to what is done with "format", where we track if the option has been
specified with format_specified. The same is actually true with the
"freeze" option here, and it is true that we tend to prefer error-ing
in such cases while there are exceptions like EXPLAIN. I think that
it would be nicer to be at least consistent with the behavior that
each command has chosen, and COPY is now a mixed bag.

Here’s a new version of the patches that report an error when the options are set multiple times.

Regards,
Rémi

Attachments:

v5-0001-Add-header-support-to-COPY-TO-text-format.patch (application/octet-stream)

From ae87e1cc7b71ae3c742337dbb5ca2960bc2b2b39 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 01:50:06 +0200
Subject: [PATCH v5 1/3] Add header support to "COPY TO" text format
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.28.0"

This is a multi-part message in MIME format.
--------------2.28.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


CSV format supports the HEADER option to output a header in the output,
it is convenient when other programs need to consume the output. This
patch adds the same option to the default text format.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  1 -
 contrib/file_fdw/output/file_fdw.source |  4 +---
 doc/src/sgml/ref/copy.sgml              |  3 ++-
 src/backend/commands/copy.c             | 11 +++++++----
 src/test/regress/input/copy.source      | 12 ++++++++++++
 src/test/regress/output/copy.source     |  8 ++++++++
 6 files changed, 30 insertions(+), 9 deletions(-)


--------------2.28.0
Content-Type: text/x-patch; name="v5-0001-Add-header-support-to-COPY-TO-text-format.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v5-0001-Add-header-support-to-COPY-TO-text-format.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..83edb71077 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..547b81fd16 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 369342b74d..fcab594f09 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -271,7 +271,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2047557e52..414940e880 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -136,7 +136,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1411,10 +1411,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2147,8 +2147,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..2368649111 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -134,6 +134,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d3551da..c1f7f99747 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,6 +95,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,

--------------2.28.0--

v5-0002-Add-header-matching-mode-to-COPY-FROM.patch (application/octet-stream)

From 2618b9f83edbc3c6266a8cf6eddd7cee3589b958 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Mon, 17 Aug 2020 17:48:09 +0530
Subject: [PATCH v5 2/3] Add header matching mode to "COPY FROM"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.28.0"

This is a multi-part message in MIME format.
--------------2.28.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |   6 ++
 contrib/file_fdw/output/file_fdw.source |   9 +-
 doc/src/sgml/ref/copy.sgml              |  12 ++-
 src/backend/commands/copy.c             | 128 ++++++++++++++++++++++--
 src/test/regress/input/copy.source      |  25 +++++
 src/test/regress/output/copy.source     |  17 ++++
 6 files changed, 183 insertions(+), 14 deletions(-)


--------------2.28.0
Content-Type: text/x-patch; name="v5-0002-Add-header-matching-mode-to-COPY-FROM.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v5-0002-Add-header-matching-mode-to-COPY-FROM.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 83edb71077..7a3983c785 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -79,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 547b81fd16..ebe826b9f4 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -93,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -439,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_dont_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index fcab594f09..a804d0c35b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER [ <literal>match</literal> | <literal>true</literal> | <literal>false</literal> ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -270,9 +270,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> or
-      <literal>text</literal> format.
+      names from the table. On input, the first line is discarded when
+      <literal>header</literal> is set to <literal>true</literal> or required
+      to match the column names if set to <literal>match</literal>. If the
+      number of columns in the header is not correct, their order differs
+      from the one expected, or the name or case do not match, the copy will
+      be aborted with an error.  This option is allowed only when using
+      <literal>CSV</literal> or <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 414940e880..97e8514b51 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,16 @@ typedef enum CopyInsertMethod
 	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
 } CopyInsertMethod;
 
+/*
+ * Represents whether the header must be absent, present or present and match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader  header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1135,6 +1145,64 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		table_close(rel, NoLock);
 }
 
+/*
+ * Extract a CopyHeader value from a DefElem.
+ */
+static CopyHeader
+DefGetCopyHeader(DefElem *def)
+{
+	/*
+	 * If no parameter given, assume "true" is meant.
+	 */
+	if (def->arg == NULL)
+		return COPY_HEADER_PRESENT;
+
+	/*
+	 * Allow 0, 1, "true", "false", "on", "off" or "match".
+	 */
+	switch (nodeTag(def->arg))
+	{
+		case T_Integer:
+			switch (intVal(def->arg))
+			{
+				case 0:
+					return COPY_HEADER_ABSENT;
+				case 1:
+					return COPY_HEADER_PRESENT;
+				default:
+					/* otherwise, error out below */
+					break;
+			}
+			break;
+		default:
+			{
+				char	   *sval = defGetString(def);
+
+				/*
+				 * The set of strings accepted here should match up with the
+				 * grammar's opt_boolean_or_string production.
+				 */
+				if (pg_strcasecmp(sval, "true") == 0)
+					return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "false") == 0)
+					return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "on") == 0)
+					return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "off") == 0)
+					return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "match") == 0)
+					return COPY_HEADER_MATCH;
+
+			}
+			break;
+	}
+	ereport(ERROR,
+			(errcode(ERRCODE_SYNTAX_ERROR),
+			 errmsg("%s requires a boolean or \"match\"",
+					def->defname)));
+	return COPY_HEADER_ABSENT;				/* keep compiler quiet */
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -1230,7 +1298,8 @@ ProcessCopyOptions(ParseState *pstate,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
-			cstate->header_line = defGetBoolean(defel);
+
+			cstate->header_line = DefGetCopyHeader(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -1411,7 +1480,7 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (cstate->binary && cstate->header_line)
+	if (cstate->binary && cstate->header_line != COPY_HEADER_ABSENT)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY HEADER available only in CSV and text mode")));
@@ -2132,7 +2201,7 @@ CopyTo(CopyState cstate)
 														 cstate->file_encoding);
 
 		/* if a header has been requested send the line */
-		if (cstate->header_line)
+		if (cstate->header_line != COPY_HEADER_ABSENT)
 		{
 			bool		hdr_delim = false;
 
@@ -2149,7 +2218,7 @@ CopyTo(CopyState cstate)
 
 				if (cstate->csv_mode)
 					CopyAttributeOutCSV(cstate, colname, false,
-									list_length(cstate->attnumlist) == 1);
+										list_length(cstate->attnumlist) == 1);
 				else
 					CopyAttributeOutText(cstate, colname);
 			}
@@ -3647,12 +3716,53 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->binary);
 
-	/* on input just throw the header line away */
-	if (cstate->cur_lineno == 0 && cstate->header_line)
+	/* on input check that the header line is correct if needed */
+	if (cstate->cur_lineno == 0 && cstate->header_line != COPY_HEADER_ABSENT)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("missing header")));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("extra data after last expected header")));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int				attnum = lfirst_int(cur);
+				char		  *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("wrong header for column \"%s\": got \"%s\"",
+								NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 2368649111..4d21c7d524 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -213,3 +213,28 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index c1f7f99747..b792181fe3 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -173,3 +173,20 @@ select * from parted_copytest where b = 2;
 (1 row)
 
 drop table parted_copytest;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  missing header
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;

--------------2.28.0--

v5-0003-Report-an-error-when-options-are-set-multiple-tim.patch (application/octet-stream)

From 9e6fd40a83f24bdc9d23f95af21e8c8bed57303e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Sat, 3 Oct 2020 23:13:44 +0200
Subject: [PATCH v5 3/3] Report an error when options are set multiple times in
 COPY
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.28.0"

This is a multi-part message in MIME format.
--------------2.28.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 src/backend/commands/copy.c         |  9 +++++++--
 src/test/regress/input/copy.source  |  6 ++++++
 src/test/regress/output/copy.source | 10 ++++++++++
 3 files changed, 23 insertions(+), 2 deletions(-)


--------------2.28.0
Content-Type: text/x-patch; name="v5-0003-Report-an-error-when-options-are-set-multiple-tim.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v5-0003-Report-an-error-when-options-are-set-multiple-tim.patch"

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 97e8514b51..dd8c43da28 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1227,6 +1227,8 @@ ProcessCopyOptions(ParseState *pstate,
 				   List *options)
 {
 	bool		format_specified = false;
+	bool		header_specified = false;
+	bool		freeze_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -1266,11 +1268,13 @@ ProcessCopyOptions(ParseState *pstate,
 		}
 		else if (strcmp(defel->defname, "freeze") == 0)
 		{
-			if (cstate->freeze)
+			if (freeze_specified)
 				ereport(ERROR,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
+
+			freeze_specified = true;
 			cstate->freeze = defGetBoolean(defel);
 		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
@@ -1293,12 +1297,13 @@ ProcessCopyOptions(ParseState *pstate,
 		}
 		else if (strcmp(defel->defname, "header") == 0)
 		{
-			if (cstate->header_line)
+			if (header_specified)
 				ereport(ERROR,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
 
+			header_specified = true;
 			cstate->header_line = DefGetCopyHeader(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 4d21c7d524..bb6cea519f 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -146,6 +146,9 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest4 to stdout (header);
 
+-- specifying header multiple times should report an error
+copy copytest4 to stdout (header off, header on);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
@@ -182,6 +185,9 @@ group by tableoid order by tableoid::regclass::name;
 
 truncate parted_copytest;
 
+-- specifying freeze multiple times should report an error
+copy copytest4 to stdout (freeze off, freeze on);
+
 -- create before insert row trigger on parted_copytest_a2
 create function part_ins_func() returns trigger language plpgsql as $$
 begin
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index b792181fe3..b6652de074 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -103,6 +103,11 @@ copy copytest4 to stdout (header);
 c1	col with tabulation: \t
 1	a
 2	b
+-- specifying header multiple times should report an error
+copy copytest4 to stdout (header off, header on);
+ERROR:  conflicting or redundant options
+LINE 1: copy copytest4 to stdout (header off, header on);
+                                              ^
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
@@ -136,6 +141,11 @@ group by tableoid order by tableoid::regclass::name;
 (2 rows)
 
 truncate parted_copytest;
+-- specifying freeze multiple times should report an error
+copy copytest4 to stdout (freeze off, freeze on);
+ERROR:  conflicting or redundant options
+LINE 1: copy copytest4 to stdout (freeze off, freeze on);
+                                              ^
 -- create before insert row trigger on parted_copytest_a2
 create function part_ins_func() returns trigger language plpgsql as $$
 begin

--------------2.28.0--

#54

michael@paquier.xyz

over 5 years ago

In reply to: Rémi Lapeyre (#53)

Re: Add header support to text format and matching feature

On Sat, Oct 03, 2020 at 11:42:52PM +0200, Rémi Lapeyre wrote:

Here’s a new version of the patches that report an error when the options are set multiple times.

Please note that I have applied a fix for the redundant option
handling as of 10c5291, though I have missed that you sent a patch.
Sorry about that. Looking at it, we have done the same thing
byte-by-byte except that I have added tests for all option
combinations.
--
Michael

#55

remi.lapeyre@lenstra.fr

about 5 years ago

In reply to: Michael Paquier (#54)

2 attachment(s)

Re: Add header support to text format and matching feature

Thanks Michael for taking care of that!

Here are the rebased patches, with the last one dropped.

Regards,
Rémi

Attachments:

v6-0001-Add-header-support-to-COPY-TO-text-format.patch (application/octet-stream)

From 2686dc9e9b0e585355006c5aedcf5b2b0289b06d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 01:50:06 +0200
Subject: [PATCH v6 1/2] Add header support to "COPY TO" text format
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.28.0"

This is a multi-part message in MIME format.
--------------2.28.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


CSV format supports the HEADER option to output a header in the output,
it is convenient when other programs need to consume the output. This
patch adds the same option to the default text format.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  1 -
 contrib/file_fdw/output/file_fdw.source |  4 +---
 doc/src/sgml/ref/copy.sgml              |  3 ++-
 src/backend/commands/copy.c             | 11 +++++++----
 src/test/regress/input/copy.source      | 12 ++++++++++++
 src/test/regress/output/copy.source     |  8 ++++++++
 6 files changed, 30 insertions(+), 9 deletions(-)


--------------2.28.0
Content-Type: text/x-patch; name="v6-0001-Add-header-support-to-COPY-TO-text-format.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v6-0001-Add-header-support-to-COPY-TO-text-format.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..83edb71077 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..547b81fd16 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 369342b74d..fcab594f09 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -271,7 +271,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3c7dbad27a..8b83decc55 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -136,7 +136,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1415,10 +1415,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (!cstate->csv_mode && cstate->header_line)
+	if (cstate->binary && cstate->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!cstate->csv_mode && cstate->quote != NULL)
@@ -2151,8 +2151,11 @@ CopyTo(CopyState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..2368649111 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -134,6 +134,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d3551da..c1f7f99747 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,6 +95,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,

--------------2.28.0--
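
For readers following the regression test above, here is a minimal standalone sketch of why the header for a column named with a literal tab comes out as `\t`: in text format the header names must be escaped the same way data values are, or the line could not be read back. This is not PostgreSQL's `CopyAttributeOutText`; the helper names (`put_char`, `put_escaped_name`, `write_text_header`) are hypothetical, and only tab, newline, carriage return, backslash and the delimiter are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Append one byte, always leaving room for the terminating NUL. */
static int
put_char(char *buf, size_t cap, size_t *len, char c)
{
	if (*len + 1 >= cap)
		return -1;				/* buffer too small */
	buf[(*len)++] = c;
	buf[*len] = '\0';
	return 0;
}

/* Append a column name, backslash-escaping characters that would break parsing. */
static int
put_escaped_name(char *buf, size_t cap, size_t *len, const char *name, char delim)
{
	for (const char *p = name; *p != '\0'; p++)
	{
		char		out = *p;
		int			escape = 1;

		switch (*p)
		{
			case '\t': out = 't'; break;
			case '\n': out = 'n'; break;
			case '\r': out = 'r'; break;
			case '\\': break;	/* emitted as two backslashes */
			default:
				escape = (*p == delim);	/* a non-tab delimiter is escaped as itself */
				break;
		}
		if (escape && put_char(buf, cap, len, '\\') != 0)
			return -1;
		if (put_char(buf, cap, len, out) != 0)
			return -1;
	}
	return 0;
}

/* Build "name1<delim>name2...\n" in buf. Returns 0 on success, -1 on overflow. */
int
write_text_header(char *buf, size_t cap, const char *const *names,
				  size_t ncols, char delim)
{
	size_t		len = 0;

	assert(buf != NULL && cap > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < ncols; i++)
	{
		if (i > 0 && put_char(buf, cap, &len, delim) != 0)
			return -1;
		if (put_escaped_name(buf, cap, &len, names[i], delim) != 0)
			return -1;
	}
	return put_char(buf, cap, &len, '\n');
}
```

Called with the two column names of `copytest4` and a tab delimiter, the sketch produces the same header line as the expected output in the patch, with the embedded tab written as a backslash followed by `t`.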

v6-0002-Add-header-matching-mode-to-COPY-FROM.patch (application/octet-stream)

From 86e7d97b8075e4c41f3ed242ed37ff81c8bd7f18 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Tue, 13 Oct 2020 14:45:56 +0200
Subject: [PATCH v6 2/2] Add header matching mode to "COPY FROM"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.28.0"

This is a multi-part message in MIME format.
--------------2.28.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |   6 ++
 contrib/file_fdw/output/file_fdw.source |   9 +-
 doc/src/sgml/ref/copy.sgml              |  12 ++-
 src/backend/commands/copy.c             | 128 ++++++++++++++++++++++--
 src/test/regress/input/copy.source      |  25 +++++
 src/test/regress/output/copy.source     |  17 ++++
 6 files changed, 183 insertions(+), 14 deletions(-)


--------------2.28.0
Content-Type: text/x-patch; name="v6-0002-Add-header-matching-mode-to-COPY-FROM.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v6-0002-Add-header-matching-mode-to-COPY-FROM.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 83edb71077..7a3983c785 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -79,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 547b81fd16..ebe826b9f4 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -93,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -439,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_dont_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index fcab594f09..a804d0c35b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER [ <literal>match</literal> | <literal>true</literal> | <literal>false</literal> ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -270,9 +270,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> or
-      <literal>text</literal> format.
+      names from the table. On input, the first line is discarded when
+      <literal>header</literal> is set to <literal>true</literal> or required
+      to match the column names if set to <literal>match</literal>. If the
+      number of columns in the header is not correct, their order differs
+      from the one expected, or the name or case do not match, the copy will
+      be aborted with an error.  This option is allowed only when using
+      <literal>CSV</literal> or <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8b83decc55..999f3bd3f4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,16 @@ typedef enum CopyInsertMethod
 	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
 } CopyInsertMethod;
 
+/*
+ * Represents whether the header must be absent, present or present and match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader  header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
@@ -1135,6 +1145,64 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		table_close(rel, NoLock);
 }
 
+/*
+ * Extract a CopyHeader value from a DefElem.
+ */
+static CopyHeader
+DefGetCopyHeader(DefElem *def)
+{
+	/*
+	 * If no parameter given, assume "true" is meant.
+	 */
+	if (def->arg == NULL)
+		return COPY_HEADER_PRESENT;
+
+	/*
+	 * Allow 0, 1, "true", "false", "on", "off" or "match".
+	 */
+	switch (nodeTag(def->arg))
+	{
+		case T_Integer:
+			switch (intVal(def->arg))
+			{
+				case 0:
+					return COPY_HEADER_ABSENT;
+				case 1:
+					return COPY_HEADER_PRESENT;
+				default:
+					/* otherwise, error out below */
+					break;
+			}
+			break;
+		default:
+			{
+				char	   *sval = defGetString(def);
+
+				/*
+				 * The set of strings accepted here should match up with the
+				 * grammar's opt_boolean_or_string production.
+				 */
+				if (pg_strcasecmp(sval, "true") == 0)
+					return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "false") == 0)
+					return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "on") == 0)
+					return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "off") == 0)
+					return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "match") == 0)
+					return COPY_HEADER_MATCH;
+
+			}
+			break;
+	}
+	ereport(ERROR,
+			(errcode(ERRCODE_SYNTAX_ERROR),
+			 errmsg("%s requires a boolean or \"match\"",
+					def->defname)));
+	return COPY_HEADER_ABSENT;				/* keep compiler quiet */
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -1233,8 +1301,9 @@ ProcessCopyOptions(ParseState *pstate,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
+
 			header_specified = true;
-			cstate->header_line = defGetBoolean(defel);
+			cstate->header_line = DefGetCopyHeader(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -1415,7 +1484,7 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", cstate->delim)));
 
 	/* Check header */
-	if (cstate->binary && cstate->header_line)
+	if (cstate->binary && cstate->header_line != COPY_HEADER_ABSENT)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY HEADER available only in CSV and text mode")));
@@ -2136,7 +2205,7 @@ CopyTo(CopyState cstate)
 														 cstate->file_encoding);
 
 		/* if a header has been requested send the line */
-		if (cstate->header_line)
+		if (cstate->header_line != COPY_HEADER_ABSENT)
 		{
 			bool		hdr_delim = false;
 
@@ -2153,7 +2222,7 @@ CopyTo(CopyState cstate)
 
 				if (cstate->csv_mode)
 					CopyAttributeOutCSV(cstate, colname, false,
-									list_length(cstate->attnumlist) == 1);
+										list_length(cstate->attnumlist) == 1);
 				else
 					CopyAttributeOutText(cstate, colname);
 			}
@@ -3651,12 +3720,53 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->binary);
 
-	/* on input just throw the header line away */
-	if (cstate->cur_lineno == 0 && cstate->header_line)
+	/* on input check that the header line is correct if needed */
+	if (cstate->cur_lineno == 0 && cstate->header_line != COPY_HEADER_ABSENT)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("missing header")));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("extra data after last expected header")));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int				attnum = lfirst_int(cur);
+				char		  *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("wrong header for column \"%s\": got \"%s\"",
+								NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 2368649111..4d21c7d524 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -213,3 +213,28 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index c1f7f99747..b792181fe3 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -173,3 +173,20 @@ select * from parted_copytest where b = 2;
 (1 row)
 
 drop table parted_copytest;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  missing header
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;

--------------2.28.0--
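
To make the behaviour of the second patch easier to follow outside the backend, here is a rough standalone sketch of its two pieces: turning the HEADER option value into a three-state mode, and comparing the parsed header fields against the expected column names. It is only an illustration, not the patch's code: `parse_header_option`, `check_header` and both enums are hypothetical names, the option arrives as a plain string rather than a `DefElem`, and fields are compared by their position in the column list, which is an assumption about the intended behaviour when a column list is given.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum HeaderMode { HEADER_ABSENT, HEADER_PRESENT, HEADER_MATCH } HeaderMode;

typedef enum HeaderCheck
{
	HEADER_OK,
	HEADER_MISSING_COLUMN,		/* fewer header fields than columns */
	HEADER_EXTRA_COLUMN,		/* more header fields than columns */
	HEADER_NAME_MISMATCH		/* a field differs from the column name */
} HeaderCheck;

/* Case-insensitive equality for option keywords. */
static int
equals_ignore_case(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0')
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;			/* equal only if both ended together */
}

/* Returns 0 and sets *mode, or -1 if the value is not a boolean or "match". */
int
parse_header_option(const char *value, HeaderMode *mode)
{
	static const struct { const char *text; HeaderMode mode; } known[] = {
		{"true", HEADER_PRESENT}, {"on", HEADER_PRESENT}, {"1", HEADER_PRESENT},
		{"false", HEADER_ABSENT}, {"off", HEADER_ABSENT}, {"0", HEADER_ABSENT},
		{"match", HEADER_MATCH},
	};

	assert(mode != NULL);
	if (value == NULL)			/* bare HEADER means "true" */
	{
		*mode = HEADER_PRESENT;
		return 0;
	}
	for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
	{
		if (equals_ignore_case(value, known[i].text))
		{
			*mode = known[i].mode;
			return 0;
		}
	}
	return -1;
}

/* Compare header fields with column names; names are case-sensitive. */
HeaderCheck
check_header(const char *const *fields, size_t nfields,
			 const char *const *columns, size_t ncolumns, size_t *bad_index)
{
	assert(fields != NULL && columns != NULL);
	if (nfields < ncolumns)
		return HEADER_MISSING_COLUMN;
	if (nfields > ncolumns)
		return HEADER_EXTRA_COLUMN;
	for (size_t i = 0; i < ncolumns; i++)
	{
		if (strcmp(fields[i], columns[i]) != 0)
		{
			if (bad_index != NULL)
				*bad_index = i;	/* report the first offending position */
			return HEADER_NAME_MISMATCH;
		}
	}
	return HEADER_OK;
}
```

Against the three-column table used in the regression tests, a header of `a b` maps to the "missing header" case and `a b c d` to the "extra data after last expected header" case, while a header with two columns swapped is caught by the name comparison.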

#56

remi.lapeyre@lenstra.fr

about 5 years ago

In reply to: Rémi Lapeyre (#55)

Re: Add header support to text format and matching feature

It looks like this is not in the current commitfest and that cfbot does not find it. I’m not yet accustomed to the PostgreSQL workflow; should I just create a new entry in the current commitfest?

Regards,
Rémi


On 13 Oct 2020 at 14:49, Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:

Thanks Michael for taking care of that!

Here are the rebased patches, with the last one dropped.

Regards,
Rémi

<v6-0001-Add-header-support-to-COPY-TO-text-format.patch><v6-0002-Add-header-matching-mode-to-COPY-FROM.patch>

On 5 Oct 2020 at 03:05, Michael Paquier <michael@paquier.xyz> wrote:

On Sat, Oct 03, 2020 at 11:42:52PM +0200, Rémi Lapeyre wrote:

Here’s a new version of the patches that reports an error when an option is set multiple times.

Please note that I have applied a fix for the redundant option
handling as of 10c5291, though I have missed that you sent a patch.
Sorry about that. Looking at it, we have done the same thing
byte-by-byte except that I have added tests for all option
combinations.
--
Michael

#57

daniel@manitou-mail.org

about 5 years ago

In reply to: Rémi Lapeyre (#56)

Re: Add header support to text format and matching feature

Rémi Lapeyre wrote:

It looks like this is not in the current commitfest and that cfbot does not
find it. I’m not yet accustomed to the PostgreSQL workflow; should I just
create a new entry in the current commitfest?

Yes, because in the last CommitFest it was marked
as "Returned with feedback":
https://commitfest.postgresql.org/29/2504/

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

#58

remi.lapeyre@lenstra.fr

about 5 years ago

In reply to: Daniel Verite (#57)

2 attachment(s)

Re: Add header support to text format and matching feature

Hi, here’s a rebased version of the patch.

Best regards,
Rémi

Attachments:

v7-0001-Add-header-support-to-COPY-TO-text-format.patch (application/octet-stream)

From b919ac8aa8fffdf8b50728817fa9a3210887a1dd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 01:50:06 +0200
Subject: [PATCH v7 1/2] Add header support to "COPY TO" text format
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.29.2"

This is a multi-part message in MIME format.
--------------2.29.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


The CSV format supports the HEADER option to emit a header line, which
is convenient when other programs need to consume the output. This
patch adds the same option to the default text format.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  1 -
 contrib/file_fdw/output/file_fdw.source |  4 +---
 doc/src/sgml/ref/copy.sgml              |  3 ++-
 src/backend/commands/copy.c             |  4 ++--
 src/backend/commands/copyto.c           |  5 ++++-
 src/include/commands/copy.h             |  2 +-
 src/test/regress/input/copy.source      | 12 ++++++++++++
 src/test/regress/output/copy.source     |  8 ++++++++
 8 files changed, 30 insertions(+), 9 deletions(-)


--------------2.29.2
Content-Type: text/x-patch; name="v7-0001-Add-header-support-to-COPY-TO-text-format.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v7-0001-Add-header-support-to-COPY-TO-text-format.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..83edb71077 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..547b81fd16 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 369342b74d..fcab594f09 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -271,7 +271,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b6143b8bf2..36fa4c0c74 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -591,10 +591,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (!opts_out->csv_mode && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!opts_out->csv_mode && opts_out->quote != NULL)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c7e5f04446..568b2ca46c 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -912,8 +912,11 @@ CopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->opts.csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 127a3c61e2..d115f72559 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -32,7 +32,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..2368649111 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -134,6 +134,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d3551da..c1f7f99747 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,6 +95,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,

--------------2.29.2--
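
The option check changed by this first patch boils down to a small rule: HEADER is now rejected only for the binary format. A minimal sketch of that rule, with a hypothetical `CopyFormat` enum and `check_header_option` function standing in for the real `CopyFormatOptions` fields, might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the binary / csv_mode flags in the real struct. */
typedef enum CopyFormat { FORMAT_TEXT, FORMAT_CSV, FORMAT_BINARY } CopyFormat;

/* Returns NULL if the combination is accepted, otherwise the error message. */
const char *
check_header_option(CopyFormat format, bool header)
{
	assert(format == FORMAT_TEXT || format == FORMAT_CSV ||
		   format == FORMAT_BINARY);

	/* Before the patch the test was "not CSV"; now only binary is refused. */
	if (format == FORMAT_BINARY && header)
		return "COPY HEADER available only in CSV and text mode";
	return NULL;
}
```

This mirrors the updated `file_fdw` validator tests: `format 'text', header 'true'` no longer errors, while `format 'binary', header 'true'` still does, with the reworded message.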

v7-0002-Add-header-matching-mode-to-COPY-FROM.patch (application/octet-stream)

From ae8618091ba27a5e8200fe9207239cae3c4679ae Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Tue, 13 Oct 2020 14:45:56 +0200
Subject: [PATCH v7 2/2] Add header matching mode to "COPY FROM"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.29.2"

This is a multi-part message in MIME format.
--------------2.29.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  6 +++
 contrib/file_fdw/output/file_fdw.source |  9 +++-
 doc/src/sgml/ref/copy.sgml              | 12 +++--
 src/backend/commands/copy.c             | 65 +++++++++++++++++++++++--
 src/backend/commands/copyfromparse.c    | 50 +++++++++++++++++--
 src/backend/commands/copyto.c           |  2 +-
 src/include/commands/copy.h             | 12 ++++-
 src/test/regress/input/copy.source      | 25 ++++++++++
 src/test/regress/output/copy.source     | 17 +++++++
 9 files changed, 184 insertions(+), 14 deletions(-)


--------------2.29.2
Content-Type: text/x-patch; name="v7-0002-Add-header-matching-mode-to-COPY-FROM.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v7-0002-Add-header-matching-mode-to-COPY-FROM.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 83edb71077..7a3983c785 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -79,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 547b81fd16..ebe826b9f4 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -93,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -439,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_dont_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index fcab594f09..a804d0c35b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER [ <literal>match</literal> | <literal>true</literal> | <literal>false</literal> ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -270,9 +270,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> or
-      <literal>text</literal> format.
+      names from the table. On input, the first line is discarded when
+      <literal>header</literal> is set to <literal>true</literal> or required
+      to match the column names if set to <literal>match</literal>. If the
+      number of columns in the header is not correct, their order differs
+      from the one expected, or the name or case do not match, the copy will
+      be aborted with an error.  This option is allowed only when using
+      <literal>CSV</literal> or <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 36fa4c0c74..0ffa9112fa 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -315,7 +315,66 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 }
 
 /*
- * Process the statement option list for COPY.
+* Extract a CopyHeader value from a DefElem.
+*/
+static CopyHeader
+DefGetCopyHeader(DefElem *def)
+{
+	/*
+	* If no parameter given, assume "true" is meant.
+	*/
+	if (def->arg == NULL)
+		return COPY_HEADER_PRESENT;
+
+	/*
+	* Allow 0, 1, "true", "false", "on", "off" or "match".
+	*/
+	switch (nodeTag(def->arg))
+	{
+		case T_Integer:
+			switch (intVal(def->arg))
+			{
+				case 0:
+					return COPY_HEADER_ABSENT;
+				case 1:
+					return COPY_HEADER_PRESENT;
+				default:
+					/* otherwise, error out below */
+					break;
+			}
+			break;
+		default:
+			{
+				char	*sval = defGetString(def);
+
+				/*
+				* The set of strings accepted here should match up with the
+				* grammar's opt_boolean_or_string production.
+				*/
+				if (pg_strcasecmp(sval, "true") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "false") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "on") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "off") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "match") == 0)
+						return COPY_HEADER_MATCH;
+
+			}
+			break;
+	}
+
+	ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a boolean or \"match\"",
+					def->defname)));
+	return COPY_HEADER_ABSENT;						/* keep compiler quiet */
+}
+
+/*
+* Process the statement option list for COPY.
  *
  * Scan the options list (a list of DefElem) and transpose the information
  * into *opts_out, applying appropriate error checking.
@@ -410,7 +469,7 @@ ProcessCopyOptions(ParseState *pstate,
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
 			header_specified = true;
-			opts_out->header_line = defGetBoolean(defel);
+			opts_out->header_line = DefGetCopyHeader(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -591,7 +650,7 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line != COPY_HEADER_ABSENT)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY HEADER available only in CSV and text mode")));
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 34ed3cfcd5..4a76e5fdd5 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -26,6 +26,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "port/pg_bswap.h"
+#include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -455,12 +456,53 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->opts.binary);
 
-	/* on input just throw the header line away */
-	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+	/* on input check that the header line is correct if needed */
+	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_ABSENT)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->opts.header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->opts.csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("missing header")));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected header")));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int                             attnum = lfirst_int(cur);
+				char              *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->opts.null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("wrong header for column \"%s\": got \"%s\"",
+									NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 568b2ca46c..3b441c408d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -897,7 +897,7 @@ CopyTo(CopyToState cstate)
 														 cstate->file_encoding);
 
 		/* if a header has been requested send the line */
-		if (cstate->opts.header_line)
+		if (cstate->opts.header_line != COPY_HEADER_ABSENT)
 		{
 			bool		hdr_delim = false;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index d115f72559..b163bc8eb5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -19,6 +19,16 @@
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
+/*
+ * Represents whether the header must be absent, present or present and match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -32,7 +42,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader	header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 2368649111..4d21c7d524 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -213,3 +213,28 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index c1f7f99747..b792181fe3 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -173,3 +173,20 @@ select * from parted_copytest where b = 2;
 (1 row)
 
 drop table parted_copytest;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  missing header
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;

--------------2.29.2--
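
For readers following the thread, the option handling that DefGetCopyHeader adds can be approximated outside the backend. This is a minimal sketch under stated assumptions, not the patch's code: HeaderChoice, equals_ignore_case and parse_header_choice are hypothetical names, the value arrives as a plain C string rather than a DefElem node, and HEADER_INVALID stands in for the place where the patch raises an error with ereport().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the patch's CopyHeader enum, plus an "invalid"
 * value where the backend code would report an error instead.
 */
typedef enum
{
	HEADER_ABSENT,
	HEADER_PRESENT,
	HEADER_MATCH,
	HEADER_INVALID
} HeaderChoice;

/* Return 1 if the two ASCII strings are equal ignoring case, else 0. */
int
equals_ignore_case(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);

	while (*a != '\0' && *b != '\0')
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return 0;
		a++;
		b++;
	}
	return *a == '\0' && *b == '\0';
}

/*
 * Map an option value to a header mode.  A missing value (NULL) means
 * "true", mirroring the patch; "match" selects the new checking mode.
 * Assumption: the integers 0 and 1 are passed here as the strings "0"/"1".
 */
HeaderChoice
parse_header_choice(const char *value)
{
	if (value == NULL)
		return HEADER_PRESENT;

	if (equals_ignore_case(value, "true") ||
		equals_ignore_case(value, "on") ||
		equals_ignore_case(value, "1"))
		return HEADER_PRESENT;

	if (equals_ignore_case(value, "false") ||
		equals_ignore_case(value, "off") ||
		equals_ignore_case(value, "0"))
		return HEADER_ABSENT;

	if (equals_ignore_case(value, "match"))
		return HEADER_MATCH;

	return HEADER_INVALID;
}
```

With this sketch, parse_header_choice("wrong_choice") yields HEADER_INVALID, which corresponds to the 'header requires a boolean or "match"' error in the regression output above.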

#59

David Steele

david@pgmasters.net

almost 5 years ago

In reply to: Rémi Lapeyre (#58)

Re: Add header support to text format and matching feature

On 12/7/20 6:40 PM, Rémi Lapeyre wrote:

Hi, here’s a rebased version of the patch.

Michael, since the issue of duplicated options has been fixed do either
of these patches look like they are ready for commit?

Regards,
--
-David
david@pgmasters.net

#60

remi.lapeyre@lenstra.fr

almost 5 years ago

In reply to: David Steele (#59)

2 attachment(s)

Re: Add header support to text format and matching feature

Michael, since the issue of duplicated options has been fixed do either of these patches look like they are ready for commit?

Here’s a rebased version of the patch.

Cheers,
Rémi

Regards,
--
-David
david@pgmasters.net

Attachments:

v8-0001-Add-header-support-to-COPY-TO-text-format.patch (application/octet-stream)

From cbf52dfb114d197667b724534c8b69356ee2b347 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 01:50:06 +0200
Subject: [PATCH v8 1/2] Add header support to "COPY TO" text format
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.31.1"

This is a multi-part message in MIME format.
--------------2.31.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


CSV format supports the HEADER option to output a header in the output,
it is convenient when other programs need to consume the output. This
patch adds the same option to the default text format.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  1 -
 contrib/file_fdw/output/file_fdw.source |  4 +---
 doc/src/sgml/ref/copy.sgml              |  3 ++-
 src/backend/commands/copy.c             |  4 ++--
 src/backend/commands/copyto.c           |  5 ++++-
 src/include/commands/copy.h             |  2 +-
 src/test/regress/input/copy.source      | 12 ++++++++++++
 src/test/regress/output/copy.source     |  8 ++++++++
 8 files changed, 30 insertions(+), 9 deletions(-)


--------------2.31.1
Content-Type: text/x-patch; name="v8-0001-Add-header-support-to-COPY-TO-text-format.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v8-0001-Add-header-support-to-COPY-TO-text-format.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..83edb71077 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..547b81fd16 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 14cd437da0..e6e95c9b4c 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -277,7 +277,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8265b981eb..1165f6b254 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -591,10 +591,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (!opts_out->csv_mode && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!opts_out->csv_mode && opts_out->quote != NULL)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 7257a54e93..e9ae1c64a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -869,8 +869,11 @@ DoCopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->opts.csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e33d..095d6f0b7e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -32,7 +32,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 8acb516801..379e47a541 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -134,6 +134,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 25bdec6c60..ce8309ef56 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,6 +95,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,

--------------2.31.1--
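
The expected output above shows the tab inside the second column name coming out as \t in the text-format header. A rough standalone sketch of that escaping follows. It is a simplification and an assumption about the general shape, not the backend's CopyAttributeOutText: escape_text_field and build_text_header are hypothetical helpers, only backslash, tab, newline, carriage return and the delimiter are handled, and output goes to a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst (capacity cap), backslash-escaping characters that would
 * be ambiguous in a delimited text line.  Returns the number of bytes written
 * (excluding the terminating NUL), or (size_t) -1 if dst is too small.
 */
size_t
escape_text_field(const char *src, char delim, char *dst, size_t cap)
{
	size_t		n = 0;

	assert(src != NULL && dst != NULL && cap > 0);

	for (; *src != '\0'; src++)
	{
		char		c = *src;
		char		esc = 0;	/* 0 means "no escape needed" */

		switch (c)
		{
			case '\\':
				esc = '\\';
				break;
			case '\t':
				esc = 't';
				break;
			case '\n':
				esc = 'n';
				break;
			case '\r':
				esc = 'r';
				break;
			default:
				if (c == delim)
					esc = c;	/* backslash followed by the delimiter */
				break;
		}

		/* keep one byte free for the terminating NUL */
		if (n + (esc ? 2 : 1) >= cap)
			return (size_t) -1;
		if (esc)
			dst[n++] = '\\';
		dst[n++] = esc ? esc : c;
	}
	dst[n] = '\0';
	return n;
}

/*
 * Build one header line: escaped column names joined by delim and ended
 * with a newline.  Same return convention as escape_text_field.
 */
size_t
build_text_header(const char *const *names, size_t ncols, char delim,
				  char *dst, size_t cap)
{
	size_t		n = 0;
	size_t		i;

	assert(names != NULL && dst != NULL && cap > 0);

	for (i = 0; i < ncols; i++)
	{
		size_t		written;

		if (i > 0)
		{
			if (n + 1 >= cap)
				return (size_t) -1;
			dst[n++] = delim;
		}
		written = escape_text_field(names[i], delim, dst + n, cap - n);
		if (written == (size_t) -1)
			return (size_t) -1;
		n += written;
	}

	if (n + 1 >= cap)
		return (size_t) -1;
	dst[n++] = '\n';
	dst[n] = '\0';
	return n;
}
```

Called with the two column names of copytest4 and a tab delimiter, the sketch produces the same header line as the regression output: "c1", a literal tab, then "col with tabulation: " followed by a backslash and the letter t.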

v8-0002-Add-header-matching-mode-to-COPY-FROM.patch (application/octet-stream)

From dd561081af8a70b6d0c9eabc666a84355c68f726 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Tue, 13 Oct 2020 14:45:56 +0200
Subject: [PATCH v8 2/2] Add header matching mode to "COPY FROM"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.31.1"

This is a multi-part message in MIME format.
--------------2.31.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediatly if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  6 +++
 contrib/file_fdw/output/file_fdw.source |  9 +++-
 doc/src/sgml/ref/copy.sgml              | 12 +++--
 src/backend/commands/copy.c             | 65 +++++++++++++++++++++++--
 src/backend/commands/copyfromparse.c    | 50 +++++++++++++++++--
 src/backend/commands/copyto.c           |  2 +-
 src/include/commands/copy.h             | 12 ++++-
 src/test/regress/input/copy.source      | 25 ++++++++++
 src/test/regress/output/copy.source     | 17 +++++++
 9 files changed, 184 insertions(+), 14 deletions(-)


--------------2.31.1
Content-Type: text/x-patch; name="v8-0002-Add-header-matching-mode-to-COPY-FROM.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v8-0002-Add-header-matching-mode-to-COPY-FROM.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 83edb71077..7a3983c785 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -79,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 547b81fd16..ebe826b9f4 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -93,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -439,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_dont_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index e6e95c9b4c..d93b333f2c 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER [ <literal>match</literal> | <literal>true</literal> | <literal>false</literal> ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -276,9 +276,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> or
-      <literal>text</literal> format.
+      names from the table. On input, the first line is discarded when
+      <literal>header</literal> is set to <literal>true</literal> or required
+      to match the column names if set to <literal>match</literal>. If the
+      number of columns in the header is not correct, their order differs
+      from the one expected, or the name or case do not match, the copy will
+      be aborted with an error.  This option is allowed only when using
+      <literal>CSV</literal> or <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1165f6b254..c13c89ddf6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -315,7 +315,66 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 }
 
 /*
- * Process the statement option list for COPY.
+* Extract a CopyHeader value from a DefElem.
+*/
+static CopyHeader
+DefGetCopyHeader(DefElem *def)
+{
+	/*
+	* If no parameter given, assume "true" is meant.
+	*/
+	if (def->arg == NULL)
+		return COPY_HEADER_PRESENT;
+
+	/*
+	* Allow 0, 1, "true", "false", "on", "off" or "match".
+	*/
+	switch (nodeTag(def->arg))
+	{
+		case T_Integer:
+			switch (intVal(def->arg))
+			{
+				case 0:
+					return COPY_HEADER_ABSENT;
+				case 1:
+					return COPY_HEADER_PRESENT;
+				default:
+					/* otherwise, error out below */
+					break;
+			}
+			break;
+		default:
+			{
+				char	*sval = defGetString(def);
+
+				/*
+				* The set of strings accepted here should match up with the
+				* grammar's opt_boolean_or_string production.
+				*/
+				if (pg_strcasecmp(sval, "true") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "false") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "on") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "off") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "match") == 0)
+						return COPY_HEADER_MATCH;
+
+			}
+			break;
+	}
+
+	ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a boolean or \"match\"",
+					def->defname)));
+	return COPY_HEADER_ABSENT;						/* keep compiler quiet */
+}
+
+/*
+* Process the statement option list for COPY.
  *
  * Scan the options list (a list of DefElem) and transpose the information
  * into *opts_out, applying appropriate error checking.
@@ -410,7 +469,7 @@ ProcessCopyOptions(ParseState *pstate,
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
 			header_specified = true;
-			opts_out->header_line = defGetBoolean(defel);
+			opts_out->header_line = DefGetCopyHeader(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -591,7 +650,7 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line != COPY_HEADER_ABSENT)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY HEADER available only in CSV and text mode")));
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 0813424768..96d517d23b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -739,12 +740,53 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->opts.binary);
 
-	/* on input just throw the header line away */
-	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+	/* on input check that the header line is correct if needed */
+	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_ABSENT)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->opts.header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->opts.csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("missing header")));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected header")));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int                             attnum = lfirst_int(cur);
+				char              *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->opts.null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("wrong header for column \"%s\": got \"%s\"",
+									NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e9ae1c64a2..ed0191557c 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -854,7 +854,7 @@ DoCopyTo(CopyToState cstate)
 														 cstate->file_encoding);
 
 		/* if a header has been requested send the line */
-		if (cstate->opts.header_line)
+		if (cstate->opts.header_line != COPY_HEADER_ABSENT)
 		{
 			bool		hdr_delim = false;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 095d6f0b7e..eeceac6eef 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -19,6 +19,16 @@
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
+/*
+ * Represents whether the header must be absent, present or present and match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -32,7 +42,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader	header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 379e47a541..d9b4f64022 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -275,3 +275,28 @@ copy tab_progress_reporting from '@abs_srcdir@/data/emp.data'
 drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
 drop function notice_after_tab_progress_reporting();
 drop table tab_progress_reporting;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index ce8309ef56..e21d24601e 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -227,3 +227,20 @@ INFO:  progress: {"type": "FILE", "command": "COPY FROM", "relname": "tab_progre
 drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
 drop function notice_after_tab_progress_reporting();
 drop table tab_progress_reporting;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  missing header
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;

--------------2.31.1--

#61

Zhihong Yu

zyu@yugabyte.com

almost 5 years ago

In reply to: Rémi Lapeyre (#60)

Re: Add header support to text format and matching feature

On Sat, Apr 10, 2021 at 4:17 PM Rémi Lapeyre <remi.lapeyre@lenstra.fr>
wrote:

Michael, since the issue of duplicated options has been fixed do either

of these patches look like they are ready for commit?

Here’s a rebased version of the patch.

Cheers,
Rémi

Regards,
--
-David
david@pgmasters.net

Hi,

sure it matches what is expected and exit immediatly if it does not.

Typo: immediately

+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server

nit: since header is singular, you can name the table header_doesnt_match

+ from the one expected, or the name or case do not match, the copy will

For 'the name or case do not match', either make the subjects plural or
change 'do not' to 'doesn't'.

-           opts_out->header_line = defGetBoolean(defel);
+           opts_out->header_line = DefGetCopyHeader(defel);

The existing function starts with a lowercase d; I wonder why the new
function starts with an uppercase D.

+           if (fldct < list_length(cstate->attnumlist))
+               ereport(ERROR,
+                       (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                        errmsg("missing header")));

The message seems to be inaccurate: the header may be there, it is just
missing some fields.

+ * Represents whether the header must be absent, present or present and match.

present and match: 'present' seems redundant here. If the header is absent,
how can it match?

Cheers

#62

remi.lapeyre@lenstra.fr

almost 5 years ago

In reply to: Zhihong Yu (#61)

2 attachment(s)

Re: Add header support to text format and matching feature

Hi,

sure it matches what is expected and exit immediatly if it does not.

Typo: immediately

+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server

nit: since header is singular, you can name the table header_doesnt_match

+ from the one expected, or the name or case do not match, the copy will

For 'the name or case do not match', either use plural for the subjects or change 'do' to doesn't

Thanks, I fixed both typos.

-           opts_out->header_line = defGetBoolean(defel);
+           opts_out->header_line = DefGetCopyHeader(defel);
Existing method starts with lower case d, I wonder why the new method starts with upper case D.

I don’t remember why I used DefGetCopyHeader, should I change it?

+           if (fldct < list_length(cstate->attnumlist))
+               ereport(ERROR,
+                       (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                        errmsg("missing header")));
The message seems to be inaccurate: the header may be there - it just misses some fields.

I changed the error messages; they are now:
ERROR: incomplete header, expected 3 columns but got 2
ERROR: extra data after last expected header, expected 3 columns but got 4
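
To make these messages concrete, here is a minimal standalone sketch of the check. It is not the patch's code: `check_header` and its parameters are hypothetical names, plain `strcmp` stands in for `namestrcmp`, and the message is written into a caller-supplied, non-empty buffer instead of going through `ereport`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): compare the fields of a parsed
 * header line with the expected column names.  Returns 0 on a match, or -1
 * with a message in buf.  buflen is assumed to be greater than zero.
 */
static int
check_header(const char *const *fields, int nfields,
             const char *const *columns, int ncolumns,
             char *buf, size_t buflen)
{
    int i;

    if (nfields < ncolumns)
    {
        snprintf(buf, buflen,
                 "incomplete header, expected %d columns but got %d",
                 ncolumns, nfields);
        return -1;
    }
    if (nfields > ncolumns)
    {
        snprintf(buf, buflen,
                 "extra data after last expected header, expected %d columns but got %d",
                 ncolumns, nfields);
        return -1;
    }

    for (i = 0; i < ncolumns; i++)
    {
        /* strcmp is case-sensitive, so "B" does not match column "b" */
        if (strcmp(fields[i], columns[i]) != 0)
        {
            snprintf(buf, buflen,
                     "wrong header for column \"%s\": got \"%s\"",
                     columns[i], fields[i]);
            return -1;
        }
    }

    buf[0] = '\0';              /* no error message on success */
    return 0;
}
```

With columns a, b, c, a header of `a b` gives the first message and `a b c d` gives the second, which is what the regression output in the patch expects.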

+ * Represents whether the header must be absent, present or present and match.

present and match: it seems present is redundant - if header is absent, how can it match ?

This now reads "Represents whether the header must be absent, present or match."

Cheers,
Rémi

Cheers

Attachments:

v9-0001-Add-header-support-to-COPY-TO-text-format.patch (application/octet-stream)

From cbf52dfb114d197667b724534c8b69356ee2b347 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 01:50:06 +0200
Subject: [PATCH v9 1/2] Add header support to "COPY TO" text format
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.31.1"

This is a multi-part message in MIME format.
--------------2.31.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


CSV format supports the HEADER option to output a header in the output,
it is convenient when other programs need to consume the output. This
patch adds the same option to the default text format.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  1 -
 contrib/file_fdw/output/file_fdw.source |  4 +---
 doc/src/sgml/ref/copy.sgml              |  3 ++-
 src/backend/commands/copy.c             |  4 ++--
 src/backend/commands/copyto.c           |  5 ++++-
 src/include/commands/copy.h             |  2 +-
 src/test/regress/input/copy.source      | 12 ++++++++++++
 src/test/regress/output/copy.source     |  8 ++++++++
 8 files changed, 30 insertions(+), 9 deletions(-)


--------------2.31.1
Content-Type: text/x-patch; name="v9-0001-Add-header-support-to-COPY-TO-text-format.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v9-0001-Add-header-support-to-COPY-TO-text-format.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..83edb71077 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..547b81fd16 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 14cd437da0..e6e95c9b4c 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -277,7 +277,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8265b981eb..1165f6b254 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -591,10 +591,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (!opts_out->csv_mode && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!opts_out->csv_mode && opts_out->quote != NULL)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 7257a54e93..e9ae1c64a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -869,8 +869,11 @@ DoCopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->opts.csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e33d..095d6f0b7e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -32,7 +32,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 8acb516801..379e47a541 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -134,6 +134,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 25bdec6c60..ce8309ef56 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,6 +95,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,

--------------2.31.1--

v9-0002-Add-header-matching-mode-to-COPY-FROM.patch (application/octet-stream)

From 5ac6c0229666a53a7ad79c10b980f082314502e9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Tue, 13 Oct 2020 14:45:56 +0200
Subject: [PATCH v9 2/2] Add header matching mode to "COPY FROM"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.31.1"

This is a multi-part message in MIME format.
--------------2.31.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  6 +++
 contrib/file_fdw/output/file_fdw.source |  9 +++-
 doc/src/sgml/ref/copy.sgml              | 12 +++--
 src/backend/commands/copy.c             | 65 +++++++++++++++++++++++--
 src/backend/commands/copyfromparse.c    | 52 ++++++++++++++++++--
 src/backend/commands/copyto.c           |  2 +-
 src/include/commands/copy.h             | 12 ++++-
 src/test/regress/input/copy.source      | 25 ++++++++++
 src/test/regress/output/copy.source     | 17 +++++++
 9 files changed, 186 insertions(+), 14 deletions(-)


--------------2.31.1
Content-Type: text/x-patch; name="v9-0002-Add-header-matching-mode-to-COPY-FROM.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v9-0002-Add-header-matching-mode-to-COPY-FROM.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 83edb71077..cb15cbb418 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -79,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 547b81fd16..b40ce99a40 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -93,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -439,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_doesnt_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index e6e95c9b4c..739a25a6ce 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER [ <literal>match</literal> | <literal>true</literal> | <literal>false</literal> ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -276,9 +276,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> or
-      <literal>text</literal> format.
+      names from the table. On input, the first line is discarded when
+      <literal>header</literal> is set to <literal>true</literal> or required
+      to match the column names if set to <literal>match</literal>. If the
+      number of columns in the header is not correct, their order differs
+      from the one expected, or the name or case doesn't match, the copy will
+      be aborted with an error.  This option is allowed only when using
+      <literal>CSV</literal> or <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1165f6b254..c13c89ddf6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -315,7 +315,66 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 }
 
 /*
- * Process the statement option list for COPY.
+* Extract a CopyHeader value from a DefElem.
+*/
+static CopyHeader
+DefGetCopyHeader(DefElem *def)
+{
+	/*
+	* If no parameter given, assume "true" is meant.
+	*/
+	if (def->arg == NULL)
+		return COPY_HEADER_PRESENT;
+
+	/*
+	* Allow 0, 1, "true", "false", "on", "off" or "match".
+	*/
+	switch (nodeTag(def->arg))
+	{
+		case T_Integer:
+			switch (intVal(def->arg))
+			{
+				case 0:
+					return COPY_HEADER_ABSENT;
+				case 1:
+					return COPY_HEADER_PRESENT;
+				default:
+					/* otherwise, error out below */
+					break;
+			}
+			break;
+		default:
+			{
+				char	*sval = defGetString(def);
+
+				/*
+				* The set of strings accepted here should match up with the
+				* grammar's opt_boolean_or_string production.
+				*/
+				if (pg_strcasecmp(sval, "true") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "false") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "on") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "off") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "match") == 0)
+						return COPY_HEADER_MATCH;
+
+			}
+			break;
+	}
+
+	ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a boolean or \"match\"",
+					def->defname)));
+	return COPY_HEADER_ABSENT;						/* keep compiler quiet */
+}
+
+/*
+* Process the statement option list for COPY.
  *
  * Scan the options list (a list of DefElem) and transpose the information
  * into *opts_out, applying appropriate error checking.
@@ -410,7 +469,7 @@ ProcessCopyOptions(ParseState *pstate,
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
 			header_specified = true;
-			opts_out->header_line = defGetBoolean(defel);
+			opts_out->header_line = DefGetCopyHeader(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -591,7 +650,7 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line != COPY_HEADER_ABSENT)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY HEADER available only in CSV and text mode")));
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 0813424768..d9ce0456f2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -739,12 +740,55 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->opts.binary);
 
-	/* on input just throw the header line away */
-	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+	/* on input check that the header line is correct if needed */
+	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_ABSENT)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->opts.header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->opts.csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("incomplete header, expected %d columns but got %d",
+								list_length(cstate->attnumlist), fldct)));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected header, expected %d columns but got %d",
+								list_length(cstate->attnumlist), fldct)));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int                             attnum = lfirst_int(cur);
+				char              *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->opts.null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("wrong header for column \"%s\": got \"%s\"",
+									NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e9ae1c64a2..ed0191557c 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -854,7 +854,7 @@ DoCopyTo(CopyToState cstate)
 														 cstate->file_encoding);
 
 		/* if a header has been requested send the line */
-		if (cstate->opts.header_line)
+		if (cstate->opts.header_line != COPY_HEADER_ABSENT)
 		{
 			bool		hdr_delim = false;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 095d6f0b7e..6eabaff167 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -19,6 +19,16 @@
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
+/*
+ * Represents whether the header must be absent, present or match.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -32,7 +42,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader	header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 379e47a541..d9b4f64022 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -275,3 +275,28 @@ copy tab_progress_reporting from '@abs_srcdir@/data/emp.data'
 drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
 drop function notice_after_tab_progress_reporting();
 drop table tab_progress_reporting;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index ce8309ef56..df2bea76a3 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -227,3 +227,20 @@ INFO:  progress: {"type": "FILE", "command": "COPY FROM", "relname": "tab_progre
 drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
 drop function notice_after_tab_progress_reporting();
 drop table tab_progress_reporting;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  incomplete header, expected 3 columns but got 2
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header, expected 3 columns but got 4
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;

--------------2.31.1--

#63

Zhihong Yu

zyu@yugabyte.com

almost 5 years ago

In reply to: Rémi Lapeyre (#62)

Re: Add header support to text format and matching feature

On Sun, Apr 11, 2021 at 4:01 AM Rémi Lapeyre <remi.lapeyre@lenstra.fr>
wrote:

Hi,

sure it matches what is expected and exit immediatly if it does not.

Typo: immediately

+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server

nit: since header is singular, you can name the table header_doesnt_match

+ from the one expected, or the name or case do not match, the copy will

For 'the name or case do not match', either use plural for the subjects or
change 'do' to doesn't

Thanks, I fixed both typos.
-           opts_out->header_line = defGetBoolean(defel);
+           opts_out->header_line = DefGetCopyHeader(defel);
Existing method starts with lower case d, I wonder why the new method
starts with upper case D.

I don’t remember why I used DefGetCopyHeader, should I change it?
+           if (fldct < list_length(cstate->attnumlist))
+               ereport(ERROR,
+                       (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                        errmsg("missing header")));
The message seems to be inaccurate: the header may be there - it just
misses some fields.

I changed the error messages, they now are:
ERROR: incomplete header, expected 3 columns but got 2
ERROR: extra data after last expected header, expected 3 columns but got 4

+ * Represents whether the header must be absent, present or present and match.

present and match: it seems present is redundant - if header is absent,
how can it match ?

This now reads "Represents whether the header must be absent, present or
match."

Cheers,
Rémi

Cheers

This now reads "Represents whether the header must be absent, present or
match."

Since "match" shouldn't be preceded by "be", I think we can say:

Represents whether the header must match, be absent or be present.

Cheers

#64

remi.lapeyre@lenstra.fr

almost 5 years ago

In reply to: Zhihong Yu (#63)

2 attachment(s)

Re: Add header support to text format and matching feature

This now reads "Represents whether the header must be absent, present or match."

Since match shouldn't be preceded with be, I think we can say:

Represents whether the header must match, be absent or be present.

Thanks, here’s a v10 version of the patch that fixes this.
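
For anyone skimming the thread, the three states and the accepted option values can be sketched in standalone C as below. This is an illustration under assumptions, not the patch itself: `HeaderChoice`, `parse_header_choice` and `ci_equal` are hypothetical names, `HEADER_INVALID` stands in for the `ereport(ERROR, ...)` path, and the option value is taken as a plain string, with NULL meaning a bare HEADER.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical three-state option: the header must match, be absent or be present. */
typedef enum HeaderChoice
{
    HEADER_ABSENT,              /* no header line expected */
    HEADER_PRESENT,             /* header line present, contents ignored */
    HEADER_MATCH,               /* header line must match the column names */
    HEADER_INVALID              /* illustration only: unrecognized value */
} HeaderChoice;

/* Case-insensitive string equality, a stand-in for pg_strcasecmp() == 0. */
static int
ci_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both strings ended */
}

/* Map an option value to a HeaderChoice; NULL means HEADER with no argument. */
static HeaderChoice
parse_header_choice(const char *value)
{
    if (value == NULL)
        return HEADER_PRESENT;  /* no parameter given: assume "true" */

    if (ci_equal(value, "true") || ci_equal(value, "on") || ci_equal(value, "1"))
        return HEADER_PRESENT;
    if (ci_equal(value, "false") || ci_equal(value, "off") || ci_equal(value, "0"))
        return HEADER_ABSENT;
    if (ci_equal(value, "match"))
        return HEADER_MATCH;

    return HEADER_INVALID;      /* the patch raises an error here instead */
}
```

Keeping the "absent" state as the first enum member means a zero-initialized options struct still behaves as "no header", which is one reason a sketch like this orders the values that way.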

Attachments:

v10-0001-Add-header-support-to-COPY-TO-text-format.patch (application/octet-stream)

From cbf52dfb114d197667b724534c8b69356ee2b347 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 01:50:06 +0200
Subject: [PATCH v10 1/2] Add header support to "COPY TO" text format
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.31.1"

This is a multi-part message in MIME format.
--------------2.31.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


CSV format supports the HEADER option to output a header in the output,
it is convenient when other programs need to consume the output. This
patch adds the same option to the default text format.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  1 -
 contrib/file_fdw/output/file_fdw.source |  4 +---
 doc/src/sgml/ref/copy.sgml              |  3 ++-
 src/backend/commands/copy.c             |  4 ++--
 src/backend/commands/copyto.c           |  5 ++++-
 src/include/commands/copy.h             |  2 +-
 src/test/regress/input/copy.source      | 12 ++++++++++++
 src/test/regress/output/copy.source     |  8 ++++++++
 8 files changed, 30 insertions(+), 9 deletions(-)


--------------2.31.1
Content-Type: text/x-patch; name="v10-0001-Add-header-support-to-COPY-TO-text-format.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v10-0001-Add-header-support-to-COPY-TO-text-format.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 45b728eeb3..83edb71077 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -37,7 +37,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 52b4d5f1df..547b81fd16 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -33,14 +33,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 14cd437da0..e6e95c9b4c 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -277,7 +277,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8265b981eb..1165f6b254 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -591,10 +591,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (!opts_out->csv_mode && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!opts_out->csv_mode && opts_out->quote != NULL)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 7257a54e93..e9ae1c64a2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -869,8 +869,11 @@ DoCopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->opts.csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e33d..095d6f0b7e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -32,7 +32,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 8acb516801..379e47a541 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -134,6 +134,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 25bdec6c60..ce8309ef56 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -95,6 +95,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,

--------------2.31.1--
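
The effect of routing the header through the text-format output path can be sketched outside the server. The block below is a simplified, standalone illustration, not the backend code: `escape_text_field` and `build_text_header` are hypothetical names, and only backslash, tab, newline and carriage return are escaped, whereas the real text output routine handles more cases (the configured delimiter, for instance).

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy src into buf, escaping the characters that would otherwise be read
 * back as structure in text format. Simplified: only backslash, tab,
 * newline and carriage return are handled.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if buf is too small.
 */
static int
escape_text_field(const char *src, char *buf, size_t buflen)
{
	size_t		n = 0;

	if (buflen == 0)
		return -1;

	for (; *src != '\0'; src++)
	{
		const char *rep = NULL;

		switch (*src)
		{
			case '\\': rep = "\\\\"; break;
			case '\t': rep = "\\t"; break;
			case '\n': rep = "\\n"; break;
			case '\r': rep = "\\r"; break;
			default: break;
		}

		if (rep != NULL)
		{
			/* two output bytes plus room for the final NUL */
			if (n + 2 >= buflen)
				return -1;
			buf[n++] = rep[0];
			buf[n++] = rep[1];
		}
		else
		{
			if (n + 1 >= buflen)
				return -1;
			buf[n++] = *src;
		}
	}
	buf[n] = '\0';
	return (int) n;
}

/*
 * Build a tab-delimited header line from ncols column names.
 * Returns 0 on success, -1 if buf is too small.
 */
static int
build_text_header(const char *const *cols, int ncols, char *buf, size_t buflen)
{
	size_t		used = 0;

	if (buflen == 0)
		return -1;
	buf[0] = '\0';

	for (int i = 0; i < ncols; i++)
	{
		int			len;

		if (i > 0)
		{
			if (used + 1 >= buflen)
				return -1;
			buf[used++] = '\t';		/* default text-format delimiter */
		}
		len = escape_text_field(cols[i], buf + used, buflen - used);
		if (len < 0)
			return -1;
		used += (size_t) len;
	}
	return 0;
}
```

Called with the two column names from the `copytest4` regression test, this produces the same header line as the expected output in the patch: the tab inside the second column name comes out as a backslash followed by `t`.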

v10-0002-Add-header-matching-mode-to-COPY-FROM.patch (application/octet-stream)

From 47573825b3c39d5f0ce6037cb1c17b6344c8dc0a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Tue, 13 Oct 2020 14:45:56 +0200
Subject: [PATCH v10 2/2] Add header matching mode to "COPY FROM"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.31.1"

This is a multi-part message in MIME format.
--------------2.31.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  6 +++
 contrib/file_fdw/output/file_fdw.source |  9 +++-
 doc/src/sgml/ref/copy.sgml              | 12 +++--
 src/backend/commands/copy.c             | 65 +++++++++++++++++++++++--
 src/backend/commands/copyfromparse.c    | 52 ++++++++++++++++++--
 src/backend/commands/copyto.c           |  2 +-
 src/include/commands/copy.h             | 12 ++++-
 src/test/regress/input/copy.source      | 25 ++++++++++
 src/test/regress/output/copy.source     | 17 +++++++
 9 files changed, 186 insertions(+), 14 deletions(-)


--------------2.31.1
Content-Type: text/x-patch; name="v10-0002-Add-header-matching-mode-to-COPY-FROM.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v10-0002-Add-header-matching-mode-to-COPY-FROM.patch"

diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 83edb71077..cb15cbb418 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -79,6 +79,12 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 547b81fd16..b40ce99a40 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -93,6 +93,11 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 CREATE FOREIGN TABLE text_csv (
     word1 text OPTIONS (force_not_null 'true'),
@@ -439,12 +444,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_doesnt_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index e6e95c9b4c..739a25a6ce 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER [ <literal>match</literal> | <literal>true</literal> | <literal>false</literal> ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -276,9 +276,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> or
-      <literal>text</literal> format.
+      names from the table. On input, the first line is discarded when
+      <literal>header</literal> is set to <literal>true</literal> or required
+      to match the column names if set to <literal>match</literal>. If the
+      number of columns in the header is not correct, their order differs
+      from the one expected, or the name or case doesn't match, the copy will
+      be aborted with an error.  This option is allowed only when using
+      <literal>CSV</literal> or <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1165f6b254..c13c89ddf6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -315,7 +315,66 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 }
 
 /*
- * Process the statement option list for COPY.
+* Extract a CopyHeader value from a DefElem.
+*/
+static CopyHeader
+DefGetCopyHeader(DefElem *def)
+{
+	/*
+	* If no parameter given, assume "true" is meant.
+	*/
+	if (def->arg == NULL)
+		return COPY_HEADER_PRESENT;
+
+	/*
+	* Allow 0, 1, "true", "false", "on", "off" or "match".
+	*/
+	switch (nodeTag(def->arg))
+	{
+		case T_Integer:
+			switch (intVal(def->arg))
+			{
+				case 0:
+					return COPY_HEADER_ABSENT;
+				case 1:
+					return COPY_HEADER_PRESENT;
+				default:
+					/* otherwise, error out below */
+					break;
+			}
+			break;
+		default:
+			{
+				char	*sval = defGetString(def);
+
+				/*
+				* The set of strings accepted here should match up with the
+				* grammar's opt_boolean_or_string production.
+				*/
+				if (pg_strcasecmp(sval, "true") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "false") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "on") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "off") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "match") == 0)
+						return COPY_HEADER_MATCH;
+
+			}
+			break;
+	}
+
+	ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a boolean or \"match\"",
+					def->defname)));
+	return COPY_HEADER_ABSENT;						/* keep compiler quiet */
+}
+
+/*
+* Process the statement option list for COPY.
  *
  * Scan the options list (a list of DefElem) and transpose the information
  * into *opts_out, applying appropriate error checking.
@@ -410,7 +469,7 @@ ProcessCopyOptions(ParseState *pstate,
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
 			header_specified = true;
-			opts_out->header_line = defGetBoolean(defel);
+			opts_out->header_line = DefGetCopyHeader(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -591,7 +650,7 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line != COPY_HEADER_ABSENT)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY HEADER available only in CSV and text mode")));
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 0813424768..d9ce0456f2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -739,12 +740,55 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->opts.binary);
 
-	/* on input just throw the header line away */
-	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+	/* on input check that the header line is correct if needed */
+	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_ABSENT)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->opts.header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->opts.csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("incomplete header, expected %d columns but got %d",
+								list_length(cstate->attnumlist), fldct)));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected header, expected %d columns but got %d",
+								list_length(cstate->attnumlist), fldct)));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int                             attnum = lfirst_int(cur);
+				char              *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->opts.null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("wrong header for column \"%s\": got \"%s\"",
+									NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e9ae1c64a2..ed0191557c 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -854,7 +854,7 @@ DoCopyTo(CopyToState cstate)
 														 cstate->file_encoding);
 
 		/* if a header has been requested send the line */
-		if (cstate->opts.header_line)
+		if (cstate->opts.header_line != COPY_HEADER_ABSENT)
 		{
 			bool		hdr_delim = false;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 095d6f0b7e..8075742664 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -19,6 +19,16 @@
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
+/*
+ * Represents whether the header must match, be absent or be present.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -32,7 +42,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader	header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 379e47a541..d9b4f64022 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -275,3 +275,28 @@ copy tab_progress_reporting from '@abs_srcdir@/data/emp.data'
 drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
 drop function notice_after_tab_progress_reporting();
 drop table tab_progress_reporting;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index ce8309ef56..df2bea76a3 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -227,3 +227,20 @@ INFO:  progress: {"type": "FILE", "command": "COPY FROM", "relname": "tab_progre
 drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
 drop function notice_after_tab_progress_reporting();
 drop table tab_progress_reporting;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  incomplete header, expected 3 columns but got 2
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header, expected 3 columns but got 4
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;

--------------2.31.1--
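
The check performed in match mode can be sketched as a standalone function. This is an illustration under stated assumptions rather than the patch's code: `check_header` and the `HeaderCheck` enum are hypothetical, the return codes stand in for the `ereport(ERROR)` calls, and header fields are compared against the expected names by position in the list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes standing in for the patch's error reports. */
typedef enum HeaderCheck
{
	HEADER_OK,
	HEADER_TOO_FEW,			/* "incomplete header" */
	HEADER_TOO_MANY,		/* "extra data after last expected header" */
	HEADER_NAME_MISMATCH	/* "wrong header for column" */
} HeaderCheck;

/*
 * Compare parsed header fields against the expected column names.
 * The comparison is exact, so a difference in case is a mismatch.
 * A NULL field (one that matched the null marker) is compared as
 * null_print. On a name mismatch, *bad_col receives the index of the
 * first offending column.
 */
static HeaderCheck
check_header(const char *const *fields, int nfields,
			 const char *const *expected, int nexpected,
			 const char *null_print, int *bad_col)
{
	if (nfields < nexpected)
		return HEADER_TOO_FEW;
	if (nfields > nexpected)
		return HEADER_TOO_MANY;

	for (int i = 0; i < nexpected; i++)
	{
		const char *name = (fields[i] != NULL) ? fields[i] : null_print;

		if (strcmp(name, expected[i]) != 0)
		{
			*bad_col = i;
			return HEADER_NAME_MISMATCH;
		}
	}
	return HEADER_OK;
}
```

With expected columns `a`, `b`, `c`, the headers `a b` and `a b c d` map to the two count errors shown in the regression output, while `b a c` (swapped columns) is rejected as a name mismatch on the first column.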

#65

remi.lapeyre@lenstra.fr

about 4 years ago

In reply to: Rémi Lapeyre (#64)

2 attachment(s)

Re: Add header support to text format and matching feature

Here’s an updated version of the patch that takes into account the changes in d1029bb5a2. The actual code is the same as in v10, which was already marked as ready for committer.
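
For readers skimming the series, the set of values the HEADER option accepts in the second patch can be sketched as follows. This is a simplified standalone illustration, not the backend function: `parse_header_choice`, `ci_equal` and `HEADER_INVALID` are hypothetical, every value is treated as a string (the patch handles the integers 0 and 1 as a separate node type), and the invalid case returns a code where the patch raises an error.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum HeaderChoice
{
	HEADER_ABSENT,
	HEADER_PRESENT,
	HEADER_MATCH,
	HEADER_INVALID		/* hypothetical: stands in for the error path */
} HeaderChoice;

/* Case-insensitive equality for ASCII option keywords. */
static int
ci_equal(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0')
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return 0;
		a++;
		b++;
	}
	return *a == '\0' && *b == '\0';
}

/*
 * Map an option value to a header mode. A NULL value models a bare
 * HEADER with no argument, which is taken to mean "true".
 */
static HeaderChoice
parse_header_choice(const char *value)
{
	if (value == NULL)
		return HEADER_PRESENT;

	if (ci_equal(value, "true") || ci_equal(value, "on") || ci_equal(value, "1"))
		return HEADER_PRESENT;
	if (ci_equal(value, "false") || ci_equal(value, "off") || ci_equal(value, "0"))
		return HEADER_ABSENT;
	if (ci_equal(value, "match"))
		return HEADER_MATCH;

	return HEADER_INVALID;
}
```

A value such as `wrong_choice`, used in the regression test, falls through to the invalid case, which corresponds to the "header requires a boolean or "match"" error in the expected output.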

Attachments:

v11-0001-Add-header-support-to-COPY-TO-text-format.patch (application/octet-stream)

From 41b00d9fcb50b5209295020cf1dc679234e1d018 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Fri, 17 Jul 2020 01:50:06 +0200
Subject: [PATCH v11 1/2] Add header support to "COPY TO" text format
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.34.1"

This is a multi-part message in MIME format.
--------------2.34.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


CSV format supports the HEADER option to output a header line, which is
convenient when other programs need to consume the output. This patch
adds the same option to the default text format.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/expected/file_fdw.out |  4 +---
 contrib/file_fdw/sql/file_fdw.sql      |  1 -
 doc/src/sgml/ref/copy.sgml             |  3 ++-
 src/backend/commands/copy.c            |  4 ++--
 src/backend/commands/copyto.c          |  5 ++++-
 src/include/commands/copy.h            |  2 +-
 src/test/regress/expected/copy.out     |  8 ++++++++
 src/test/regress/sql/copy.sql          | 12 ++++++++++++
 8 files changed, 30 insertions(+), 9 deletions(-)


--------------2.34.1
Content-Type: text/x-patch; name="v11-0001-Add-header-support-to-COPY-TO-text-format.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v11-0001-Add-header-support-to-COPY-TO-text-format.patch"

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 891146fef3..ba6fdba598 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -50,14 +50,12 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 ERROR:  COPY escape available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  COPY HEADER available only in CSV and text mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', quote ':');        -- ERROR
 ERROR:  COPY quote available only in CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', escape ':');       -- ERROR
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index 0ea8b14508..86f876d565 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -56,7 +56,6 @@ CREATE USER MAPPING FOR regress_no_priv_user SERVER file_server;
 
 -- validator tests
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml');  -- ERROR
-CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', header 'true');      -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':');          -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':');         -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', header 'true');    -- ERROR
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 4624c8f4c9..2d3847e0d1 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -277,7 +277,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
       names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> format.
+      This option is allowed only when using <literal>CSV</literal> or
+      <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 53f4853141..382a12cfb2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -555,10 +555,10 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (!opts_out->csv_mode && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY HEADER available only in CSV mode")));
+				 errmsg("COPY HEADER available only in CSV and text mode")));
 
 	/* Check quote */
 	if (!opts_out->csv_mode && opts_out->quote != NULL)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index b6eacd5baa..e9b962c7aa 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -863,8 +863,11 @@ DoCopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				CopyAttributeOutCSV(cstate, colname, false,
+				if (cstate->opts.csv_mode)
+					CopyAttributeOutCSV(cstate, colname, false,
 									list_length(cstate->attnumlist) == 1);
+				else
+					CopyAttributeOutText(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 264895d278..c3f91cdc47 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -32,7 +32,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV header line? */
+	bool		header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 931e7b2e69..08de57ee69 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -120,6 +120,14 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+copy copytest4 from stdin (header);
+copy copytest4 to stdout (header);
+c1	col with tabulation: \t
+1	a
+2	b
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 15e26517ec..6809f43e26 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -160,6 +160,18 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+create temp table copytest4 (
+	c1 int,
+	"col with tabulation: 	" text);
+
+copy copytest4 from stdin (header);
+this is just a line full of junk that would error out if parsed
+1	a
+2	b
+\.
+
+copy copytest4 to stdout (header);
+
 -- test copy from with a partitioned table
 create table parted_copytest (
 	a int,

--------------2.34.1--

v11-0002-Add-header-matching-mode-to-COPY-FROM.patch (application/octet-stream)

From 7c72199c4eb8e5e9d6a9fea79ecff1e85db543c4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Tue, 13 Oct 2020 14:45:56 +0200
Subject: [PATCH v11 2/2] Add header matching mode to "COPY FROM"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.34.1"

This is a multi-part message in MIME format.
--------------2.34.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/expected/file_fdw.out | 10 +++-
 contrib/file_fdw/sql/file_fdw.sql      |  7 +++
 doc/src/sgml/ref/copy.sgml             | 12 +++--
 src/backend/commands/copy.c            | 65 ++++++++++++++++++++++++--
 src/backend/commands/copyfromparse.c   | 52 +++++++++++++++++++--
 src/backend/commands/copyto.c          |  2 +-
 src/include/commands/copy.h            | 12 ++++-
 src/test/regress/expected/copy.out     | 17 +++++++
 src/test/regress/sql/copy.sql          | 25 ++++++++++
 9 files changed, 188 insertions(+), 14 deletions(-)


--------------2.34.1
Content-Type: text/x-patch; name="v11-0002-Add-header-matching-mode-to-COPY-FROM.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v11-0002-Add-header-matching-mode-to-COPY-FROM.patch"

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index ba6fdba598..74ed8fae46 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -113,6 +113,12 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename :'filename', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+\set filename :abs_srcdir '/data/list1.csv'
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 \set filename :abs_srcdir '/data/text.csv'
 CREATE FOREIGN TABLE text_csv (
@@ -464,12 +470,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_doesnt_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index 86f876d565..9f3c7219d0 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -103,6 +103,13 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename :'filename', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+\set filename :abs_srcdir '/data/list1.csv'
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 \set filename :abs_srcdir '/data/text.csv'
 CREATE FOREIGN TABLE text_csv (
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 2d3847e0d1..f36eca02fb 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER [ <literal>match</literal> | <literal>true</literal> | <literal>false</literal> ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -276,9 +276,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
-      This option is allowed only when using <literal>CSV</literal> or
-      <literal>text</literal> format.
+      names from the table. On input, the first line is discarded when
+      <literal>header</literal> is set to <literal>true</literal> or required
+      to match the column names if set to <literal>match</literal>. If the
+      number of columns in the header is not correct, their order differs
+      from the one expected, or the name or case doesn't match, the copy will
+      be aborted with an error.  This option is allowed only when using
+      <literal>CSV</literal> or <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 382a12cfb2..cd427aaa0f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -314,7 +314,66 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 }
 
 /*
- * Process the statement option list for COPY.
+* Extract a CopyHeader value from a DefElem.
+*/
+static CopyHeader
+DefGetCopyHeader(DefElem *def)
+{
+	/*
+	* If no parameter given, assume "true" is meant.
+	*/
+	if (def->arg == NULL)
+		return COPY_HEADER_PRESENT;
+
+	/*
+	* Allow 0, 1, "true", "false", "on", "off" or "match".
+	*/
+	switch (nodeTag(def->arg))
+	{
+		case T_Integer:
+			switch (intVal(def->arg))
+			{
+				case 0:
+					return COPY_HEADER_ABSENT;
+				case 1:
+					return COPY_HEADER_PRESENT;
+				default:
+					/* otherwise, error out below */
+					break;
+			}
+			break;
+		default:
+			{
+				char	*sval = defGetString(def);
+
+				/*
+				* The set of strings accepted here should match up with the
+				* grammar's opt_boolean_or_string production.
+				*/
+				if (pg_strcasecmp(sval, "true") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "false") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "on") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "off") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "match") == 0)
+						return COPY_HEADER_MATCH;
+
+			}
+			break;
+	}
+
+	ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a boolean or \"match\"",
+					def->defname)));
+	return COPY_HEADER_ABSENT;						/* keep compiler quiet */
+}
+
+/*
+* Process the statement option list for COPY.
  *
  * Scan the options list (a list of DefElem) and transpose the information
  * into *opts_out, applying appropriate error checking.
@@ -394,7 +453,7 @@ ProcessCopyOptions(ParseState *pstate,
 			if (header_specified)
 				errorConflictingDefElem(defel, pstate);
 			header_specified = true;
-			opts_out->header_line = defGetBoolean(defel);
+			opts_out->header_line = DefGetCopyHeader(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -555,7 +614,7 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line != COPY_HEADER_ABSENT)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY HEADER available only in CSV and text mode")));
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index aac10165ec..e3452c26aa 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -758,12 +759,55 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->opts.binary);
 
-	/* on input just throw the header line away */
-	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+	/* on input check that the header line is correct if needed */
+	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_ABSENT)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->opts.header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->opts.csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("incomplete header, expected %d columns but got %d",
+								list_length(cstate->attnumlist), fldct)));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected header, expected %d columns but got %d",
+								list_length(cstate->attnumlist), fldct)));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int                             attnum = lfirst_int(cur);
+				char              *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->opts.null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("wrong header for column \"%s\": got \"%s\"",
+									NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e9b962c7aa..a2cf680c67 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -848,7 +848,7 @@ DoCopyTo(CopyToState cstate)
 															  cstate->file_encoding);
 
 		/* if a header has been requested send the line */
-		if (cstate->opts.header_line)
+		if (cstate->opts.header_line != COPY_HEADER_ABSENT)
 		{
 			bool		hdr_delim = false;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3f91cdc47..c69eb6753f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -19,6 +19,16 @@
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
+/*
+ * Represents whether the header must match, be absent or be present.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -32,7 +42,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* CSV or text header line? */
+	CopyHeader	header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 08de57ee69..8461388085 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -254,3 +254,20 @@ INFO:  progress: {"type": "FILE", "command": "COPY FROM", "relname": "tab_progre
 drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
 drop function notice_after_tab_progress_reporting();
 drop table tab_progress_reporting;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  incomplete header, expected 3 columns but got 2
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header, expected 3 columns but got 4
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 6809f43e26..c98b6dd91c 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -303,3 +303,28 @@ copy tab_progress_reporting from :'filename'
 drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
 drop function notice_after_tab_progress_reporting();
 drop table tab_progress_reporting;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;

--------------2.34.1--

#66

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Rémi Lapeyre (#65)

Re: Add header support to text format and matching feature

On 31.12.21 18:36, Rémi Lapeyre wrote:

Here’s an updated version of the patch that takes into account the changes in d1029bb5a2. The actual code is the same as v10 which was already marked as ready for committer.

I have committed the 0001 patch. I will work on the 0002 patch next.

I notice in the 0002 patch that there is no test case for the error
"wrong header for column \"%s\": got \"%s\"", which I think is really
the core functionality of this patch. So please add that.

I wonder whether the header matching should be a separate option from
the HEADER option. The option parsing in this patch is quite
complicated and could be simpler if there were two separate options. It
appears this has been mentioned in the thread but not fully discussed.

#67

remi.lapeyre@lenstra.fr

almost 4 years ago

In reply to: Peter Eisentraut (#66)

1 attachment(s)

Re: Add header support to text format and matching feature

On 28 Jan 2022, at 09:57, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

On 31.12.21 18:36, Rémi Lapeyre wrote:

Here’s an updated version of the patch that takes into account the changes in d1029bb5a2. The actual code is the same as v10 which was already marked as ready for committer.

I have committed the 0001 patch. I will work on the 0002 patch next.

Thanks!

I notice in the 0002 patch that there is no test case for the error "wrong header for column \"%s\": got \"%s\"", which I think is really the core functionality of this patch. So please add that.

I added a test for it in this new version of the patch.

I wonder whether the header matching should be a separate option from the HEADER option. The option parsing in this patch is quite complicated and could be simpler if there were two separate options. It appears this has been mentioned in the thread but not fully discussed.

I suppose a new option could be added, but I’m not sure it would simplify the code much, and in my opinion it would be a bit weirder for users. Right now it is just:

copy my_table from stdin with (header match);

with an additional option it could be:

copy my_table from stdin with (header true, match);

with potentially “header true” being implicit when “match” is given:

copy my_table from stdin with (match);

But I think we would still have to check for and return an error if the user inputs:

copy my_table from stdin with (header off, match);

Rather than complicating things, the current implementation seemed to be the best option, but I will update the patch if you think I should change it.
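
To make the trade-off above concrete, here is a minimal sketch of the check a two-option design would still need. It is not part of the patch: the `HeaderOptions` struct, the separate `match` flag and `resolve_header_options()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical parsed state for a two-option design (not in the patch). */
typedef struct HeaderOptions
{
	bool		header_specified;	/* was HEADER given explicitly? */
	bool		header;				/* value of HEADER */
	bool		match;				/* hypothetical separate MATCH option */
} HeaderOptions;

typedef enum HeaderCheck
{
	HEADER_OPTS_OK,
	HEADER_OPTS_CONFLICT		/* e.g. (header off, match) */
} HeaderCheck;

/*
 * Resolve the two options: MATCH implies HEADER unless HEADER was
 * explicitly turned off, in which case the combination is rejected.
 */
static HeaderCheck
resolve_header_options(HeaderOptions *opts)
{
	if (opts->match)
	{
		if (opts->header_specified && !opts->header)
			return HEADER_OPTS_CONFLICT;
		opts->header = true;	/* "header true" is implicit with MATCH */
	}
	return HEADER_OPTS_OK;
}
```

Under these assumptions the conflict check does not go away with a second option; it only moves from value parsing to cross-option validation.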

Best regards,
Rémi

Attachments:

v12-0001-Add-header-matching-mode-to-COPY-FROM.patch (application/octet-stream)

From 1fbfd87a4070b91a4ac3630b36a1ff1dadecaa5d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Lapeyre?= <remi.lapeyre@lenstra.fr>
Date: Tue, 13 Oct 2020 14:45:56 +0200
Subject: [PATCH v12] Add header matching mode to "COPY FROM"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.35.0"

This is a multi-part message in MIME format.
--------------2.35.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediately if it does not.

Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/expected/file_fdw.out | 10 +++-
 contrib/file_fdw/sql/file_fdw.sql      |  7 +++
 doc/src/sgml/ref/copy.sgml             | 11 +++--
 src/backend/commands/copy.c            | 65 ++++++++++++++++++++++++--
 src/backend/commands/copyfromparse.c   | 52 +++++++++++++++++++--
 src/backend/commands/copyto.c          |  2 +-
 src/include/commands/copy.h            | 12 ++++-
 src/test/regress/expected/copy.out     | 20 ++++++++
 src/test/regress/sql/copy.sql          | 29 ++++++++++++
 9 files changed, 195 insertions(+), 13 deletions(-)


--------------2.35.0
Content-Type: text/x-patch; name="v12-0001-Add-header-matching-mode-to-COPY-FROM.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v12-0001-Add-header-matching-mode-to-COPY-FROM.patch"

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 0ac6e4e0d7..b1617640d8 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -113,6 +113,12 @@ CREATE FOREIGN TABLE agg_bad (
 ) SERVER file_server
 OPTIONS (format 'csv', filename :'filename', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
+-- test header matching
+\set filename :abs_srcdir '/data/list1.csv'
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');	-- ERROR
 -- per-column options tests
 \set filename :abs_srcdir '/data/text.csv'
 CREATE FOREIGN TABLE text_csv (
@@ -464,12 +470,14 @@ SET ROLE regress_file_fdw_superuser;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
-NOTICE:  drop cascades to 7 other objects
+NOTICE:  drop cascades to 9 other objects
 DETAIL:  drop cascades to server file_server
 drop cascades to user mapping for regress_file_fdw_superuser on server file_server
 drop cascades to user mapping for regress_no_priv_user on server file_server
 drop cascades to foreign table agg_text
 drop cascades to foreign table agg_csv
 drop cascades to foreign table agg_bad
+drop cascades to foreign table header_match
+drop cascades to foreign table header_doesnt_match
 drop cascades to foreign table text_csv
 DROP ROLE regress_file_fdw_superuser, regress_file_fdw_user, regress_no_priv_user;
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index 86f876d565..9f3c7219d0 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -103,6 +103,13 @@ CREATE FOREIGN TABLE agg_bad (
 OPTIONS (format 'csv', filename :'filename', header 'true', delimiter ';', quote '@', escape '"', null '');
 ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0);
 
+-- test header matching
+\set filename :abs_srcdir '/data/list1.csv'
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');	-- ERROR
+
 -- per-column options tests
 \set filename :abs_srcdir '/data/text.csv'
 CREATE FOREIGN TABLE text_csv (
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1b7d001963..f36eca02fb 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -36,7 +36,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
     DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
     NULL '<replaceable class="parameter">null_string</replaceable>'
-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER [ <literal>match</literal> | <literal>true</literal> | <literal>false</literal> ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -276,8 +276,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Specifies that the file contains a header line with the names of each
       column in the file.  On output, the first line contains the column
-      names from the table, and on input, the first line is ignored.
-      This option is not allowed when using <literal>binary</literal> format.
+      names from the table. On input, the first line is discarded when
+      <literal>header</literal> is set to <literal>true</literal> or required
+      to match the column names if set to <literal>match</literal>. If the
+      number of columns in the header is not correct, their order differs
+      from the one expected, or the name or case doesn't match, the copy will
+      be aborted with an error.  This option is allowed only when using
+      <literal>CSV</literal> or <literal>text</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 7da7105d44..830bd9f762 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -314,7 +314,66 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 }
 
 /*
- * Process the statement option list for COPY.
+* Extract a CopyHeader value from a DefElem.
+*/
+static CopyHeader
+DefGetCopyHeader(DefElem *def)
+{
+	/*
+	* If no parameter given, assume "true" is meant.
+	*/
+	if (def->arg == NULL)
+		return COPY_HEADER_PRESENT;
+
+	/*
+	* Allow 0, 1, "true", "false", "on", "off" or "match".
+	*/
+	switch (nodeTag(def->arg))
+	{
+		case T_Integer:
+			switch (intVal(def->arg))
+			{
+				case 0:
+					return COPY_HEADER_ABSENT;
+				case 1:
+					return COPY_HEADER_PRESENT;
+				default:
+					/* otherwise, error out below */
+					break;
+			}
+			break;
+		default:
+			{
+				char	*sval = defGetString(def);
+
+				/*
+				* The set of strings accepted here should match up with the
+				* grammar's opt_boolean_or_string production.
+				*/
+				if (pg_strcasecmp(sval, "true") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "false") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "on") == 0)
+						return COPY_HEADER_PRESENT;
+				if (pg_strcasecmp(sval, "off") == 0)
+						return COPY_HEADER_ABSENT;
+				if (pg_strcasecmp(sval, "match") == 0)
+						return COPY_HEADER_MATCH;
+
+			}
+			break;
+	}
+
+	ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a boolean or \"match\"",
+					def->defname)));
+	return COPY_HEADER_ABSENT;						/* keep compiler quiet */
+}
+
+/*
+* Process the statement option list for COPY.
  *
  * Scan the options list (a list of DefElem) and transpose the information
  * into *opts_out, applying appropriate error checking.
@@ -394,7 +453,7 @@ ProcessCopyOptions(ParseState *pstate,
 			if (header_specified)
 				errorConflictingDefElem(defel, pstate);
 			header_specified = true;
-			opts_out->header_line = defGetBoolean(defel);
+			opts_out->header_line = DefGetCopyHeader(defel);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
@@ -555,7 +614,7 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line != COPY_HEADER_ABSENT)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot specify HEADER in BINARY mode")));
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index baf328b620..5ab59c0631 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
@@ -758,12 +759,55 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	/* only available for text or csv input */
 	Assert(!cstate->opts.binary);
 
-	/* on input just throw the header line away */
-	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+	/* on input check that the header line is correct if needed */
+	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_ABSENT)
 	{
+		ListCell   *cur;
+		TupleDesc   tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+
 		cstate->cur_lineno++;
-		if (CopyReadLine(cstate))
-			return false;		/* done */
+		done = CopyReadLine(cstate);
+
+		if (cstate->opts.header_line == COPY_HEADER_MATCH)
+		{
+			if (cstate->opts.csv_mode)
+				fldct = CopyReadAttributesCSV(cstate);
+			else
+				fldct = CopyReadAttributesText(cstate);
+
+			if (fldct < list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("incomplete header, expected %d columns but got %d",
+								list_length(cstate->attnumlist), fldct)));
+			else if (fldct > list_length(cstate->attnumlist))
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected header, expected %d columns but got %d",
+								list_length(cstate->attnumlist), fldct)));
+
+			foreach(cur, cstate->attnumlist)
+			{
+				int                             attnum = lfirst_int(cur);
+				char              *colName = cstate->raw_fields[attnum - 1];
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+				if (colName == NULL)
+					colName = cstate->opts.null_print;
+
+				if (namestrcmp(&attr->attname, colName) != 0) {
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("wrong header for column \"%s\": got \"%s\"",
+									NameStr(attr->attname), colName)));
+				}
+			}
+		}
+
+		if (done)
+			return false;
 	}
 
 	cstate->cur_lineno++;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 3283ef50d0..6c611c0277 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -846,7 +846,7 @@ DoCopyTo(CopyToState cstate)
 															  cstate->file_encoding);
 
 		/* if a header has been requested send the line */
-		if (cstate->opts.header_line)
+		if (cstate->opts.header_line != COPY_HEADER_ABSENT)
 		{
 			bool		hdr_delim = false;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8694da5004..6fb2cade6b 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -19,6 +19,16 @@
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
+/*
+ * Represents whether the header must match, be absent or be present.
+ */
+typedef enum CopyHeader
+{
+	COPY_HEADER_ABSENT,
+	COPY_HEADER_PRESENT,
+	COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -32,7 +42,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
-	bool		header_line;	/* header line? */
+	CopyHeader	header_line;	/* CSV or text header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
 	char	   *null_print_client;	/* same converted to file encoding */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 851b9a4a2d..77b9a70b74 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -254,3 +254,23 @@ INFO:  progress: {"type": "FILE", "command": "COPY FROM", "relname": "tab_progre
 drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
 drop function notice_after_tab_progress_reporting();
 drop table tab_progress_reporting;
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+ERROR:  header requires a boolean or "match"
+copy header_copytest from stdin with (header match);
+copy header_copytest from stdin with (header match);
+ERROR:  incomplete header, expected 3 columns but got 2
+CONTEXT:  COPY header_copytest, line 1: "a	b"
+copy header_copytest from stdin with (header match);
+ERROR:  extra data after last expected header, expected 3 columns but got 4
+CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
+copy header_copytest from stdin with (header match);
+ERROR:  wrong header for column "c": got "d"
+CONTEXT:  COPY header_copytest, line 1: "a	b	d"
+copy header_copytest from stdin with (header match, format csv);
+drop table header_copytest;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 016fedf675..5e192428a3 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -303,3 +303,32 @@ copy tab_progress_reporting from :'filename'
 drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
 drop function notice_after_tab_progress_reporting();
 drop table tab_progress_reporting;
+
+-- Test header matching feature
+create table header_copytest (
+	a int,
+	b int,
+	c text
+);
+copy header_copytest from stdin with (header wrong_choice);
+copy header_copytest from stdin with (header match);
+a	b	c
+1	2	foo
+\.
+copy header_copytest from stdin with (header match);
+a	b
+1	2
+\.
+copy header_copytest from stdin with (header match);
+a	b	c	d
+1	2	foo	bar
+\.
+copy header_copytest from stdin with (header match);
+a	b	d
+1	2	foo
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+1,2,foo
+\.
+drop table header_copytest;

--------------2.35.0--

#68

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Rémi Lapeyre (#67)

Re: Add header support to text format and matching feature

On 30.01.22 23:56, Rémi Lapeyre wrote:

I notice in the 0002 patch that there is no test case for the error "wrong header for column \"%s\": got \"%s\"", which I think is really the core functionality of this patch. So please add that.

I added a test for it in this new version of the patch.

The file_fdw.sql tests contain this

+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');   -- ERROR

but no actual error is generated. Please review the additions on the
file_fdw tests to see that they make sense.

#69

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Peter Eisentraut (#68)

Re: Add header support to text format and matching feature

On 31.01.22 07:54, Peter Eisentraut wrote:

On 30.01.22 23:56, Rémi Lapeyre wrote:

I notice in the 0002 patch that there is no test case for the error
"wrong header for column \"%s\": got \"%s\"", which I think is really
the core functionality of this patch. So please add that.

I added a test for it in this new version of the patch.

The file_fdw.sql tests contain this
+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');   -- ERROR
but no actual error is generated. Please review the additions on the
file_fdw tests to see that they make sense.

A few more comments on your latest patch:

- The DefGetCopyHeader() function seems very bulky and might not be
necessary. I think you can just check for the string "match" first and
then use defGetBoolean() as before if it didn't match.

- If you define COPY_HEADER_ABSENT = 0 in the enum, then most of the
existing truth checks don't need to be changed.

- In NextCopyFromRawFields(), it looks like you have code that replaces
the null_print value if the supplied column name is null. I don't think
we should allow null column values. Somehow, this should be an error.
(Do we use null_print on output and make the column name null if it
matches?)
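
A minimal sketch of the stricter behaviour suggested in the last point, assuming the header fields have already been split into an array: a null field is reported as its own error instead of being replaced by the null_print string. The function `check_header()` and the `HeaderResult` codes are hypothetical and stand in for the `ereport()` calls in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes standing in for the ereport() calls. */
typedef enum HeaderResult
{
	HEADER_OK,
	HEADER_TOO_FEW,				/* fewer fields than expected columns */
	HEADER_TOO_MANY,			/* more fields than expected columns */
	HEADER_NULL_NAME,			/* a header field was null */
	HEADER_MISMATCH				/* a header field differs from the column name */
} HeaderResult;

/*
 * Compare header fields against the expected column names, in order.
 * On HEADER_NULL_NAME or HEADER_MISMATCH, *bad_index is set to the
 * position of the offending field.
 */
static HeaderResult
check_header(const char *const *expected, int nexpected,
			 const char *const *fields, int nfields, int *bad_index)
{
	int			i;

	if (nfields < nexpected)
		return HEADER_TOO_FEW;
	if (nfields > nexpected)
		return HEADER_TOO_MANY;

	for (i = 0; i < nexpected; i++)
	{
		if (fields[i] == NULL)
		{
			*bad_index = i;
			return HEADER_NULL_NAME;	/* reject instead of substituting */
		}
		if (strcmp(expected[i], fields[i]) != 0)
		{
			*bad_index = i;
			return HEADER_MISMATCH;		/* comparison is case-sensitive */
		}
	}
	return HEADER_OK;
}
```

The sketch walks the header fields by position rather than by attribute number, which is a design choice of this illustration and not a statement about the patch.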

#70

Greg Stark

stark@mit.edu

almost 4 years ago

In reply to: Peter Eisentraut (#69)

Re: Add header support to text format and matching feature

Great to see the first of the two patches committed.

It looks like the second patch got some feedback from Peter in
February and has been marked "Waiting on author" ever since.

Remi, will you have a chance to look at this this month?

Peter, are these comments blocking? If Remi doesn't have a chance to
work on it, should I move it to the next release, or could it be fixed
up and committed?

#71

[1]: /messages/by-id/80a9b594-01d6-4ee4-a612-9ae9fd3c1194@manitou-mail.org

daniel@manitou-mail.org

almost 4 years ago

In reply to: Peter Eisentraut (#69)

Re: Add header support to text format and matching feature

Peter Eisentraut wrote:

- The DefGetCopyHeader() function seems very bulky and might not be
necessary. I think you can just check for the string "match" first and
then use defGetBoolean() as before if it didn't match.

The problem is that defGetBoolean() ends like this in the non-matching
case:

ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("%s requires a Boolean value",
def->defname)));

We don't want this error message when the string does not match
since it's really a tri-state option, not a boolean.

To avoid duplicating the code that recognizes true/false/on/off/0/1,
probably defGetBoolean()'s guts should be moved into another function
that does the matching and report to the caller instead of throwing an
error. Then DefGetCopyHeader() could call that non-throwing function.
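
The refactoring suggested above could look roughly like the following standalone sketch. It is not PostgreSQL code: `try_parse_bool`, `parse_header_choice` and the `HeaderChoice` enum are hypothetical names, and plain `strcasecmp` stands in for `pg_strcasecmp`. The idea is only that the boolean matcher reports failure to its caller instead of raising an error, so the tri-state parser can choose its own error message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>            /* strcasecmp (POSIX) */

/* Hypothetical tri-state standing in for the patch's header option. */
typedef enum
{
    HEADER_CHOICE_FALSE = 0,
    HEADER_CHOICE_TRUE,
    HEADER_CHOICE_MATCH
} HeaderChoice;

/*
 * Non-throwing matcher: returns true and sets *result when "value" is one
 * of the recognized boolean spellings, returns false otherwise.
 */
bool
try_parse_bool(const char *value, bool *result)
{
    static const char *const truthy[] = {"true", "on", "1"};
    static const char *const falsy[] = {"false", "off", "0"};
    size_t      i;

    assert(result != NULL);
    if (value == NULL)
        return false;

    for (i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++)
    {
        if (strcasecmp(value, truthy[i]) == 0)
        {
            *result = true;
            return true;
        }
    }
    for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++)
    {
        if (strcasecmp(value, falsy[i]) == 0)
        {
            *result = false;
            return true;
        }
    }
    return false;
}

/*
 * Tri-state parser built on the matcher.  Returns false for unrecognized
 * input so that the caller can report an error that mentions "match".
 */
bool
parse_header_choice(const char *value, HeaderChoice *choice)
{
    bool        b;

    assert(choice != NULL);
    if (value == NULL)
    {
        /* A bare HEADER with no parameter is taken to mean "true". */
        *choice = HEADER_CHOICE_TRUE;
        return true;
    }
    if (strcasecmp(value, "match") == 0)
    {
        *choice = HEADER_CHOICE_MATCH;
        return true;
    }
    if (try_parse_bool(value, &b))
    {
        *choice = b ? HEADER_CHOICE_TRUE : HEADER_CHOICE_FALSE;
        return true;
    }
    return false;
}
```

With this split, a caller that gets `false` back from `parse_header_choice` can raise something like the 'requires a Boolean value or "match"' message, while the list of boolean spellings lives in one place.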

- If you define COPY_HEADER_ABSENT = 0 in the enum, then most of the
existing truth checks don't need to be changed.

It's already 0 by default. Not changing some truth checks does work, but
then we get some code that treats CopyFromState.header_line like
a boolean and some other code that treats it like an enum, which I found not
ideal wrt clarity in an earlier round of review [1].

It's not a major issue though, as it concerns only 3 lines of code in the
v12 patch:
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->binary && opts_out->header_line != COPY_HEADER_ABSENT)

+	/* on input check that the header line is correct if needed */
+	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_ABSENT)

-		if (cstate->opts.header_line)
+		if (cstate->opts.header_line != COPY_HEADER_ABSENT)
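
To make the trade-off concrete, here is a minimal standalone sketch of the two styles of check. Only COPY_HEADER_ABSENT is a name taken from this thread; the other enumerators and both helper functions are hypothetical. Because C gives the first enumerator the value 0 when no value is specified, the implicit truth test and the explicit comparison always agree; the difference is purely one of readability.

```c
#include <assert.h>
#include <stdbool.h>

/* First enumerator is 0 by default, so "absent" is the only falsy value. */
typedef enum
{
    COPY_HEADER_ABSENT,         /* name used in the thread */
    COPY_HEADER_PRESENT,        /* hypothetical name */
    COPY_HEADER_MATCH           /* hypothetical name */
} CopyHeader;

/* Boolean-style check: treats the enum as a truth value. */
bool
header_requested_implicit(CopyHeader h)
{
    return h ? true : false;
}

/* Enum-style check: spells out the comparison. */
bool
header_requested_explicit(CopyHeader h)
{
    return h != COPY_HEADER_ABSENT;
}
```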

- In NextCopyFromRawFields(), it looks like you have code that replaces
the null_print value if the supplied column name is null. I don't think
we should allow null column values. Somehow, this should be an error.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#72

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Greg Stark (#70)

Re: Add header support to text format and matching feature

On 25.03.22 05:21, Greg Stark wrote:

Great to see the first of the two patches committed.

It looks like the second patch got some feedback from Peter in
February and has been marked "Waiting on author" ever since.

Remi, will you have a chance to look at this this month?

Peter, are these comments blocking? If Remi doesn't have a chance to
work on it, should I move it to the next release, or could it be fixed
up and committed?

I will try to finish up this patch.

#73

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Peter Eisentraut (#72)

Re: Add header support to text format and matching feature

On 29.03.22 17:02, Peter Eisentraut wrote:

On 25.03.22 05:21, Greg Stark wrote:

Great to see the first of the two patches committed.

It looks like the second patch got some feedback from Peter in
February and has been marked "Waiting on author" ever since.

Remi, will you have a chance to look at this this month?

Peter, are these comments blocking? If Remi doesn't have a chance to
work on it, should I move it to the next release, or could it be fixed
up and committed?

I will try to finish up this patch.

Committed, after some further refinements as discussed.

#74

rjuju123@gmail.com

over 3 years ago

In reply to: Peter Eisentraut (#73)

Re: Add header support to text format and matching feature

Hi,

On Wed, Mar 30, 2022 at 09:11:09AM +0200, Peter Eisentraut wrote:

Committed, after some further refinements as discussed.

While working on nearby code, I found some problems with this feature.

First, and this is probably nitpicking, HEADER MATCH is allowed for COPY TO. Is that
expected? The documentation isn't really explicit about it, but since there's
nothing to match when exporting data, it's a bit surprising. I'm not opposed to
having HEADER MATCH mean HEADER ON for COPY TO, since as-is one can easily reuse
the command history, but maybe it should be clearly documented?

Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom
column list. This one looks like a clear oversight, as something like that
should be entirely valid IMHO:

CREATE TABLE tbl(col1 int, col2 int);
COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH);
COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH);

but right now it errors out with:

ERROR: column name mismatch in header line field 1: got "col1", expected "col2"

Note that the error message is bogus if you specify attributes in a
different order from the relation, as the code is mixing access to the tuple
desc and access to the raw fields with the same offset.

This also means that it will actually fail to detect a mismatch in the provided
column list and let you import data in the wrong position as long as the
datatypes are compatible and the column headers in the file are in the correct
order. For instance:

CREATE TABLE abc (a text, b text, c text);
INSERT INTO abc SELECT 'a', 'b', 'c';
COPY abc TO '/path/to/file' WITH (HEADER MATCH);

You can then import the data with any of those:
COPY abc(c, b, a) FROM '/path/to/file' WITH (HEADER MATCH);
COPY abc(c, a, b) FROM '/path/to/file' WITH (HEADER MATCH);
[...]
SELECT * FROM abc;

Even worse is what happens if you try to do a COPY ... FROM ... WITH (HEADER ON) on a table
that has some dropped attribute(s): the current code will access random memory,
as there's no exact attnum / raw field mapping anymore.

I can work on a fix if needed (with some additional regression tests to cover
those cases), but I'm still not sure whether a user-provided column list is
supposed to be accepted for HEADER MATCH. In the meantime I will
add an open item.

#75

Andrew Dunstan

andrew@dunslane.net

over 3 years ago

In reply to: Julien Rouhaud (#74)

Re: Add header support to text format and matching feature

On 2022-06-07 Tu 11:47, Julien Rouhaud wrote:

Hi,

On Wed, Mar 30, 2022 at 09:11:09AM +0200, Peter Eisentraut wrote:

Committed, after some further refinements as discussed.

While working on nearby code, I found some problems with this feature.

First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that
expected? The documentation isn't really explicit about it, but there's
nothing to match when exporting data it's a bit surprising. I'm not opposed to
have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse
the commands history, but maybe it should be clearly documented?

I think it makes more sense to have a sanity check to prevent HEADER
MATCH with COPY TO.

Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom
column list. This one looks like a clear oversight, as something like that
should be entirely valid IMHO:

CREATE TABLE tbl(col1 int, col2 int);
COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH);
COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH);

but right now it errors out with:

ERROR: column name mismatch in header line field 1: got "col1", expected "col2"

Note that the error message is bogus if you specify attributes in a
different order from the relation, as the code is mixing access to the tuple
desc and access to the raw fields with the same offset.

This also means that it will actually fail to detect a mismatch in the provided
column list and let you import data in the wrong position as long as the
datatypes are compatible and the column header in the file are in the correct
order. For instance:

CREATE TABLE abc (a text, b text, c text);
INSERT INTO abc SELECT 'a', 'b', 'c';
COPY abc TO '/path/to/file' WITH (HEADER MATCH);

You can then import the data with any of those:
COPY abc(c, b, a) FROM '/path/to/file' WITH (HEADER MATCH);
COPY abc(c, a, b) FROM '/path/to/file' WITH (HEADER MATCH);
[...]
SELECT * FROM abc;

Even worse, if you try to do a COPY ... FROM ... WITH (HEADER ON) on a table
that has some dropped attribute(s). The current code will access random memory
as there's no exact attnum / raw field mapping anymore.

Ouch! That certainly needs to be fixed.

I can work on a fix if needed (with some additional regression test to cover
those cases), but I'm still not sure that having a user provided column list is
supposed to be accepted or not for the HEADER MATCH. In the meantime I will
add an open item.

I think it should, but a temporary alternative would be to forbid HEADER
MATCH with explicit column lists until we can make it work right.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#76

rjuju123@gmail.com

over 3 years ago

In reply to: Andrew Dunstan (#75)

1 attachment(s)

Re: Add header support to text format and matching feature

Hi,

On Sun, Jun 12, 2022 at 09:36:13AM -0400, Andrew Dunstan wrote:

On 2022-06-07 Tu 11:47, Julien Rouhaud wrote:

First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that
expected? The documentation isn't really explicit about it, but there's
nothing to match when exporting data it's a bit surprising. I'm not opposed to
have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse
the commands history, but maybe it should be clearly documented?

I think it makes more sense to have a sanity check to prevent HEADER
MATCH with COPY TO.

I'm fine with it. I added such a check and mentioned it in the documentation.

Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom
column list. This one looks like a clear oversight, as something like that
should be entirely valid IMHO:

CREATE TABLE tbl(col1 int, col2 int);
COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH);
COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH);

but right now it errors out with:

ERROR: column name mismatch in header line field 1: got "col1", expected "col2"

Note that the error message is bogus if you specify attributes in a
different order from the relation, as the code is mixing access to the tuple
desc and access to the raw fields with the same offset.
[...]

I think it should, but a temporary alternative would be to forbid HEADER
MATCH with explicit column lists until we can make it work right.

I think it would still be problematic if the target table has dropped columns.
Fortunately, as I initially thought, the problem is only due to a thinko in the
original commit, which used the wrong variable for the raw_fields offset. Once
that was fixed (attached v1), I didn't see any other problem in the rest of the
logic, and all the added regression tests work as expected.
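
The thinko can be illustrated with a simplified, standalone model of the header check. This is not the actual copyfromparse.c code: the function names and the flat-array representation are hypothetical. Here `att_names` is indexed by attnum - 1, `attnumlist` holds the 1-based attnums from the COPY column list in the order given, and `raw_fields` holds the header fields in file order. Both functions return 0 when the header is accepted, -1 when the field count is wrong, and otherwise the 1-based number of the first mismatching header field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fixed logic: walk the header with its own counter (file position). */
int
check_header_fixed(const char *const *att_names,
                   const int *attnumlist, int nlist,
                   const char *const *raw_fields, int nfields)
{
    int         fldnum = 0;
    int         i;

    if (nfields != nlist)
        return -1;              /* wrong number of fields in header line */

    for (i = 0; i < nlist; i++)
    {
        int         attnum = attnumlist[i];
        const char *col_name;

        assert(fldnum < nfields);
        col_name = raw_fields[fldnum++];
        if (col_name == NULL ||
            strcmp(col_name, att_names[attnum - 1]) != 0)
            return fldnum;
    }
    return 0;
}

/*
 * Pre-fix logic as described in the thread: the header is indexed by
 * attnum - 1 (table position).  With a reordered column list this compares
 * the wrong pairs, and it can read past the end of raw_fields whenever
 * attnum > nfields (for example after columns have been dropped).
 */
int
check_header_buggy(const char *const *att_names,
                   const int *attnumlist, int nlist,
                   const char *const *raw_fields, int nfields)
{
    int         fldnum = 0;
    int         i;

    if (nfields != nlist)
        return -1;

    for (i = 0; i < nlist; i++)
    {
        int         attnum = attnumlist[i];
        const char *col_name = raw_fields[attnum - 1];  /* wrong offset */

        fldnum++;
        if (col_name == NULL ||
            strcmp(col_name, att_names[attnum - 1]) != 0)
            return fldnum;
    }
    return 0;
}
```

In this model, a table (col1, col2) loaded with column list (col2, col1) and header "col2, col1" is rejected by the buggy version at field 1 but accepted by the fixed one. Conversely, a table (a, b, c) loaded with column list (c, b, a) and header "a, b, c" is silently accepted by the buggy version and rejected at field 1 by the fixed one, which corresponds to the two symptoms reported upthread.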

Attachments:

v1-0001-Fix-processing-of-header-match-option-for-COPY.patch (text/plain; charset=us-ascii)

From 2f49e9feb2827856c8a0e3098500109ddd1962c9 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 13 Jun 2022 09:49:23 +0800
Subject: [PATCH v1] Fix processing of header match option for COPY.

Thinko in 072132f04en which used the attnum offset to access the raw_fields
array, leading to incorrect results of crash.  Use the correct variable, and
add some regression tests to cover a bit more scenario for the HEADER MATCH
option.

While at it, disallow HEADER MATCH in COPY TO as there is no validation that
can be done in that case.

Author: Julien Rouhaud
Discussion: https://postgr.es/m/20220607154744.vvmitnqhyxrne5ms@jrouhaud
---
 doc/src/sgml/ref/copy.sgml           |  2 ++
 src/backend/commands/copy.c          | 12 ++++++++++--
 src/backend/commands/copyfromparse.c |  5 +++--
 src/test/regress/expected/copy.out   | 21 ++++++++++++++++++++-
 src/test/regress/sql/copy.sql        | 24 ++++++++++++++++++++----
 5 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 40af423ccf..8aae711b3b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -282,6 +282,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       of the columns in the header line must match the actual column names of
       the table, otherwise an error is raised.
       This option is not allowed when using <literal>binary</literal> format.
+      The <literal>MATCH</literal> option is only valid for <command>COPY
+      FROM</command> commands.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f448d39c7e..e596bebb0b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -318,7 +318,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
  * defGetBoolean() but also accepts the special value "match".
  */
 static CopyHeaderChoice
-defGetCopyHeaderChoice(DefElem *def)
+defGetCopyHeaderChoice(DefElem *def, bool is_from)
 {
 	/*
 	 * If no parameter given, assume "true" is meant.
@@ -360,7 +360,15 @@ defGetCopyHeaderChoice(DefElem *def)
 				if (pg_strcasecmp(sval, "off") == 0)
 					return COPY_HEADER_FALSE;
 				if (pg_strcasecmp(sval, "match") == 0)
+				{
+					/* match is only valid for COPY FROM */
+					if (!is_from)
+						ereport(ERROR,
+							(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s match is only valid for COPY FROM",
+								def->defname)));
 					return COPY_HEADER_MATCH;
+				}
 			}
 			break;
 	}
@@ -452,7 +460,7 @@ ProcessCopyOptions(ParseState *pstate,
 			if (header_specified)
 				errorConflictingDefElem(defel, pstate);
 			header_specified = true;
-			opts_out->header_line = defGetCopyHeaderChoice(defel);
+			opts_out->header_line = defGetCopyHeaderChoice(defel, is_from);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e06534943f..57813b3458 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -789,11 +789,12 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 			foreach(cur, cstate->attnumlist)
 			{
 				int			attnum = lfirst_int(cur);
-				char	   *colName = cstate->raw_fields[attnum - 1];
+				char	   *colName;
 				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-				fldnum++;
+				Assert(fldnum < cstate->max_fields);
 
+				colName = cstate->raw_fields[fldnum++];
 				if (colName == NULL)
 					ereport(ERROR,
 							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index e8d6b4fc13..e9d1ec348a 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -182,9 +182,21 @@ create table header_copytest (
 	b int,
 	c text
 );
+-- Make sure it works with with dropped columns
+alter table header_copytest drop column c;
+alter table header_copytest add column c text;
+copy header_copytest to stdout with (header match);
+ERROR:  header match is only valid for COPY FROM
 copy header_copytest from stdin with (header wrong_choice);
 ERROR:  header requires a Boolean value or "match"
+-- works
 copy header_copytest from stdin with (header match);
+copy header_copytest (c, a, b) from stdin with (header match);
+copy header_copytest from stdin with (header match, format csv);
+-- the rest errors out
+copy header_copytest (c, b, a) from stdin with (header match);
+ERROR:  column name mismatch in header line field 1: got "a", expected "c"
+CONTEXT:  COPY header_copytest, line 1: "a	b	c"
 copy header_copytest from stdin with (header match);
 ERROR:  column name mismatch in header line field 3: got null value ("\N"), expected "c"
 CONTEXT:  COPY header_copytest, line 1: "a	b	\N"
@@ -197,5 +209,12 @@ CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
 copy header_copytest from stdin with (header match);
 ERROR:  column name mismatch in header line field 3: got "d", expected "c"
 CONTEXT:  COPY header_copytest, line 1: "a	b	d"
-copy header_copytest from stdin with (header match, format csv);
+SELECT * FROM header_copytest ORDER BY a;
+ a | b |  c  
+---+---+-----
+ 1 | 2 | foo
+ 3 | 4 | bar
+ 5 | 6 | baz
+(3 rows)
+
 drop table header_copytest;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index d72d226f34..0b15ed2653 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -204,11 +204,29 @@ create table header_copytest (
 	b int,
 	c text
 );
+-- Make sure it works with with dropped columns
+alter table header_copytest drop column c;
+alter table header_copytest add column c text;
+copy header_copytest to stdout with (header match);
 copy header_copytest from stdin with (header wrong_choice);
+-- works
 copy header_copytest from stdin with (header match);
 a	b	c
 1	2	foo
 \.
+copy header_copytest (c, a, b) from stdin with (header match);
+c	a	b
+bar	3	4
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+5,6,baz
+\.
+-- the rest errors out
+copy header_copytest (c, b, a) from stdin with (header match);
+a	b	c
+1	2	foo
+\.
 copy header_copytest from stdin with (header match);
 a	b	\N
 1	2	foo
@@ -225,8 +243,6 @@ copy header_copytest from stdin with (header match);
 a	b	d
 1	2	foo
 \.
-copy header_copytest from stdin with (header match, format csv);
-a,b,c
-1,2,foo
-\.
+
+SELECT * FROM header_copytest ORDER BY a;
 drop table header_copytest;
-- 
2.33.1

#77

peter.eisentraut@enterprisedb.com

over 3 years ago

In reply to: Julien Rouhaud (#76)

Re: Add header support to text format and matching feature

On 13.06.22 04:32, Julien Rouhaud wrote:

I think it makes more sense to have a sanity check to prevent HEADER
MATCH with COPY TO.

I'm fine with it. I added such a check and mentioned it in the documentation.

I think it would still be problematic if the target table has dropped columns.
Fortunately, as I initially thought the problem is only due to a thinko in the
original commit which used a wrong variable for the raw_fields offset. Once
fixed (attached v1) I didn't see any other problem in the rest of the logic and
all the added regression tests work as expected.

Thanks for this patch. I'll check it in detail in a bit. It looks good
to me at first glance.

#78

michael@paquier.xyz

over 3 years ago

In reply to: Julien Rouhaud (#76)

1 attachment(s)

Re: Add header support to text format and matching feature

On Mon, Jun 13, 2022 at 10:32:13AM +0800, Julien Rouhaud wrote:

On Sun, Jun 12, 2022 at 09:36:13AM -0400, Andrew Dunstan wrote:
I'm fine with it. I added such a check and mentioned it in the documentation.

An error looks like the right call at this stage of the game. I am
not sure what the combination of MATCH with COPY TO would mean,
actually. And with the concept of SELECT queries on top of it, the
whole idea gets blurrier.

I think it would still be problematic if the target table has dropped columns.
Fortunately, as I initially thought the problem is only due to a thinko in the
original commit which used a wrong variable for the raw_fields offset. Once
fixed (attached v1) I didn't see any other problem in the rest of the logic and
all the added regression tests work as expected.

Interesting catch. One thing that I've always found useful when it
comes to tests that stress dropped columns is to have tests where we
reduce the number of total columns that still exist. An extra thing
is to look after ........pg.dropped.N........ a bit, and I would put
something in one of the headers.

if (pg_strcasecmp(sval, "match") == 0)
+	{
+		/* match is only valid for COPY FROM */
+		if (!is_from)
+			ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+			 errmsg("%s match is only valid for COPY FROM",
+					def->defname)));

Some nits. I would suggest to reword this error message, like "cannot
use \"match\" with HEADER in COPY TO". There is no need for the extra
comment, and the error code should be ERRCODE_FEATURE_NOT_SUPPORTED.
--
Michael

Attachments:

v2-0001-Fix-processing-of-header-match-option-for-COPY.patch (text/x-diff; charset=us-ascii)

From af4e06ca5f0a13ad5922abae446bd6716bbdf3c1 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 13 Jun 2022 09:49:23 +0800
Subject: [PATCH v2] Fix processing of header match option for COPY.

Thinko in 072132f04en which used the attnum offset to access the raw_fields
array, leading to incorrect results of crash.  Use the correct variable, and
add some regression tests to cover a bit more scenario for the HEADER MATCH
option.

While at it, disallow HEADER MATCH in COPY TO as there is no validation that
can be done in that case.

Author: Julien Rouhaud
Discussion: https://postgr.es/m/20220607154744.vvmitnqhyxrne5ms@jrouhaud
---
 src/backend/commands/copy.c          | 11 +++++--
 src/backend/commands/copyfromparse.c |  5 ++--
 src/test/regress/expected/copy.out   | 43 ++++++++++++++++++++++++++-
 src/test/regress/sql/copy.sql        | 44 ++++++++++++++++++++++++++--
 doc/src/sgml/ref/copy.sgml           |  2 ++
 5 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f448d39c7e..e2870e3c11 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -318,7 +318,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
  * defGetBoolean() but also accepts the special value "match".
  */
 static CopyHeaderChoice
-defGetCopyHeaderChoice(DefElem *def)
+defGetCopyHeaderChoice(DefElem *def, bool is_from)
 {
 	/*
 	 * If no parameter given, assume "true" is meant.
@@ -360,7 +360,14 @@ defGetCopyHeaderChoice(DefElem *def)
 				if (pg_strcasecmp(sval, "off") == 0)
 					return COPY_HEADER_FALSE;
 				if (pg_strcasecmp(sval, "match") == 0)
+				{
+					if (!is_from)
+						ereport(ERROR,
+								(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+								 errmsg("cannot use \"%s\" with HEADER in COPY TO",
+										sval)));
 					return COPY_HEADER_MATCH;
+				}
 			}
 			break;
 	}
@@ -452,7 +459,7 @@ ProcessCopyOptions(ParseState *pstate,
 			if (header_specified)
 				errorConflictingDefElem(defel, pstate);
 			header_specified = true;
-			opts_out->header_line = defGetCopyHeaderChoice(defel);
+			opts_out->header_line = defGetCopyHeaderChoice(defel, is_from);
 		}
 		else if (strcmp(defel->defname, "quote") == 0)
 		{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e06534943f..57813b3458 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -789,11 +789,12 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 			foreach(cur, cstate->attnumlist)
 			{
 				int			attnum = lfirst_int(cur);
-				char	   *colName = cstate->raw_fields[attnum - 1];
+				char	   *colName;
 				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-				fldnum++;
+				Assert(fldnum < cstate->max_fields);
 
+				colName = cstate->raw_fields[fldnum++];
 				if (colName == NULL)
 					ereport(ERROR,
 							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index e8d6b4fc13..7f2f4ae4ae 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -182,9 +182,21 @@ create table header_copytest (
 	b int,
 	c text
 );
+-- Make sure it works with with dropped columns
+alter table header_copytest drop column c;
+alter table header_copytest add column c text;
+copy header_copytest to stdout with (header match);
+ERROR:  cannot use "match" with HEADER in COPY TO
 copy header_copytest from stdin with (header wrong_choice);
 ERROR:  header requires a Boolean value or "match"
+-- works
 copy header_copytest from stdin with (header match);
+copy header_copytest (c, a, b) from stdin with (header match);
+copy header_copytest from stdin with (header match, format csv);
+-- errors
+copy header_copytest (c, b, a) from stdin with (header match);
+ERROR:  column name mismatch in header line field 1: got "a", expected "c"
+CONTEXT:  COPY header_copytest, line 1: "a	b	c"
 copy header_copytest from stdin with (header match);
 ERROR:  column name mismatch in header line field 3: got null value ("\N"), expected "c"
 CONTEXT:  COPY header_copytest, line 1: "a	b	\N"
@@ -197,5 +209,34 @@ CONTEXT:  COPY header_copytest, line 1: "a	b	c	d"
 copy header_copytest from stdin with (header match);
 ERROR:  column name mismatch in header line field 3: got "d", expected "c"
 CONTEXT:  COPY header_copytest, line 1: "a	b	d"
-copy header_copytest from stdin with (header match, format csv);
+SELECT * FROM header_copytest ORDER BY a;
+ a | b |  c  
+---+---+-----
+ 1 | 2 | foo
+ 3 | 4 | bar
+ 5 | 6 | baz
+(3 rows)
+
+-- Drop an extra column, in the middle of the existing set.
+alter table header_copytest drop column b;
+-- works
+copy header_copytest (c, a) from stdin with (header match);
+copy header_copytest (a, c) from stdin with (header match);
+-- errors
+copy header_copytest from stdin with (header match);
+ERROR:  wrong number of fields in header line: field count is 3, expected 2
+CONTEXT:  COPY header_copytest, line 1: "a	........pg.dropped.2........	c"
+copy header_copytest (a, c) from stdin with (header match);
+ERROR:  wrong number of fields in header line: field count is 3, expected 2
+CONTEXT:  COPY header_copytest, line 1: "a	c	b"
+SELECT * FROM header_copytest ORDER BY a;
+ a |  c  
+---+-----
+ 1 | foo
+ 3 | bar
+ 5 | baz
+ 7 | foo
+ 8 | foo
+(5 rows)
+
 drop table header_copytest;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index d72d226f34..285022e07c 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -204,11 +204,29 @@ create table header_copytest (
 	b int,
 	c text
 );
+-- Make sure it works with with dropped columns
+alter table header_copytest drop column c;
+alter table header_copytest add column c text;
+copy header_copytest to stdout with (header match);
 copy header_copytest from stdin with (header wrong_choice);
+-- works
 copy header_copytest from stdin with (header match);
 a	b	c
 1	2	foo
 \.
+copy header_copytest (c, a, b) from stdin with (header match);
+c	a	b
+bar	3	4
+\.
+copy header_copytest from stdin with (header match, format csv);
+a,b,c
+5,6,baz
+\.
+-- errors
+copy header_copytest (c, b, a) from stdin with (header match);
+a	b	c
+1	2	foo
+\.
 copy header_copytest from stdin with (header match);
 a	b	\N
 1	2	foo
@@ -225,8 +243,28 @@ copy header_copytest from stdin with (header match);
 a	b	d
 1	2	foo
 \.
-copy header_copytest from stdin with (header match, format csv);
-a,b,c
-1,2,foo
+SELECT * FROM header_copytest ORDER BY a;
+
+-- Drop an extra column, in the middle of the existing set.
+alter table header_copytest drop column b;
+-- works
+copy header_copytest (c, a) from stdin with (header match);
+c	a
+foo	7
 \.
+copy header_copytest (a, c) from stdin with (header match);
+a	c
+8	foo
+\.
+-- errors
+copy header_copytest from stdin with (header match);
+a	........pg.dropped.2........	c
+1	2	foo
+\.
+copy header_copytest (a, c) from stdin with (header match);
+a	c	b
+1	foo	2
+\.
+
+SELECT * FROM header_copytest ORDER BY a;
 drop table header_copytest;
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 40af423ccf..8aae711b3b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -282,6 +282,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       of the columns in the header line must match the actual column names of
       the table, otherwise an error is raised.
       This option is not allowed when using <literal>binary</literal> format.
+      The <literal>MATCH</literal> option is only valid for <command>COPY
+      FROM</command> commands.
      </para>
     </listitem>
    </varlistentry>
-- 
2.36.1

#79

rjuju123@gmail.com

over 3 years ago

In reply to: Michael Paquier (#78)

Re: Add header support to text format and matching feature

On Mon, Jun 13, 2022 at 04:46:46PM +0900, Michael Paquier wrote:

Some nits. I would suggest to reword this error message, like "cannot
use \"match\" with HEADER in COPY TO".

Agreed.

There is no need for the extra
comment, and the error code should be ERRCODE_FEATURE_NOT_SUPPORTED.

Is there any rule for what error code should be used?

Maybe that's just me but I understand "not supported" as "this makes sense, but
this is currently a limitation that might be lifted later".

Here I don't think it can ever make sense to use MATCH for a COPY TO, apart from
ignoring its meaning and accepting it as an alias for HEADER ON. But if we don't
allow this loose alias now, it would just cause trouble to allow it later, so
reporting invalid syntax or something like that sounds more suitable.

#80

peter.eisentraut@enterprisedb.com

over 3 years ago

In reply to: Julien Rouhaud (#79)

Re: Add header support to text format and matching feature

On 14.06.22 11:13, Julien Rouhaud wrote:

There is no need for the extra
comment, and the error code should be ERRCODE_FEATURE_NOT_SUPPORTED.

Is there any rule for what error code should be used?

Maybe that's just me but I understand "not supported" as "this makes sense, but
this is currently a limitation that might be lifted later".

I tend to agree with that interpretation.

Also, when you consider the way SQL rules and error codes are set up,
errors that are detected during parse analysis should be a subclass of
"syntax error or access rule violation".

#81

daniel@manitou-mail.org

over 3 years ago

In reply to: Julien Rouhaud (#79)

Re: Add header support to text format and matching feature

Julien Rouhaud wrote:

Maybe that's just me but I understand "not supported" as "this makes
sense, but this is currently a limitation that might be lifted
later".

Looking at ProcessCopyOptions(), there are quite a few invalid
combinations of options that produce
ERRCODE_FEATURE_NOT_SUPPORTED currently:

- HEADER in binary mode
- FORCE_QUOTE outside of csv
- FORCE_QUOTE outside of COPY TO
- FORCE_NOT_NULL outside of csv
- FORCE_NOT_NULL outside of COPY FROM
- ESCAPE outside of csv
- delimiter appearing in the NULL specification
- csv quote appearing in the NULL specification

FORCE_QUOTE and FORCE_NOT_NULL are options that only make sense in one
direction, so the errors when using these in the wrong direction are
comparable to the "HEADER MATCH outside of COPY FROM" error that we
want to add. In that sense, ERRCODE_FEATURE_NOT_SUPPORTED would be
consistent.
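
As a rough illustration of the consistency argument, the direction-only checks can be modeled as a single validation step that reports the same error class for every option used in the wrong direction. This is a standalone sketch, not the ProcessCopyOptions() code: the struct, the enum and the function name are hypothetical, and COPY_OPT_NOT_SUPPORTED merely stands in for ERRCODE_FEATURE_NOT_SUPPORTED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for the sketch. */
typedef enum
{
    COPY_OPT_OK = 0,
    COPY_OPT_NOT_SUPPORTED      /* stands in for ERRCODE_FEATURE_NOT_SUPPORTED */
} CopyOptStatus;

/* Hypothetical subset of options that only make sense in one direction. */
typedef struct
{
    bool        force_quote;    /* only meaningful for COPY TO */
    bool        force_not_null; /* only meaningful for COPY FROM */
    bool        header_match;   /* only meaningful for COPY FROM */
} CopyDirectionOpts;

/* Reject options used in the wrong direction, all with the same status. */
CopyOptStatus
check_copy_direction(const CopyDirectionOpts *opts, bool is_from)
{
    assert(opts != NULL);

    if (opts->force_quote && is_from)
        return COPY_OPT_NOT_SUPPORTED;
    if (opts->force_not_null && !is_from)
        return COPY_OPT_NOT_SUPPORTED;
    if (opts->header_match && !is_from)
        return COPY_OPT_NOT_SUPPORTED;
    return COPY_OPT_OK;
}
```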

The other errors in the list above are more about the format itself,
with options that make sense only for one format. So the way we're
supposed to understand ERRCODE_FEATURE_NOT_SUPPORTED in these
other cases is that such format does not support such feature,
but without implying that it's a limitation.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#82

peter.eisentraut@enterprisedb.com

over 3 years ago

In reply to: Daniel Verite (#81)

Re: Add header support to text format and matching feature

On 15.06.22 13:50, Daniel Verite wrote:

The other errors in the list above are more about the format itself,
with options that make sense only for one format. So the way we're
supposed to understand ERRCODE_FEATURE_NOT_SUPPORTED in these
other cases is that such format does not support such feature,
but without implying that it's a limitation.

I don't feel very strongly about this. It makes sense to stay
consistent with the existing COPY code.

#83

michael@paquier.xyz

over 3 years ago

In reply to: Peter Eisentraut (#82)

Re: Add header support to text format and matching feature

On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote:

I don't feel very strongly about this. It makes sense to stay consistent
with the existing COPY code.

Yes, my previous argument is based on consistency with the
surroundings. I am not saying that this could not be made better, it
surely can, but I would recommend to tackle that in a separate patch,
and apply that to more areas than this specific one.
--
Michael

#84

michael@paquier.xyz

over 3 years ago

In reply to: Michael Paquier (#83)

Re: Add header support to text format and matching feature

On Mon, Jun 20, 2022 at 09:03:23AM +0900, Michael Paquier wrote:

On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote:

I don't feel very strongly about this. It makes sense to stay consistent
with the existing COPY code.

Yes, my previous argument is based on consistency with the
surroundings. I am not saying that this could not be made better, it
surely can, but I would recommend tackling that in a separate patch,
and apply that to more areas than this specific one.

Peter, beta2 is planned for next week. Do you think that you would be
able to address this open item by the end of this week? If not, and I
have already looked at the proposed patch, I can jump in and help.
--
Michael

#85

peter.eisentraut@enterprisedb.com

over 3 years ago

In reply to: Michael Paquier (#84)

Re: Add header support to text format and matching feature

On 22.06.22 01:34, Michael Paquier wrote:

On Mon, Jun 20, 2022 at 09:03:23AM +0900, Michael Paquier wrote:

On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote:

I don't feel very strongly about this. It makes sense to stay consistent
with the existing COPY code.

Yes, my previous argument is based on consistency with the
surroundings. I am not saying that this could not be made better, it
surely can, but I would recommend tackling that in a separate patch,
and apply that to more areas than this specific one.

Peter, beta2 is planned for next week. Do you think that you would be
able to address this open item by the end of this week? If not, and I
have already looked at the proposed patch, I can jump in and help.

The latest patch was posted by you, so I was deferring to you to commit
it. Would you like me to do it?

#86

michael@paquier.xyz

over 3 years ago

In reply to: Peter Eisentraut (#85)

Re: Add header support to text format and matching feature

On Wed, Jun 22, 2022 at 12:22:01PM +0200, Peter Eisentraut wrote:

The latest patch was posted by you, so I was deferring to you to commit it.
Would you like me to do it?

OK. As this was originally a feature you committed, I had
thought that you would take care of it, even if I sent a patch. I'll
handle that tomorrow then, if that's fine for you, of course. Happy
to help.
--
Michael

#87

michael@paquier.xyz

over 3 years ago

In reply to: Michael Paquier (#86)

Re: Add header support to text format and matching feature

On Wed, Jun 22, 2022 at 08:00:15PM +0900, Michael Paquier wrote:

OK. As this was originally a feature you committed, I had
thought that you would take care of it, even if I sent a patch. I'll
handle that tomorrow then, if that's fine for you, of course. Happy
to help.

And done. Thanks.
--
Michael

#88