Extend COPY FROM with HEADER <integer> to skip multiple lines
Hi hackers,
I'd like to propose a new feature for the COPY FROM command to allow
skipping multiple header lines when loading data. This enhancement
would enable files with multi-line headers to be loaded without any
preprocessing, which would significantly improve usability.
In real-world scenarios, it's common for data files to contain
multiple header lines, such as file descriptions or column
explanations. Currently, the COPY command cannot load these files
directly, which requires users to preprocess them with tools like sed
or tail.
Although you can use "COPY t FROM PROGRAM 'tail -n +3 /path/to/file'",
some environments do not have the tail command available.
Additionally, this approach requires superuser privileges or
membership in the pg_execute_server_program role.
This feature also has precedent in other major RDBMS:
- MySQL: LOAD DATA ... IGNORE N LINES [1]https://dev.mysql.com/doc/refman/8.4/en/load-data.html#load-data-field-line-handling
- SQL Server: BULK INSERT … WITH (FIRST ROW=N) [2]https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver17#firstrow--first_row
- Oracle SQL*Loader: sqlldr … SKIP=N [3]https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-sql-loader-commands.html#SUTIL-GUID-84244C46-6AFD-412D-9240-BEB0B5C2718B
I have not yet created a patch, but I am willing to implement an
extension for the HEADER option. I would like to discuss the
specification first.
The specification I have in mind is as follows:
- Command: COPY FROM
- Formats: text and csv
- Option syntax: HEADER [ boolean | integer | MATCH] (Extend the
HEADER option to accept an integer value in addition to the existing
boolean and MATCH keywords.)
- Behavior: Let N be the specified integer.
- If N < 0, raise an error.
- If N = 0 or 1, same behavior when boolean is specified.
- If N > 1, skip the first N rows.
Thoughts?
[1]: https://dev.mysql.com/doc/refman/8.4/en/load-data.html#load-data-field-line-handling
[2]: https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver17#firstrow--first_row
[3]: https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-sql-loader-commands.html#SUTIL-GUID-84244C46-6AFD-412D-9240-BEB0B5C2718B
--
Best regards,
Shinya Kato
NTT OSS Center
On 2025/06/09 16:10, Shinya Kato wrote:
Hi hackers,
I'd like to propose a new feature for the COPY FROM command to allow
skipping multiple header lines when loading data. This enhancement
would enable files with multi-line headers to be loaded without any
preprocessing, which would significantly improve usability.In real-world scenarios, it's common for data files to contain
multiple header lines, such as file descriptions or column
explanations. Currently, the COPY command cannot load these files
directly, which requires users to preprocess them with tools like sed
or tail.Although you can use "COPY t FROM PROGRAM 'tail -n +3 /path/to/file'",
some environments do not have the tail command available.
Additionally, this approach requires superuser privileges or
membership in the pg_execute_server_program role.This feature also has precedent in other major RDBMS:
- MySQL: LOAD DATA ... IGNORE N LINES [1]
- SQL Server: BULK INSERT … WITH (FIRST ROW=N) [2]
- Oracle SQL*Loader: sqlldr … SKIP=N [3]I have not yet created a patch, but I am willing to implement an
extension for the HEADER option. I would like to discuss the
specification first.The specification I have in mind is as follows:
- Command: COPY FROM
- Formats: text and csv
- Option syntax: HEADER [ boolean | integer | MATCH] (Extend the
HEADER option to accept an integer value in addition to the existing
boolean and MATCH keywords.)
- Behavior: Let N be the specified integer.
- If N < 0, raise an error.
- If N = 0 or 1, same behavior when boolean is specified.
- If N > 1, skip the first N rows.Thoughts?
I generally like the idea.
However, a similar proposal was made earlier [1]/messages/by-id/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com, and seemingly
some hackers weren't in favor of it. It's probably worth reading
that thread to understand the previous concerns.
Regards,
[1]: /messages/by-id/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com
--
Fujii Masao
NTT DATA Japan Corporation
2025年6月9日(月) 17:27 Fujii Masao <masao.fujii@oss.nttdata.com>:
I generally like the idea.
Thanks.
However, a similar proposal was made earlier [1], and seemingly
some hackers weren't in favor of it. It's probably worth reading
that thread to understand the previous concerns.[1] /messages/by-id/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com
Oh, I missed it. I will check it soon.
--
Best regards,
Shinya Kato
NTT OSS Center
On 2025-06-09 Mo 4:27 AM, Fujii Masao wrote:
On 2025/06/09 16:10, Shinya Kato wrote:
Hi hackers,
I'd like to propose a new feature for the COPY FROM command to allow
skipping multiple header lines when loading data. This enhancement
would enable files with multi-line headers to be loaded without any
preprocessing, which would significantly improve usability.In real-world scenarios, it's common for data files to contain
multiple header lines, such as file descriptions or column
explanations. Currently, the COPY command cannot load these files
directly, which requires users to preprocess them with tools like sed
or tail.Although you can use "COPY t FROM PROGRAM 'tail -n +3 /path/to/file'",
some environments do not have the tail command available.
Additionally, this approach requires superuser privileges or
membership in the pg_execute_server_program role.This feature also has precedent in other major RDBMS:
- MySQL: LOAD DATA ... IGNORE N LINES [1]
- SQL Server: BULK INSERT … WITH (FIRST ROW=N) [2]
- Oracle SQL*Loader: sqlldr … SKIP=N [3]I have not yet created a patch, but I am willing to implement an
extension for the HEADER option. I would like to discuss the
specification first.The specification I have in mind is as follows:
- Command: COPY FROM
- Formats: text and csv
- Option syntax: HEADER [ boolean | integer | MATCH] (Extend the
HEADER option to accept an integer value in addition to the existing
boolean and MATCH keywords.)
- Behavior: Let N be the specified integer.
- If N < 0, raise an error.
- If N = 0 or 1, same behavior when boolean is specified.
- If N > 1, skip the first N rows.Thoughts?
I generally like the idea.
However, a similar proposal was made earlier [1], and seemingly
some hackers weren't in favor of it. It's probably worth reading
that thread to understand the previous concerns.Regards,
[1]
/messages/by-id/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com
I think the earlier proposal went rather further than this one, which I
suspect can be implemented fairly cheaply.
I don't have terribly strong feelings about it, but matching a feature
implemented elsewhere has some attraction if it can be done easily.
OTOH I'm a bit curious to know what software produces multi-line CSV
headers.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
However, a similar proposal was made earlier [1], and seemingly
some hackers weren't in favor of it. It's probably worth reading
that thread to understand the previous concerns.[1] /messages/by-id/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com
Oh, I missed it. I will check it soon.
I read it.
There are clear differences from the earlier proposal. My sole
motivation is to skip multiple headers, and I don't believe a feature
for skipping footers is necessary. To be clear, I think it's best to
simply extend the current HEADER option.
Regarding the concern about adding ETL-like functionality, this
feature is already implemented in other RDBMSs, which is why I believe
it is also necessary for PostgreSQL.
Honestly, I haven't implemented it yet, so I'm not sure about the
performance. However, I don't expect it to be significantly different
from the current HEADER option that skips a single line.
I think the earlier proposal went rather further than this one, which I
suspect can be implemented fairly cheaply.
That's probably it, exactly.
I don't have terribly strong feelings about it, but matching a feature
implemented elsewhere has some attraction if it can be done easily.OTOH I'm a bit curious to know what software produces multi-line CSV
headers.
Both Pandas and R can create CSV files with multi-line headers
(although I don't personally think this is desirable). Furthermore,
various systems sometimes generate reports as CSV files that
unexpectedly contain multiple header lines.
--
Best regards,
Shinya Kato
NTT OSS Center
On 2025/06/10 9:43, Shinya Kato wrote:
However, a similar proposal was made earlier [1], and seemingly
some hackers weren't in favor of it. It's probably worth reading
that thread to understand the previous concerns.[1] /messages/by-id/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com
Oh, I missed it. I will check it soon.
I read it.
There are clear differences from the earlier proposal. My sole
motivation is to skip multiple headers, and I don't believe a feature
for skipping footers is necessary. To be clear, I think it's best to
simply extend the current HEADER option.
Sounds ok to me.
Regarding the concern about adding ETL-like functionality, this
feature is already implemented in other RDBMSs, which is why I believe
it is also necessary for PostgreSQL.Honestly, I haven't implemented it yet, so I'm not sure about the
performance. However, I don't expect it to be significantly different
from the current HEADER option that skips a single line.
So it seems better for you to implement the patch at first and then
check the performance overhead etc if necessary.
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
On Tue, Jun 10, 2025 at 2:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
There are clear differences from the earlier proposal. My sole
motivation is to skip multiple headers, and I don't believe a feature
for skipping footers is necessary. To be clear, I think it's best to
simply extend the current HEADER option.Sounds ok to me.
Thank you.
Regarding the concern about adding ETL-like functionality, this
feature is already implemented in other RDBMSs, which is why I believe
it is also necessary for PostgreSQL.Honestly, I haven't implemented it yet, so I'm not sure about the
performance. However, I don't expect it to be significantly different
from the current HEADER option that skips a single line.So it seems better for you to implement the patch at first and then
check the performance overhead etc if necessary.
Thank you for your advice. I will create a patch.
--
Best regards,
Shinya Kato
NTT OSS Center
On Tue, Jun 10, 2025 at 2:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2025/06/10 9:43, Shinya Kato wrote:
However, a similar proposal was made earlier [1], and seemingly
some hackers weren't in favor of it. It's probably worth reading
that thread to understand the previous concerns.[1] /messages/by-id/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com
Oh, I missed it. I will check it soon.
I read it.
There are clear differences from the earlier proposal. My sole
motivation is to skip multiple headers, and I don't believe a feature
for skipping footers is necessary. To be clear, I think it's best to
simply extend the current HEADER option.Sounds ok to me.
Regarding the concern about adding ETL-like functionality, this
feature is already implemented in other RDBMSs, which is why I believe
it is also necessary for PostgreSQL.Honestly, I haven't implemented it yet, so I'm not sure about the
performance. However, I don't expect it to be significantly different
from the current HEADER option that skips a single line.So it seems better for you to implement the patch at first and then
check the performance overhead etc if necessary.Regards,
--
Fujii Masao
NTT DATA Japan Corporation
--
Best regards,
Shinya Kato
NTT OSS Center
Andrew Dunstan <andrew@dunslane.net> writes:
OTOH I'm a bit curious to know what software produces multi-line CSV
headers.
AWS CloudFront access logs are stored in S3 as TSV files (one per hour
per CF node) with a two-line header comment where the first line is the
version and the second lists the fields (but not in a form useful for
HEADER MATCH).
Although not useful for the above format, and not intended to derail or
bloat the proposal in this thread, would it be useful to have a mode
that combines skip and match? I.e. skip N lines, then check the fields
in the one after that against the target columns.
- ilmari
On Tue, Jun 10, 2025 at 7:05 PM Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
OTOH I'm a bit curious to know what software produces multi-line CSV
headers.AWS CloudFront access logs are stored in S3 as TSV files (one per hour
per CF node) with a two-line header comment where the first line is the
version and the second lists the fields (but not in a form useful for
HEADER MATCH).
Thank you for providing that example.
Although not useful for the above format, and not intended to derail or
bloat the proposal in this thread, would it be useful to have a mode
that combines skip and match? I.e. skip N lines, then check the fields
in the one after that against the target columns.
I think it would be useful, but the target columns are not always at the
bottom of the header. For example, the target columns could be on the first
line, with explanations or sub-columns on the lines that follow.
Considering this, the patch would become too complicated, so I'd like to
keep this out of scope. What do you think?
--
Best regards,
Shinya Kato
NTT OSS Center
Shinya Kato <shinya11.kato@gmail.com> writes:
On Tue, Jun 10, 2025 at 7:05 PM Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
wrote:Although not useful for the above format, and not intended to derail or
bloat the proposal in this thread, would it be useful to have a mode
that combines skip and match? I.e. skip N lines, then check the fields
in the one after that against the target columns.I think it would be useful, but the target columns are not always at the
bottom of the header. For example, the target columns could be on the first
line, with explanations or sub-columns on the lines that follow.Considering this, the patch would become too complicated, so I'd like to
keep this out of scope. What do you think?
Yeah, that's a valid point, and as I said I don't want to bloat or
derail this proposal, so keeping it simple is fine.
- ilmari
So it seems better for you to implement the patch at first and then
check the performance overhead etc if necessary.Thank you for your advice. I will create a patch.
I created a patch.
As you can see from the patch, I believe the performance impact is
negligible. The only changes were to modify how the HEADER option is stored
and to add a loop that skips the specified number of header lines when
parsing the options.
The design is such that if a HEADER value larger than the number of lines
in the file is specified, the command will complete with zero rows loaded
and will not return an error. This is the same behavior as specifying
HEADER true for a CSV file that has zero rows.
And I will add this patch for the next CF.
Thoughts?
--
Best regards,
Shinya Kato
NTT OSS Center
Attachments:
v1-0001-Add-support-for-multi-line-header-skipping-in-COP.patchapplication/octet-stream; name=v1-0001-Add-support-for-multi-line-header-skipping-in-COP.patchDownload
From 35d6ed6248bf66b2349f31b05d82bc4b07d5cde0 Mon Sep 17 00:00:00 2001
From: Shinya Kato <shinya11.kato@gmail.com>
Date: Thu, 26 Jun 2025 14:11:41 +0900
Subject: [PATCH v1] Add support for multi-line header skipping in COPY The
HEADER option for COPY now accepts a non-negative integer value, allowing
users to skip an arbitrary number of header lines when importing data with
COPY FROM. For COPY TO, only 0 or 1 is allowed.
---
doc/src/sgml/ref/copy.sgml | 5 +-
src/backend/commands/copy.c | 81 ++++++++++++++++++++--------
src/backend/commands/copyfromparse.c | 11 ++--
src/backend/commands/copyto.c | 2 +-
src/include/commands/copy.h | 15 +++---
src/test/regress/expected/copy.out | 25 ++++++++-
src/test/regress/expected/copy2.out | 6 +++
src/test/regress/sql/copy.sql | 30 +++++++++++
src/test/regress/sql/copy2.sql | 3 ++
src/tools/pgindent/typedefs.list | 2 +-
10 files changed, 141 insertions(+), 39 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..f858e73358c 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,7 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
DEFAULT '<replaceable class="parameter">default_string</replaceable>'
- HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
+ HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -307,6 +307,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
column in the file. On output, the first line contains the column
names from the table. On input, the first line is discarded when this
option is set to <literal>true</literal> (or equivalent Boolean value).
+ If this option is set to a non-negative integer value, specified number of
+ lines are discarded. An integer value greater than 1 is only valid for
+ <command>COPY FROM</command> commands.
If this option is set to <literal>MATCH</literal>, the number and names
of the columns in the header line must match the actual column names of
the table, in order; otherwise an error is raised.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..7e5ef6bb836 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -322,33 +322,49 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
}
/*
- * Extract a CopyHeaderChoice value from a DefElem. This is like
- * defGetBoolean() but also accepts the special value "match".
+ * Extract a CopyHeaderOption value from a DefElem.
+ *
+ * Parses the HEADER option for COPY. Accepts boolean values, integer values
+ * (number of lines to skip, via skip_lines), or the special value "match".
+ *
+ * For COPY TO, boolean values and skip_lines 0 or 1 are allowed.
+ * For COPY FROM, boolean values, skip_lines >= 0, and "match" are allowed.
*/
-static CopyHeaderChoice
-defGetCopyHeaderChoice(DefElem *def, bool is_from)
+static CopyHeaderOption
+defGetCopyHeaderOption(DefElem *def, bool is_from)
{
+ CopyHeaderOption option;
+
/*
* If no parameter value given, assume "true" is meant.
*/
if (def->arg == NULL)
- return COPY_HEADER_TRUE;
+ {
+ option.match = false;
+ option.skip_lines = 1;
+ return option;
+ }
/*
- * Allow 0, 1, "true", "false", "on", "off", or "match".
+ * Allow integer value, "true", "false", "on", "off", or "match".
*/
switch (nodeTag(def->arg))
{
case T_Integer:
- switch (intVal(def->arg))
{
- case 0:
- return COPY_HEADER_FALSE;
- case 1:
- return COPY_HEADER_TRUE;
- default:
- /* otherwise, error out below */
- break;
+ int skip_lines = intVal(def->arg);
+ if (skip_lines < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("HEADER must be non-negative integer")));
+ if (!is_from && skip_lines > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use multi-line header in COPY TO")));
+
+ option.match = false;
+ option.skip_lines = skip_lines;
+ return option;
}
break;
default:
@@ -360,13 +376,29 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
* grammar's opt_boolean_or_string production.
*/
if (pg_strcasecmp(sval, "true") == 0)
- return COPY_HEADER_TRUE;
+ {
+ option.match = false;
+ option.skip_lines = 1;
+ return option;
+ }
if (pg_strcasecmp(sval, "false") == 0)
- return COPY_HEADER_FALSE;
+ {
+ option.match = false;
+ option.skip_lines = 0;
+ return option;
+ }
if (pg_strcasecmp(sval, "on") == 0)
- return COPY_HEADER_TRUE;
+ {
+ option.match = false;
+ option.skip_lines = 1;
+ return option;
+ }
if (pg_strcasecmp(sval, "off") == 0)
- return COPY_HEADER_FALSE;
+ {
+ option.match = false;
+ option.skip_lines = 0;
+ return option;
+ }
if (pg_strcasecmp(sval, "match") == 0)
{
if (!is_from)
@@ -374,16 +406,19 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot use \"%s\" with HEADER in COPY TO",
sval)));
- return COPY_HEADER_MATCH;
+
+ option.match = true;
+ option.skip_lines = 1;
+ return option;
}
}
break;
}
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("%s requires a Boolean value or \"match\"",
+ errmsg("%s requires a Boolean value, integer value, or \"match\"",
def->defname)));
- return COPY_HEADER_FALSE; /* keep compiler quiet */
+ return option; /* keep compiler quiet */
}
/*
@@ -566,7 +601,7 @@ ProcessCopyOptions(ParseState *pstate,
if (header_specified)
errorConflictingDefElem(defel, pstate);
header_specified = true;
- opts_out->header_line = defGetCopyHeaderChoice(defel, is_from);
+ opts_out->header = defGetCopyHeaderOption(defel, is_from);
}
else if (strcmp(defel->defname, "quote") == 0)
{
@@ -769,7 +804,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->binary && opts_out->header.skip_lines > 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..1edf2af4d98 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -777,17 +777,20 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
Assert(!cstate->opts.binary);
/* on input check that the header line is correct if needed */
- if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+ if (cstate->cur_lineno == 0 && cstate->opts.header.skip_lines)
{
ListCell *cur;
TupleDesc tupDesc;
tupDesc = RelationGetDescr(cstate->rel);
- cstate->cur_lineno++;
- done = CopyReadLine(cstate, is_csv);
+ for (int i = 0; i < cstate->opts.header.skip_lines; i++)
+ {
+ cstate->cur_lineno++;
+ done = CopyReadLine(cstate, is_csv);
+ }
- if (cstate->opts.header_line == COPY_HEADER_MATCH)
+ if (cstate->opts.header.match)
{
int fldnum;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea6f18f2c80..d897d003a8d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -199,7 +199,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
cstate->file_encoding);
/* if a header has been requested send the line */
- if (cstate->opts.header_line)
+ if (cstate->opts.header.skip_lines == 1)
{
ListCell *cur;
bool hdr_delim = false;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..65b5ecc6af8 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,15 +20,14 @@
#include "tcop/dest.h"
/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
+ * Represents how many lines to skip, and whether it must match the actual names
+ * (which implies "true").
*/
-typedef enum CopyHeaderChoice
+typedef struct CopyHeaderOption
{
- COPY_HEADER_FALSE = 0,
- COPY_HEADER_TRUE,
- COPY_HEADER_MATCH,
-} CopyHeaderChoice;
+ bool match; /* header line must match actual names? */
+ int skip_lines; /* number of lines to skip before data */
+} CopyHeaderOption;
/*
* Represents where to save input processing errors. More values to be added
@@ -64,7 +63,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
- CopyHeaderChoice header_line; /* header line? */
+ CopyHeaderOption header; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
char *null_print_client; /* same converted to file encoding */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 8d5a06563c4..03b363eaf34 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -81,6 +81,29 @@ copy copytest4 to stdout (header);
c1 colname with tab: \t
1 a
2 b
+-- test multi-line header line feature
+create temp table copytest5 (c1 int);
+copy copytest5 from stdin (format csv, header 2);
+copy copytest5 to stdout (header);
+c1
+1
+2
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 4);
+select count(*) from copytest5;
+ count
+-------
+ 0
+(1 row)
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 5);
+select count(*) from copytest5;
+ count
+-------
+ 0
+(1 row)
+
-- test copy from with a partitioned table
create table parted_copytest (
a int,
@@ -224,7 +247,7 @@ alter table header_copytest add column c text;
copy header_copytest to stdout with (header match);
ERROR: cannot use "match" with HEADER in COPY TO
copy header_copytest from stdin with (header wrong_choice);
-ERROR: header requires a Boolean value or "match"
+ERROR: header requires a Boolean value, integer value, or "match"
-- works
copy header_copytest from stdin with (header match);
copy header_copytest (c, a, b) from stdin with (header match);
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..4c957559477 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -132,6 +132,12 @@ COPY x from stdin with (reject_limit 1);
ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin with (on_error ignore, reject_limit 0);
ERROR: REJECT_LIMIT (0) must be greater than zero
+COPY x from stdin with (header -1);
+ERROR: HEADER must be non-negative integer
+COPY x from stdin with (header 2.5);
+ERROR: header requires a Boolean value, integer value, or "match"
+COPY x to stdout with (header 2);
+ERROR: cannot use multi-line header in COPY TO
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f0b88a23db8..a1316c73bac 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -94,6 +94,36 @@ this is just a line full of junk that would error out if parsed
copy copytest4 to stdout (header);
+-- test multi-line header line feature
+
+create temp table copytest5 (c1 int);
+
+copy copytest5 from stdin (format csv, header 2);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+copy copytest5 to stdout (header);
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 4);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+select count(*) from copytest5;
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 5);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+select count(*) from copytest5;
+
-- test copy from with a partitioned table
create table parted_copytest (
a int,
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..cef45868db5 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -90,6 +90,9 @@ COPY x to stdout (format BINARY, on_error unsupported);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
+COPY x from stdin with (header -1);
+COPY x from stdin with (header 2.5);
+COPY x to stdout with (header 2);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 32d6e718adc..49a76fac9ad 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -521,7 +521,7 @@ CopyFormatOptions
CopyFromRoutine
CopyFromState
CopyFromStateData
-CopyHeaderChoice
+CopyHeaderOption
CopyInsertMethod
CopyLogVerbosityChoice
CopyMethod
--
2.39.3
On 2025/06/26 14:35, Shinya Kato wrote:
So it seems better for you to implement the patch at first and then
check the performance overhead etc if necessary.Thank you for your advice. I will create a patch.
I created a patch.
Thanks for the patch!
As you can see from the patch, I believe the performance impact is negligible. The only changes were to modify how the HEADER option is stored and to add a loop that skips the specified number of header lines when parsing the options.
The design is such that if a HEADER value larger than the number of lines in the file is specified, the command will complete with zero rows loaded and will not return an error. This is the same behavior as specifying HEADER true for a CSV file that has zero rows.
Sounds good to me.
And I will add this patch for the next CF.
Thoughts?
When I compiled with the patch applied, I got the following warning:
copyfromparse.c:834:20: error: ‘done’ may be used uninitialized [-Werror=maybe-uninitialized]
834 | if (done)
| ^
+ lines are discarded. An integer value greater than 1 is only valid for
+ <command>COPY FROM</command> commands.
This might be slightly misleading since 0 is also a valid value for COPY FROM.
+ for (int i = 0; i < cstate->opts.header.skip_lines; i++)
+ {
+ cstate->cur_lineno++;
+ done = CopyReadLine(cstate, is_csv);
+ }
If "done" is true, shouldn't we break out of the loop immediately?
Otherwise, with a large HEADER value, we could end up wasting a lot of
cycles unnecessarily.
+defGetCopyHeaderOption(DefElem *def, bool is_from)
{
+ CopyHeaderOption option;
To avoid repeating the same code like "option.match = false" in every case,
how about initializing the struct like "option = {false, 1}"?
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("HEADER must be non-negative integer")));
This message could be confusing since HEADER can also accept a Boolean or "match".
Maybe it's better to use the same message as "%s requires a Boolean value, integer value,
or \"match\"",? "integer value"? If so, it's also better to use "non-negative integer"
instead of just "integer".
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
On Thu, Jun 26, 2025 at 4:36 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
On 2025/06/26 14:35, Shinya Kato wrote:
So it seems better for you to implement the patch at first and then
check the performance overhead etc if necessary.Thank you for your advice. I will create a patch.
I created a patch.
Thanks for the patch!
Thank you for your review.
When I compiled with the patch applied, I got the following warning:
copyfromparse.c:834:20: error: ‘done’ may be used uninitialized
[-Werror=maybe-uninitialized]
834 | if (done)
| ^
Oh, sorry. I missed it.
I fixed it to initialize done = false.
+ lines are discarded. An integer value greater than 1 is only valid for + <command>COPY FROM</command> commands.This might be slightly misleading since 0 is also a valid value for COPY
FROM.
That's certainly true. I fixed it below:
+ lines are discarded. An integer value greater than 1 is not allowed
for
+ <command>COPY TO</command> commands.
+ for (int i = 0; i < cstate->opts.header.skip_lines; i++) + { + cstate->cur_lineno++; + done = CopyReadLine(cstate, is_csv); + }If "done" is true, shouldn't we break out of the loop immediately?
Otherwise, with a large HEADER value, we could end up wasting a lot of
cycles unnecessarily.
Exactly, fixed.
+defGetCopyHeaderOption(DefElem *def, bool is_from) { + CopyHeaderOption option;To avoid repeating the same code like "option.match = false" in every case,
how about initializing the struct like "option = {false, 1}"?
Exactly, fixed.
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("HEADER must be non-negative integer")));This message could be confusing since HEADER can also accept a Boolean or
"match".
Maybe it's better to use the same message as "%s requires a Boolean value,
integer value,
or \"match\"",? "integer value"? If so, it's also better to use
"non-negative integer"
instead of just "integer".
Yes, I fixed it to use the same message which replaced "integer" to
"non-negative integer".
Original error message "%s requires a Boolean value, integer value, or
\"match\"" should also be fixed to "non-negative integer"?
--
Best regards,
Shinya Kato
NTT OSS Center
Attachments:
v2-0001-Add-support-for-multi-line-header-skipping-in-COP.patchapplication/octet-stream; name=v2-0001-Add-support-for-multi-line-header-skipping-in-COP.patchDownload
From ec7e2c671def498f894afb5d4e6467b7dff46e2b Mon Sep 17 00:00:00 2001
From: Shinya Kato <shinya11.kato@gmail.com>
Date: Thu, 26 Jun 2025 19:09:31 +0900
Subject: [PATCH v2] Add support for multi-line header skipping in COPY The
HEADER option for COPY now accepts a non-negative integer value, allowing
users to skip an arbitrary number of header lines when importing data with
COPY FROM. For COPY TO, only 0 or 1 is allowed.
---
doc/src/sgml/ref/copy.sgml | 5 ++-
src/backend/commands/copy.c | 67 ++++++++++++++++++----------
src/backend/commands/copyfromparse.c | 15 ++++---
src/backend/commands/copyto.c | 2 +-
src/include/commands/copy.h | 15 +++----
src/test/regress/expected/copy.out | 25 ++++++++++-
src/test/regress/expected/copy2.out | 6 +++
src/test/regress/sql/copy.sql | 30 +++++++++++++
src/test/regress/sql/copy2.sql | 3 ++
src/tools/pgindent/typedefs.list | 2 +-
10 files changed, 130 insertions(+), 40 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..1869d74cdf1 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,7 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
DEFAULT '<replaceable class="parameter">default_string</replaceable>'
- HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
+ HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -307,6 +307,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
column in the file. On output, the first line contains the column
names from the table. On input, the first line is discarded when this
option is set to <literal>true</literal> (or equivalent Boolean value).
+ If this option is set to a non-negative integer value, specified number of
+ lines are discarded. An integer value greater than 1 is not allowed for
+ <command>COPY TO</command> commands.
If this option is set to <literal>MATCH</literal>, the number and names
of the columns in the header line must match the actual column names of
the table, in order; otherwise an error is raised.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..d0150ffee4c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -322,33 +322,46 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
}
/*
- * Extract a CopyHeaderChoice value from a DefElem. This is like
- * defGetBoolean() but also accepts the special value "match".
+ * Extract a CopyHeaderOption value from a DefElem.
+ *
+ * Parses the HEADER option for COPY. Accepts boolean values, integer values
+ * (number of lines to skip, via skip_lines), or the special value "match".
+ *
+ * For COPY TO, boolean values and skip_lines 0 or 1 are allowed.
+ * For COPY FROM, boolean values, skip_lines >= 0, and "match" are allowed.
*/
-static CopyHeaderChoice
-defGetCopyHeaderChoice(DefElem *def, bool is_from)
+static CopyHeaderOption
+defGetCopyHeaderOption(DefElem *def, bool is_from)
{
+ CopyHeaderOption option = {false, 1};
+
/*
* If no parameter value given, assume "true" is meant.
*/
if (def->arg == NULL)
- return COPY_HEADER_TRUE;
+ return option;
/*
- * Allow 0, 1, "true", "false", "on", "off", or "match".
+ * Allow integer value, "true", "false", "on", "off", or "match".
*/
switch (nodeTag(def->arg))
{
case T_Integer:
- switch (intVal(def->arg))
{
- case 0:
- return COPY_HEADER_FALSE;
- case 1:
- return COPY_HEADER_TRUE;
- default:
- /* otherwise, error out below */
- break;
+ int skip_lines = intVal(def->arg);
+ if (skip_lines < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("%s requires a Boolean value, non-negative"
+ "integer value, or \"match\"",
+ def->defname)));
+ if (!is_from && skip_lines > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use multi-line header in COPY TO")));
+
+ option.skip_lines = skip_lines;
+ return option;
}
break;
default:
@@ -360,13 +373,19 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
* grammar's opt_boolean_or_string production.
*/
if (pg_strcasecmp(sval, "true") == 0)
- return COPY_HEADER_TRUE;
+ return option;
if (pg_strcasecmp(sval, "false") == 0)
- return COPY_HEADER_FALSE;
+ {
+ option.skip_lines = 0;
+ return option;
+ }
if (pg_strcasecmp(sval, "on") == 0)
- return COPY_HEADER_TRUE;
+ return option;
if (pg_strcasecmp(sval, "off") == 0)
- return COPY_HEADER_FALSE;
+ {
+ option.skip_lines = 0;
+ return option;
+ }
if (pg_strcasecmp(sval, "match") == 0)
{
if (!is_from)
@@ -374,16 +393,18 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot use \"%s\" with HEADER in COPY TO",
sval)));
- return COPY_HEADER_MATCH;
+
+ option.match = true;
+ return option;
}
}
break;
}
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("%s requires a Boolean value or \"match\"",
+ errmsg("%s requires a Boolean value, integer value, or \"match\"",
def->defname)));
- return COPY_HEADER_FALSE; /* keep compiler quiet */
+ return option; /* keep compiler quiet */
}
/*
@@ -566,7 +587,7 @@ ProcessCopyOptions(ParseState *pstate,
if (header_specified)
errorConflictingDefElem(defel, pstate);
header_specified = true;
- opts_out->header_line = defGetCopyHeaderChoice(defel, is_from);
+ opts_out->header = defGetCopyHeaderOption(defel, is_from);
}
else if (strcmp(defel->defname, "quote") == 0)
{
@@ -769,7 +790,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->binary && opts_out->header.skip_lines > 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..912a2030cfa 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -771,23 +771,28 @@ static pg_attribute_always_inline bool
NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
{
int fldct;
- bool done;
+ bool done = false;
/* only available for text or csv input */
Assert(!cstate->opts.binary);
/* on input check that the header line is correct if needed */
- if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+ if (cstate->cur_lineno == 0 && cstate->opts.header.skip_lines)
{
ListCell *cur;
TupleDesc tupDesc;
tupDesc = RelationGetDescr(cstate->rel);
- cstate->cur_lineno++;
- done = CopyReadLine(cstate, is_csv);
+ for (int i = 0; i < cstate->opts.header.skip_lines; i++)
+ {
+ cstate->cur_lineno++;
+ done = CopyReadLine(cstate, is_csv);
+ if (done)
+ break;
+ }
- if (cstate->opts.header_line == COPY_HEADER_MATCH)
+ if (cstate->opts.header.match)
{
int fldnum;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea6f18f2c80..d897d003a8d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -199,7 +199,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
cstate->file_encoding);
/* if a header has been requested send the line */
- if (cstate->opts.header_line)
+ if (cstate->opts.header.skip_lines == 1)
{
ListCell *cur;
bool hdr_delim = false;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..65b5ecc6af8 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,15 +20,14 @@
#include "tcop/dest.h"
/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
+ * Represents how many lines to skip, and whether it must match the actual names
+ * (which implies "true").
*/
-typedef enum CopyHeaderChoice
+typedef struct CopyHeaderOption
{
- COPY_HEADER_FALSE = 0,
- COPY_HEADER_TRUE,
- COPY_HEADER_MATCH,
-} CopyHeaderChoice;
+ bool match; /* header line must match actual names? */
+ int skip_lines; /* number of lines to skip before data */
+} CopyHeaderOption;
/*
* Represents where to save input processing errors. More values to be added
@@ -64,7 +63,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
- CopyHeaderChoice header_line; /* header line? */
+ CopyHeaderOption header; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
char *null_print_client; /* same converted to file encoding */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 8d5a06563c4..03b363eaf34 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -81,6 +81,29 @@ copy copytest4 to stdout (header);
c1 colname with tab: \t
1 a
2 b
+-- test multi-line header line feature
+create temp table copytest5 (c1 int);
+copy copytest5 from stdin (format csv, header 2);
+copy copytest5 to stdout (header);
+c1
+1
+2
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 4);
+select count(*) from copytest5;
+ count
+-------
+ 0
+(1 row)
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 5);
+select count(*) from copytest5;
+ count
+-------
+ 0
+(1 row)
+
-- test copy from with a partitioned table
create table parted_copytest (
a int,
@@ -224,7 +247,7 @@ alter table header_copytest add column c text;
copy header_copytest to stdout with (header match);
ERROR: cannot use "match" with HEADER in COPY TO
copy header_copytest from stdin with (header wrong_choice);
-ERROR: header requires a Boolean value or "match"
+ERROR: header requires a Boolean value, integer value, or "match"
-- works
copy header_copytest from stdin with (header match);
copy header_copytest (c, a, b) from stdin with (header match);
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..e514110b13d 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -132,6 +132,12 @@ COPY x from stdin with (reject_limit 1);
ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin with (on_error ignore, reject_limit 0);
ERROR: REJECT_LIMIT (0) must be greater than zero
+COPY x from stdin with (header -1);
+ERROR: header requires a Boolean value, non-negativeinteger value, or "match"
+COPY x from stdin with (header 2.5);
+ERROR: header requires a Boolean value, integer value, or "match"
+COPY x to stdout with (header 2);
+ERROR: cannot use multi-line header in COPY TO
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f0b88a23db8..a1316c73bac 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -94,6 +94,36 @@ this is just a line full of junk that would error out if parsed
copy copytest4 to stdout (header);
+-- test multi-line header line feature
+
+create temp table copytest5 (c1 int);
+
+copy copytest5 from stdin (format csv, header 2);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+copy copytest5 to stdout (header);
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 4);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+select count(*) from copytest5;
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 5);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+select count(*) from copytest5;
+
-- test copy from with a partitioned table
create table parted_copytest (
a int,
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..cef45868db5 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -90,6 +90,9 @@ COPY x to stdout (format BINARY, on_error unsupported);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
+COPY x from stdin with (header -1);
+COPY x from stdin with (header 2.5);
+COPY x to stdout with (header 2);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 32d6e718adc..49a76fac9ad 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -521,7 +521,7 @@ CopyFormatOptions
CopyFromRoutine
CopyFromState
CopyFromStateData
-CopyHeaderChoice
+CopyHeaderOption
CopyInsertMethod
CopyLogVerbosityChoice
CopyMethod
--
2.39.3
On 2025/06/26 19:12, Shinya Kato wrote:
On Thu, Jun 26, 2025 at 4:36 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
On 2025/06/26 14:35, Shinya Kato wrote:
> > So it seems better for you to implement the patch at first and then
> > check the performance overhead etc if necessary.
>
> Thank you for your advice. I will create a patch.I created a patch.
Thanks for the patch!
Thank you for your review.
When I compiled with the patch applied, I got the following warning:
copyfromparse.c:834:20: error: ‘done’ may be used uninitialized [-Werror=maybe-uninitialized]
834 | if (done)
| ^Oh, sorry. I missed it.
I fixed it to initialize done = false.+ lines are discarded. An integer value greater than 1 is only valid for + <command>COPY FROM</command> commands.This might be slightly misleading since 0 is also a valid value for COPY FROM.
That's certainly true. I fixed it below: + lines are discarded. An integer value greater than 1 is not allowed for + <command>COPY TO</command> commands.
Thanks for fixing that! However, it's odd that the description for COPY TO appears
under the section starting with "On input." Shouldn't it be under the "On output"
section instead?
Also, I think the entire HEADER option section could use a clearer structure.
How about revising it like this?
---------------------
<listitem>
<para>
- Specifies that the file contains a header line with the names of each
- column in the file. On output, the first line contains the column
- names from the table. On input, the first line is discarded when this
- option is set to <literal>true</literal> (or equivalent Boolean value).
- If this option is set to <literal>MATCH</literal>, the number and names
- of the columns in the header line must match the actual column names of
- the table, in order; otherwise an error is raised.
+ On output, if this option is set to <literal>true</literal>
+ (or an equivalent Boolean value), the first line of the output will
+ contain the column names from the table.
+ Integer values <literal>0</literal> and <literal>1</literal> are
+ accepted as Boolean values, but other integers are not allowed for
+ <command>COPY TO</command> commands.
+ </para>
+ <para>
+ On input, if this option is set to <literal>true</literal>
+ (or an equivalent Boolean value), the first line of the input is
+ discarded. If it is set to a non-negative integer, that number of
+ lines are discarded. If the option is set to <literal>MATCH</literal>,
+ the number and names of the columns in the header line must exactly
+ match those of the table, in order; otherwise an error is raised.
+ The <literal>MATCH</literal> value is only valid for
+ <command>COPY FROM</command> commands.
+ </para>
+ <para>
This option is not allowed when using <literal>binary</literal> format.
- The <literal>MATCH</literal> option is only valid for <command>COPY
- FROM</command> commands.
</para>
</listitem>
---------------------
Also, similar to the existing "boolean" type explanation in the COPY docs,
we should add a corresponding entry for integer, now that it's accepted by
the HEADER option. The VACUUM docs has a good example of how to phrase this.
Original error message "%s requires a Boolean value, integer value, or \"match\"" should also be fixed to "non-negative integer"?
Yes, I think that the message
"%s requires a Boolean value, a non-negative integer, or the string \"match\""
is clearer and more accurate.
-typedef enum CopyHeaderChoice
+typedef struct CopyHeaderOption
{
- COPY_HEADER_FALSE = 0,
- COPY_HEADER_TRUE,
- COPY_HEADER_MATCH,
-} CopyHeaderChoice;
+ bool match; /* header line must match actual names? */
+ int skip_lines; /* number of lines to skip before data */
+} CopyHeaderOption;
<snip>
- CopyHeaderChoice header_line; /* header line? */
+ CopyHeaderOption header; /* header line? */
Wouldn't it be simpler to just use an integer instead of introducing
CopyHeaderOption? We could use 0 for false, 1 for true, n > 1 for skipping lines,
and -1 for MATCH in that integer.
This would simplify the logic in defGetCopyHeaderOption().
I've attached a patch that shows this idea in action. Thought?
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
Attachments:
v3-0001-Add-support-for-multi-line-header-skipping-in-COP.patchtext/plain; charset=UTF-8; name=v3-0001-Add-support-for-multi-line-header-skipping-in-COP.patchDownload
From 49bd75c5a3eca6cf03dc7294b402f7fb54f79ff2 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Thu, 26 Jun 2025 20:35:10 +0900
Subject: [PATCH v3] Add support for multi-line header skipping in COPY The
HEADER option for COPY now accepts a non-negative integer value, allowing
users to skip an arbitrary number of header lines when importing data with
COPY FROM. For COPY TO, only 0 or 1 is allowed.
---
doc/src/sgml/ref/copy.sgml | 38 +++++++++++++++++-------
src/backend/commands/copy.c | 43 +++++++++++++++++-----------
src/backend/commands/copyfromparse.c | 17 ++++++++---
src/backend/commands/copyto.c | 2 +-
src/include/commands/copy.h | 16 +++++------
src/test/regress/expected/copy.out | 25 +++++++++++++++-
src/test/regress/expected/copy2.out | 6 ++++
src/test/regress/sql/copy.sql | 30 +++++++++++++++++++
src/test/regress/sql/copy2.sql | 3 ++
src/tools/pgindent/typedefs.list | 1 -
10 files changed, 139 insertions(+), 42 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..0e3182f1601 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,7 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
DEFAULT '<replaceable class="parameter">default_string</replaceable>'
- HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
+ HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -212,6 +212,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><replaceable class="parameter">integer</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a non-negative integer value passed to the selected option.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>FORMAT</literal></term>
<listitem>
@@ -303,16 +312,25 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>HEADER</literal></term>
<listitem>
<para>
- Specifies that the file contains a header line with the names of each
- column in the file. On output, the first line contains the column
- names from the table. On input, the first line is discarded when this
- option is set to <literal>true</literal> (or equivalent Boolean value).
- If this option is set to <literal>MATCH</literal>, the number and names
- of the columns in the header line must match the actual column names of
- the table, in order; otherwise an error is raised.
+ On output, if this option is set to <literal>true</literal>
+ (or an equivalent Boolean value), the first line of the output will
+ contain the column names from the table.
+ Integer values <literal>0</literal> and <literal>1</literal> are
+ accepted as Boolean values, but other integers are not allowed for
+ <command>COPY TO</command> commands.
+ </para>
+ <para>
+ On input, if this option is set to <literal>true</literal>
+ (or an equivalent Boolean value), the first line of the input is
+ discarded. If it is set to a non-negative integer, that number of
+ lines are discarded. If the option is set to <literal>MATCH</literal>,
+ the number and names of the columns in the header line must exactly
+ match those of the table, in order; otherwise an error is raised.
+ The <literal>MATCH</literal> value is only valid for
+ <command>COPY FROM</command> commands.
+ </para>
+ <para>
This option is not allowed when using <literal>binary</literal> format.
- The <literal>MATCH</literal> option is only valid for <command>COPY
- FROM</command> commands.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..1461f051323 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -322,11 +322,13 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
}
/*
- * Extract a CopyHeaderChoice value from a DefElem. This is like
- * defGetBoolean() but also accepts the special value "match".
+ * Extract the CopyFormatOptions.header_line value from a DefElem.
+ *
+ * Parses the HEADER option for COPY, which can be a boolean, a non-negative
+ * integer (number of lines to skip), or the special value "match".
*/
-static CopyHeaderChoice
-defGetCopyHeaderChoice(DefElem *def, bool is_from)
+static int
+defGetCopyHeaderOption(DefElem *def, bool is_from)
{
/*
* If no parameter value given, assume "true" is meant.
@@ -335,20 +337,28 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
return COPY_HEADER_TRUE;
/*
- * Allow 0, 1, "true", "false", "on", "off", or "match".
+ * Allow 0, 1, "true", "false", "on", "off", a non-negative integer, or
+ * "match".
*/
switch (nodeTag(def->arg))
{
case T_Integer:
- switch (intVal(def->arg))
{
- case 0:
- return COPY_HEADER_FALSE;
- case 1:
- return COPY_HEADER_TRUE;
- default:
- /* otherwise, error out below */
- break;
+ int ival = intVal(def->arg);
+
+ if (ival < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("%s requires a Boolean value, a non-negative "
+ "integer, or the string \"match\"",
+ def->defname)));
+
+ if (!is_from && ival > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use multi-line header in COPY TO")));
+
+ return ival;
}
break;
default:
@@ -381,7 +391,8 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
}
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("%s requires a Boolean value or \"match\"",
+ errmsg("%s requires a Boolean value, a non-negative integer, "
+ "or the string \"match\"",
def->defname)));
return COPY_HEADER_FALSE; /* keep compiler quiet */
}
@@ -566,7 +577,7 @@ ProcessCopyOptions(ParseState *pstate,
if (header_specified)
errorConflictingDefElem(defel, pstate);
header_specified = true;
- opts_out->header_line = defGetCopyHeaderChoice(defel, is_from);
+ opts_out->header_line = defGetCopyHeaderOption(defel, is_from);
}
else if (strcmp(defel->defname, "quote") == 0)
{
@@ -769,7 +780,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..af0794508fe 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -771,21 +771,30 @@ static pg_attribute_always_inline bool
NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
{
int fldct;
- bool done;
+ bool done = false;
/* only available for text or csv input */
Assert(!cstate->opts.binary);
/* on input check that the header line is correct if needed */
- if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+ if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
{
ListCell *cur;
TupleDesc tupDesc;
+ int lines_to_skip = cstate->opts.header_line;
+
+ /* If set to "match", one header line is skipped */
+ if (cstate->opts.header_line == COPY_HEADER_MATCH)
+ lines_to_skip = 1;
tupDesc = RelationGetDescr(cstate->rel);
- cstate->cur_lineno++;
- done = CopyReadLine(cstate, is_csv);
+ for (int i = 0; i < lines_to_skip; i++)
+ {
+ cstate->cur_lineno++;
+ if ((done = CopyReadLine(cstate, is_csv)))
+ break;
+ }
if (cstate->opts.header_line == COPY_HEADER_MATCH)
{
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea6f18f2c80..67b94b91cae 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -199,7 +199,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
cstate->file_encoding);
/* if a header has been requested send the line */
- if (cstate->opts.header_line)
+ if (cstate->opts.header_line == COPY_HEADER_TRUE)
{
ListCell *cur;
bool hdr_delim = false;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..541176e1980 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,15 +20,12 @@
#include "tcop/dest.h"
/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
+ * Represents whether a header line must match the actual names
+ * (which implies "true"), and whether it should be present.
*/
-typedef enum CopyHeaderChoice
-{
- COPY_HEADER_FALSE = 0,
- COPY_HEADER_TRUE,
- COPY_HEADER_MATCH,
-} CopyHeaderChoice;
+#define COPY_HEADER_MATCH -1
+#define COPY_HEADER_FALSE 0
+#define COPY_HEADER_TRUE 1
/*
* Represents where to save input processing errors. More values to be added
@@ -64,7 +61,8 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
- CopyHeaderChoice header_line; /* header line? */
+ int header_line; /* number of lines to skip or COPY_HEADER_XXX
+ * value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
char *null_print_client; /* same converted to file encoding */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 8d5a06563c4..ac66eb55aee 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -81,6 +81,29 @@ copy copytest4 to stdout (header);
c1 colname with tab: \t
1 a
2 b
+-- test multi-line header line feature
+create temp table copytest5 (c1 int);
+copy copytest5 from stdin (format csv, header 2);
+copy copytest5 to stdout (header);
+c1
+1
+2
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 4);
+select count(*) from copytest5;
+ count
+-------
+ 0
+(1 row)
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 5);
+select count(*) from copytest5;
+ count
+-------
+ 0
+(1 row)
+
-- test copy from with a partitioned table
create table parted_copytest (
a int,
@@ -224,7 +247,7 @@ alter table header_copytest add column c text;
copy header_copytest to stdout with (header match);
ERROR: cannot use "match" with HEADER in COPY TO
copy header_copytest from stdin with (header wrong_choice);
-ERROR: header requires a Boolean value or "match"
+ERROR: header requires a Boolean value, a non-negative integer, or the string "match"
-- works
copy header_copytest from stdin with (header match);
copy header_copytest (c, a, b) from stdin with (header match);
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..9177ed537a4 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -132,6 +132,12 @@ COPY x from stdin with (reject_limit 1);
ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin with (on_error ignore, reject_limit 0);
ERROR: REJECT_LIMIT (0) must be greater than zero
+COPY x from stdin with (header -1);
+ERROR: header requires a Boolean value, a non-negative integer, or the string "match"
+COPY x from stdin with (header 2.5);
+ERROR: header requires a Boolean value, a non-negative integer, or the string "match"
+COPY x to stdout with (header 2);
+ERROR: cannot use multi-line header in COPY TO
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f0b88a23db8..a1316c73bac 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -94,6 +94,36 @@ this is just a line full of junk that would error out if parsed
copy copytest4 to stdout (header);
+-- test multi-line header line feature
+
+create temp table copytest5 (c1 int);
+
+copy copytest5 from stdin (format csv, header 2);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+copy copytest5 to stdout (header);
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 4);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+select count(*) from copytest5;
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 5);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+select count(*) from copytest5;
+
-- test copy from with a partitioned table
create table parted_copytest (
a int,
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..cef45868db5 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -90,6 +90,9 @@ COPY x to stdout (format BINARY, on_error unsupported);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
+COPY x from stdin with (header -1);
+COPY x from stdin with (header 2.5);
+COPY x to stdout with (header 2);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 32d6e718adc..beef5f20b60 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -521,7 +521,6 @@ CopyFormatOptions
CopyFromRoutine
CopyFromState
CopyFromStateData
-CopyHeaderChoice
CopyInsertMethod
CopyLogVerbosityChoice
CopyMethod
--
2.49.0
On Fri, Jun 27, 2025 at 12:03 AM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
On 2025/06/26 19:12, Shinya Kato wrote:
On Thu, Jun 26, 2025 at 4:36 PM Fujii Masao <masao.fujii@oss.nttdata.com
<mailto:masao.fujii@oss.nttdata.com>> wrote:
On 2025/06/26 14:35, Shinya Kato wrote:
So it seems better for you to implement the patch at first
and then
check the performance overhead etc if necessary.
Thank you for your advice. I will create a patch.
I created a patch.
Thanks for the patch!
Thank you for your review.
When I compiled with the patch applied, I got the following warning:
copyfromparse.c:834:20: error: ‘done’ may be used uninitialized
[-Werror=maybe-uninitialized]
834 | if (done)
| ^Oh, sorry. I missed it.
I fixed it to initialize done = false.+ lines are discarded. An integer value greater than 1 is only
valid for
+ <command>COPY FROM</command> commands.
This might be slightly misleading since 0 is also a valid value for
COPY FROM.
That's certainly true. I fixed it below:
+ lines are discarded. An integer value greater than 1 is notallowed for
+ <command>COPY TO</command> commands.
Thanks for fixing that! However, it's odd that the description for COPY TO
appears
under the section starting with "On input." Shouldn't it be under the "On
output"
section instead?Also, I think the entire HEADER option section could use a clearer
structure.
How about revising it like this?--------------------- <listitem> <para> - Specifies that the file contains a header line with the names of each - column in the file. On output, the first line contains the column - names from the table. On input, the first line is discarded when this - option is set to <literal>true</literal> (or equivalent Boolean value). - If this option is set to <literal>MATCH</literal>, the number and names - of the columns in the header line must match the actual column names of - the table, in order; otherwise an error is raised. + On output, if this option is set to <literal>true</literal> + (or an equivalent Boolean value), the first line of the output will + contain the column names from the table. + Integer values <literal>0</literal> and <literal>1</literal> are + accepted as Boolean values, but other integers are not allowed for + <command>COPY TO</command> commands. + </para> + <para> + On input, if this option is set to <literal>true</literal> + (or an equivalent Boolean value), the first line of the input is + discarded. If it is set to a non-negative integer, that number of + lines are discarded. If the option is set to <literal>MATCH</literal>, + the number and names of the columns in the header line must exactly + match those of the table, in order; otherwise an error is raised. + The <literal>MATCH</literal> value is only valid for + <command>COPY FROM</command> commands. + </para> + <para> This option is not allowed when using <literal>binary</literal> format. - The <literal>MATCH</literal> option is only valid for <command>COPY - FROM</command> commands. </para> </listitem> ---------------------
Yes, your documentation is clearer than mine.
Also, similar to the existing "boolean" type explanation in the COPY docs,
we should add a corresponding entry for integer, now that it's accepted by
the HEADER option. The VACUUM docs has a good example of how to phrase
this.
Exactly.
Original error message "%s requires a Boolean value, integer value, or
\"match\"" should also be fixed to "non-negative integer"?
Yes, I think that the message
"%s requires a Boolean value, a non-negative integer, or the string
\"match\""is clearer and more accurate.
Thank you, you're right.
-typedef enum CopyHeaderChoice +typedef struct CopyHeaderOption { - COPY_HEADER_FALSE = 0, - COPY_HEADER_TRUE, - COPY_HEADER_MATCH, -} CopyHeaderChoice; + bool match; /* header line must match actual names? */ + int skip_lines; /* number of lines to skip before data */ +} CopyHeaderOption; <snip> - CopyHeaderChoice header_line; /* header line? */ + CopyHeaderOption header; /* header line? */Wouldn't it be simpler to just use an integer instead of introducing
CopyHeaderOption? We could use 0 for false, 1 for true, n > 1 for skipping
lines,
and -1 for MATCH in that integer.This would simplify the logic in defGetCopyHeaderOption().
I've attached a patch that shows this idea in action. Thought?
I thought about the approach, and had a minor concern about the following
inconsistency between the option and its internal implementation:
- When the option is set to "MATCH", header_line becomes -1.
- When the option is set to -1, it's an error.
However, your patch is clear, and it seems we don't need to worry about
this concern.
Your patch looks good to me.
--
Best regards,
Shinya Kato
NTT OSS Center
On 2025/06/27 11:55, Shinya Kato wrote:
I thought about the approach, and had a minor concern about the following inconsistency between the option and its internal implementation:
- When the option is set to "MATCH", header_line becomes -1.
- When the option is set to -1, it's an error.
I think this inconsistency is acceptable, since users won't encounter it directly.
Also the value -1 isn't used explicitly, i.e., it's always referenced through a macro
for "match", which keeps the code readable and understandable.
However, your patch is clear, and it seems we don't need to worry about this concern.
Your patch looks good to me.
Thanks for reviewing the patch! I've marked it as ready for committer.
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
On Fri, 27 Jun 2025 12:22:17 +0900
Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
However, your patch is clear, and it seems we don't need to worry about this concern.
Your patch looks good to me.Thanks for reviewing the patch! I've marked it as ready for committer.
I have a few minor comment on the patch.
+ if (ival < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("%s requires a Boolean value, a non-negative "
+ "integer, or the string \"match\"",
+ def->defname)));
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("%s requires a Boolean value or \"match\"",
+ errmsg("%s requires a Boolean value, a non-negative integer, "
+ "or the string \"match\"",
def->defname)));
These two pieces of code raise the same error, but with different error codes:
one returns ERRCODE_INVALID_PARAMETER_VALUE, while the other returns ERRCODE_SYNTAX_ERROR.
I believe the former is more appropriate, although the existing code uses the
latter. In any case, the error codes should be consistent.
Regarding the documentation, how about explicitly stating that when MATCH is specified, only
the first line is skipped? While this may seem obvious, it’s worth clarifying, as the semantics
of the HEADER option have become a bit more complex with this change.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On 2025/06/27 13:09, Yugo Nagata wrote:
On Fri, 27 Jun 2025 12:22:17 +0900
Fujii Masao <masao.fujii@oss.nttdata.com> wrote:However, your patch is clear, and it seems we don't need to worry about this concern.
Your patch looks good to me.Thanks for reviewing the patch! I've marked it as ready for committer.
I have a few minor comment on the patch.
+ if (ival < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("%s requires a Boolean value, a non-negative " + "integer, or the string \"match\"", + def->defname)));ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("%s requires a Boolean value or \"match\"", + errmsg("%s requires a Boolean value, a non-negative integer, " + "or the string \"match\"", def->defname)));These two pieces of code raise the same error, but with different error codes:
one returns ERRCODE_INVALID_PARAMETER_VALUE, while the other returns ERRCODE_SYNTAX_ERROR.I believe the former is more appropriate, although the existing code uses the
latter. In any case, the error codes should be consistent.
I'm not sure there's an actual rule like "the error code must match
if the error message is the same." But if using the same message with
a different error code is confusing, I'm fine with changing
the earlier message. For example:
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("%s requires a Boolean value, a non-negative "
- "integer, or the string \"match\"",
+ errmsg("a negative integer value cannot be "
+ "specified for %s",
def->defname)));
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
On Sat, 28 Jun 2025 00:33:45 +0900
Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2025/06/27 13:09, Yugo Nagata wrote:
On Fri, 27 Jun 2025 12:22:17 +0900
Fujii Masao <masao.fujii@oss.nttdata.com> wrote:However, your patch is clear, and it seems we don't need to worry about this concern.
Your patch looks good to me.Thanks for reviewing the patch! I've marked it as ready for committer.
I have a few minor comment on the patch.
+ if (ival < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("%s requires a Boolean value, a non-negative " + "integer, or the string \"match\"", + def->defname)));ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("%s requires a Boolean value or \"match\"", + errmsg("%s requires a Boolean value, a non-negative integer, " + "or the string \"match\"", def->defname)));These two pieces of code raise the same error, but with different error codes:
one returns ERRCODE_INVALID_PARAMETER_VALUE, while the other returns ERRCODE_SYNTAX_ERROR.I believe the former is more appropriate, although the existing code uses the
latter. In any case, the error codes should be consistent.I'm not sure there's an actual rule like "the error code must match
if the error message is the same." But if using the same message with
a different error code is confusing, I'm fine with changing
the earlier message. For example:(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("%s requires a Boolean value, a non-negative " - "integer, or the string \"match\"", + errmsg("a negative integer value cannot be " + "specified for %s", def->defname)));
Fair point. There might not be any explicit rule.
However, I feel it somewhat confusing.
After looking into the code, it seems that ERRCODE_SYNTAX_ERROR tends to be used
for invalid keywords or parameter types, while ERRCODE_INVALID_PARAMETER_VALUE
is typically used for values that are out of the acceptable range.
So, the proposed change seems reasonable to me.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Mon, Jun 30, 2025 at 4:24 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I have a few minor comment on the patch.
+ if (ival < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("%s requires a Boolean value, a non-negative " + "integer, or the string \"match\"", + def->defname)));ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("%s requires a Boolean value or \"match\"", + errmsg("%s requires a Boolean value, a non-negative integer, " + "or the string \"match\"", def->defname)));These two pieces of code raise the same error, but with different error codes:
one returns ERRCODE_INVALID_PARAMETER_VALUE, while the other returns ERRCODE_SYNTAX_ERROR.I believe the former is more appropriate, although the existing code uses the
latter. In any case, the error codes should be consistent.I'm not sure there's an actual rule like "the error code must match
if the error message is the same." But if using the same message with
a different error code is confusing, I'm fine with changing
the earlier message. For example:(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("%s requires a Boolean value, a non-negative " - "integer, or the string \"match\"", + errmsg("a negative integer value cannot be " + "specified for %s", def->defname)));Fair point. There might not be any explicit rule.
However, I feel it somewhat confusing.After looking into the code, it seems that ERRCODE_SYNTAX_ERROR tends to be used
for invalid keywords or parameter types, while ERRCODE_INVALID_PARAMETER_VALUE
is typically used for values that are out of the acceptable range.So, the proposed change seems reasonable to me.
Thank you for the review. The change looks good to me, too. A new
patch is attached.
Regarding the documentation, how about explicitly stating that when MATCH is specified, only
the first line is skipped? While this may seem obvious, it’s worth clarifying, as the semantics
of the HEADER option have become a bit more complex with this change.
Agreed. I have updated the documentation as follows:
+ lines are discarded. If the option is set to <literal>MATCH</literal>,
+ the number and names of the columns in the header line must exactly
+ match those of the table and, in order, after which the header line is
+ discarded; otherwise an error is raised. The <literal>MATCH</literal>
--
Best regards,
Shinya Kato
NTT OSS Center
Attachments:
v4-0001-Add-support-for-multi-line-header-skipping-in-COP.patchapplication/octet-stream; name=v4-0001-Add-support-for-multi-line-header-skipping-in-COP.patchDownload
From 2e9901a6dc9f75d0a17a506c8da1f0934f3ce055 Mon Sep 17 00:00:00 2001
From: Shinya Kato <shinya11.kato@gmail.com>
Date: Wed, 2 Jul 2025 13:56:41 +0900
Subject: [PATCH v4] Add support for multi-line header skipping in COPY
The HEADER option for COPY now accepts a non-negative integer value,
allowing users to skip an arbitrary number of header lines when
importing data with COPY FROM. For COPY TO, only 0 or 1 is allowed.
---
doc/src/sgml/ref/copy.sgml | 38 ++++++++++++++++++-------
src/backend/commands/copy.c | 42 +++++++++++++++++-----------
src/backend/commands/copyfromparse.c | 17 ++++++++---
src/backend/commands/copyto.c | 2 +-
src/include/commands/copy.h | 16 +++++------
src/test/regress/expected/copy.out | 25 ++++++++++++++++-
src/test/regress/expected/copy2.out | 6 ++++
src/test/regress/sql/copy.sql | 30 ++++++++++++++++++++
src/test/regress/sql/copy2.sql | 3 ++
src/tools/pgindent/typedefs.list | 1 -
10 files changed, 138 insertions(+), 42 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..d814c7a93e9 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,7 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
DEFAULT '<replaceable class="parameter">default_string</replaceable>'
- HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
+ HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -212,6 +212,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><replaceable class="parameter">integer</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a non-negative integer value passed to the selected option.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>FORMAT</literal></term>
<listitem>
@@ -303,16 +312,25 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>HEADER</literal></term>
<listitem>
<para>
- Specifies that the file contains a header line with the names of each
- column in the file. On output, the first line contains the column
- names from the table. On input, the first line is discarded when this
- option is set to <literal>true</literal> (or equivalent Boolean value).
- If this option is set to <literal>MATCH</literal>, the number and names
- of the columns in the header line must match the actual column names of
- the table, in order; otherwise an error is raised.
+ On output, if this option is set to <literal>true</literal>
+ (or an equivalent Boolean value), the first line of the output will
+ contain the column names from the table.
+ Integer values <literal>0</literal> and <literal>1</literal> are
+ accepted as Boolean values, but other integers are not allowed for
+ <command>COPY TO</command> commands.
+ </para>
+ <para>
+ On input, if this option is set to <literal>true</literal>
+ (or an equivalent Boolean value), the first line of the input is
+ discarded. If it is set to a non-negative integer, that number of
+ lines are discarded. If the option is set to <literal>MATCH</literal>,
+ the number and names of the columns in the header line must exactly
+ match those of the table and, in order, after which the header line is
+ discarded; otherwise an error is raised. The <literal>MATCH</literal>
+ value is only valid for <command>COPY FROM</command> commands.
+ </para>
+ <para>
This option is not allowed when using <literal>binary</literal> format.
- The <literal>MATCH</literal> option is only valid for <command>COPY
- FROM</command> commands.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..fae9c41db65 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -322,11 +322,13 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
}
/*
- * Extract a CopyHeaderChoice value from a DefElem. This is like
- * defGetBoolean() but also accepts the special value "match".
+ * Extract the CopyFormatOptions.header_line value from a DefElem.
+ *
+ * Parses the HEADER option for COPY, which can be a boolean, a non-negative
+ * integer (number of lines to skip), or the special value "match".
*/
-static CopyHeaderChoice
-defGetCopyHeaderChoice(DefElem *def, bool is_from)
+static int
+defGetCopyHeaderOption(DefElem *def, bool is_from)
{
/*
* If no parameter value given, assume "true" is meant.
@@ -335,20 +337,27 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
return COPY_HEADER_TRUE;
/*
- * Allow 0, 1, "true", "false", "on", "off", or "match".
+ * Allow 0, 1, "true", "false", "on", "off", a non-negative integer, or
+ * "match".
*/
switch (nodeTag(def->arg))
{
case T_Integer:
- switch (intVal(def->arg))
{
- case 0:
- return COPY_HEADER_FALSE;
- case 1:
- return COPY_HEADER_TRUE;
- default:
- /* otherwise, error out below */
- break;
+ int ival = intVal(def->arg);
+
+ if (ival < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("a negative integer value cannot be "
+ "specified for %s", def->defname)));
+
+ if (!is_from && ival > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use multi-line header in COPY TO")));
+
+ return ival;
}
break;
default:
@@ -381,7 +390,8 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
}
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("%s requires a Boolean value or \"match\"",
+ errmsg("%s requires a Boolean value, a non-negative integer, "
+ "or the string \"match\"",
def->defname)));
return COPY_HEADER_FALSE; /* keep compiler quiet */
}
@@ -566,7 +576,7 @@ ProcessCopyOptions(ParseState *pstate,
if (header_specified)
errorConflictingDefElem(defel, pstate);
header_specified = true;
- opts_out->header_line = defGetCopyHeaderChoice(defel, is_from);
+ opts_out->header_line = defGetCopyHeaderOption(defel, is_from);
}
else if (strcmp(defel->defname, "quote") == 0)
{
@@ -769,7 +779,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..af0794508fe 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -771,21 +771,30 @@ static pg_attribute_always_inline bool
NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
{
int fldct;
- bool done;
+ bool done = false;
/* only available for text or csv input */
Assert(!cstate->opts.binary);
/* on input check that the header line is correct if needed */
- if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+ if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
{
ListCell *cur;
TupleDesc tupDesc;
+ int lines_to_skip = cstate->opts.header_line;
+
+ /* If set to "match", one header line is skipped */
+ if (cstate->opts.header_line == COPY_HEADER_MATCH)
+ lines_to_skip = 1;
tupDesc = RelationGetDescr(cstate->rel);
- cstate->cur_lineno++;
- done = CopyReadLine(cstate, is_csv);
+ for (int i = 0; i < lines_to_skip; i++)
+ {
+ cstate->cur_lineno++;
+ if ((done = CopyReadLine(cstate, is_csv)))
+ break;
+ }
if (cstate->opts.header_line == COPY_HEADER_MATCH)
{
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea6f18f2c80..67b94b91cae 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -199,7 +199,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
cstate->file_encoding);
/* if a header has been requested send the line */
- if (cstate->opts.header_line)
+ if (cstate->opts.header_line == COPY_HEADER_TRUE)
{
ListCell *cur;
bool hdr_delim = false;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..541176e1980 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,15 +20,12 @@
#include "tcop/dest.h"
/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
+ * Represents whether a header line must match the actual names
+ * (which implies "true"), and whether it should be present.
*/
-typedef enum CopyHeaderChoice
-{
- COPY_HEADER_FALSE = 0,
- COPY_HEADER_TRUE,
- COPY_HEADER_MATCH,
-} CopyHeaderChoice;
+#define COPY_HEADER_MATCH -1
+#define COPY_HEADER_FALSE 0
+#define COPY_HEADER_TRUE 1
/*
* Represents where to save input processing errors. More values to be added
@@ -64,7 +61,8 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
- CopyHeaderChoice header_line; /* header line? */
+ int header_line; /* number of lines to skip or COPY_HEADER_XXX
+ * value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
char *null_print_client; /* same converted to file encoding */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 8d5a06563c4..ac66eb55aee 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -81,6 +81,29 @@ copy copytest4 to stdout (header);
c1 colname with tab: \t
1 a
2 b
+-- test multi-line header line feature
+create temp table copytest5 (c1 int);
+copy copytest5 from stdin (format csv, header 2);
+copy copytest5 to stdout (header);
+c1
+1
+2
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 4);
+select count(*) from copytest5;
+ count
+-------
+ 0
+(1 row)
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 5);
+select count(*) from copytest5;
+ count
+-------
+ 0
+(1 row)
+
-- test copy from with a partitioned table
create table parted_copytest (
a int,
@@ -224,7 +247,7 @@ alter table header_copytest add column c text;
copy header_copytest to stdout with (header match);
ERROR: cannot use "match" with HEADER in COPY TO
copy header_copytest from stdin with (header wrong_choice);
-ERROR: header requires a Boolean value or "match"
+ERROR: header requires a Boolean value, a non-negative integer, or the string "match"
-- works
copy header_copytest from stdin with (header match);
copy header_copytest (c, a, b) from stdin with (header match);
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..caa3c44f0d0 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -132,6 +132,12 @@ COPY x from stdin with (reject_limit 1);
ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin with (on_error ignore, reject_limit 0);
ERROR: REJECT_LIMIT (0) must be greater than zero
+COPY x from stdin with (header -1);
+ERROR: a negative integer value cannot be specified for header
+COPY x from stdin with (header 2.5);
+ERROR: header requires a Boolean value, a non-negative integer, or the string "match"
+COPY x to stdout with (header 2);
+ERROR: cannot use multi-line header in COPY TO
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f0b88a23db8..a1316c73bac 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -94,6 +94,36 @@ this is just a line full of junk that would error out if parsed
copy copytest4 to stdout (header);
+-- test multi-line header line feature
+
+create temp table copytest5 (c1 int);
+
+copy copytest5 from stdin (format csv, header 2);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+copy copytest5 to stdout (header);
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 4);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+select count(*) from copytest5;
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 5);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+select count(*) from copytest5;
+
-- test copy from with a partitioned table
create table parted_copytest (
a int,
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..cef45868db5 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -90,6 +90,9 @@ COPY x to stdout (format BINARY, on_error unsupported);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
+COPY x from stdin with (header -1);
+COPY x from stdin with (header 2.5);
+COPY x to stdout with (header 2);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 32d6e718adc..beef5f20b60 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -521,7 +521,6 @@ CopyFormatOptions
CopyFromRoutine
CopyFromState
CopyFromStateData
-CopyHeaderChoice
CopyInsertMethod
CopyLogVerbosityChoice
CopyMethod
--
2.39.3
On 2025/07/02 14:39, Shinya Kato wrote:
On Mon, Jun 30, 2025 at 4:24 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I have a few minor comment on the patch.
+ if (ival < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("%s requires a Boolean value, a non-negative " + "integer, or the string \"match\"", + def->defname)));ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("%s requires a Boolean value or \"match\"", + errmsg("%s requires a Boolean value, a non-negative integer, " + "or the string \"match\"", def->defname)));These two pieces of code raise the same error, but with different error codes:
one returns ERRCODE_INVALID_PARAMETER_VALUE, while the other returns ERRCODE_SYNTAX_ERROR.I believe the former is more appropriate, although the existing code uses the
latter. In any case, the error codes should be consistent.I'm not sure there's an actual rule like "the error code must match
if the error message is the same." But if using the same message with
a different error code is confusing, I'm fine with changing
the earlier message. For example:(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("%s requires a Boolean value, a non-negative " - "integer, or the string \"match\"", + errmsg("a negative integer value cannot be " + "specified for %s", def->defname)));Fair point. There might not be any explicit rule.
However, I feel it somewhat confusing.After looking into the code, it seems that ERRCODE_SYNTAX_ERROR tends to be used
for invalid keywords or parameter types, while ERRCODE_INVALID_PARAMETER_VALUE
is typically used for values that are out of the acceptable range.So, the proposed change seems reasonable to me.
Thank you for the review. The change looks good to me, too. A new
patch is attached.
Thanks for updating the patch!
Regarding the documentation, how about explicitly stating that when MATCH is specified, only
the first line is skipped? While this may seem obvious, it’s worth clarifying, as the semantics
of the HEADER option have become a bit more complex with this change.Agreed. I have updated the documentation as follows:
+ lines are discarded. If the option is set to <literal>MATCH</literal>, + the number and names of the columns in the header line must exactly + match those of the table and, in order, after which the header line is + discarded; otherwise an error is raised. The <literal>MATCH</literal>
How about making the wording a bit clearer? For example:
If set to MATCH, the first line is discarded, and it must contain column names that
exactly match the table's columns, in both number and order; otherwise, an error is raised.
Also, the phrase "if this option is set to..." is repeated three times in the current text.
For the second and third instances, we could simplify it to just "if set to...".
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
On Wed, Jul 2, 2025 at 4:48 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Regarding the documentation, how about explicitly stating that when MATCH is specified, only
the first line is skipped? While this may seem obvious, it’s worth clarifying, as the semantics
of the HEADER option have become a bit more complex with this change.Agreed. I have updated the documentation as follows:
+ lines are discarded. If the option is set to <literal>MATCH</literal>, + the number and names of the columns in the header line must exactly + match those of the table and, in order, after which the header line is + discarded; otherwise an error is raised. The <literal>MATCH</literal>How about making the wording a bit clearer? For example:
If set to MATCH, the first line is discarded, and it must contain column names that
exactly match the table's columns, in both number and order; otherwise, an error is raised.
Thank you for the review. I fixed it.
Also, the phrase "if this option is set to..." is repeated three times in the current text.
For the second and third instances, we could simplify it to just "if set to...".
Agreed. However, for the sake of symmetry between "On output" and "On
input" and to maintain clarity between the paragraphs, I have omitted
"this option is" from the "On input" paragraph only.
--
Best regards,
Shinya Kato
NTT OSS Center
Attachments:
v5-0001-Add-support-for-multi-line-header-skipping-in-COP.patchapplication/octet-stream; name=v5-0001-Add-support-for-multi-line-header-skipping-in-COP.patchDownload
From 192ba0f55a945d4a94c4dda9fb188773d47caa70 Mon Sep 17 00:00:00 2001
From: Shinya Kato <shinya11.kato@gmail.com>
Date: Thu, 3 Jul 2025 10:59:34 +0900
Subject: [PATCH v5] Add support for multi-line header skipping in COPY
The HEADER option for COPY now accepts a non-negative integer value,
allowing users to skip an arbitrary number of header lines when
importing data with COPY FROM. For COPY TO, only 0 or 1 is allowed.
---
doc/src/sgml/ref/copy.sgml | 38 ++++++++++++++++++-------
src/backend/commands/copy.c | 42 +++++++++++++++++-----------
src/backend/commands/copyfromparse.c | 17 ++++++++---
src/backend/commands/copyto.c | 2 +-
src/include/commands/copy.h | 16 +++++------
src/test/regress/expected/copy.out | 25 ++++++++++++++++-
src/test/regress/expected/copy2.out | 6 ++++
src/test/regress/sql/copy.sql | 30 ++++++++++++++++++++
src/test/regress/sql/copy2.sql | 3 ++
src/tools/pgindent/typedefs.list | 1 -
10 files changed, 138 insertions(+), 42 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..c2d1fbc1fbe 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,7 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
DEFAULT '<replaceable class="parameter">default_string</replaceable>'
- HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
+ HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -212,6 +212,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><replaceable class="parameter">integer</replaceable></term>
+ <listitem>
+ <para>
+ Specifies a non-negative integer value passed to the selected option.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>FORMAT</literal></term>
<listitem>
@@ -303,16 +312,25 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>HEADER</literal></term>
<listitem>
<para>
- Specifies that the file contains a header line with the names of each
- column in the file. On output, the first line contains the column
- names from the table. On input, the first line is discarded when this
- option is set to <literal>true</literal> (or equivalent Boolean value).
- If this option is set to <literal>MATCH</literal>, the number and names
- of the columns in the header line must match the actual column names of
- the table, in order; otherwise an error is raised.
+ On output, if this option is set to <literal>true</literal>
+ (or an equivalent Boolean value), the first line of the output will
+ contain the column names from the table.
+ Integer values <literal>0</literal> and <literal>1</literal> are
+ accepted as Boolean values, but other integers are not allowed for
+ <command>COPY TO</command> commands.
+ </para>
+ <para>
+ On input, if this option is set to <literal>true</literal>
+ (or an equivalent Boolean value), the first line of the input is
+ discarded. If set to a non-negative integer, that number of
+ lines are discarded. If set to <literal>MATCH</literal>, the first line
+ is discarded, and it must contain column names that exactly match the
+ table's columns, in both number and order; otherwise, an error is raised.
+ The <literal>MATCH</literal> value is only valid for
+ <command>COPY FROM</command> commands.
+ </para>
+ <para>
This option is not allowed when using <literal>binary</literal> format.
- The <literal>MATCH</literal> option is only valid for <command>COPY
- FROM</command> commands.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..fae9c41db65 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -322,11 +322,13 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
}
/*
- * Extract a CopyHeaderChoice value from a DefElem. This is like
- * defGetBoolean() but also accepts the special value "match".
+ * Extract the CopyFormatOptions.header_line value from a DefElem.
+ *
+ * Parses the HEADER option for COPY, which can be a boolean, a non-negative
+ * integer (number of lines to skip), or the special value "match".
*/
-static CopyHeaderChoice
-defGetCopyHeaderChoice(DefElem *def, bool is_from)
+static int
+defGetCopyHeaderOption(DefElem *def, bool is_from)
{
/*
* If no parameter value given, assume "true" is meant.
@@ -335,20 +337,27 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
return COPY_HEADER_TRUE;
/*
- * Allow 0, 1, "true", "false", "on", "off", or "match".
+ * Allow 0, 1, "true", "false", "on", "off", a non-negative integer, or
+ * "match".
*/
switch (nodeTag(def->arg))
{
case T_Integer:
- switch (intVal(def->arg))
{
- case 0:
- return COPY_HEADER_FALSE;
- case 1:
- return COPY_HEADER_TRUE;
- default:
- /* otherwise, error out below */
- break;
+ int ival = intVal(def->arg);
+
+ if (ival < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("a negative integer value cannot be "
+ "specified for %s", def->defname)));
+
+ if (!is_from && ival > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use multi-line header in COPY TO")));
+
+ return ival;
}
break;
default:
@@ -381,7 +390,8 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
}
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("%s requires a Boolean value or \"match\"",
+ errmsg("%s requires a Boolean value, a non-negative integer, "
+ "or the string \"match\"",
def->defname)));
return COPY_HEADER_FALSE; /* keep compiler quiet */
}
@@ -566,7 +576,7 @@ ProcessCopyOptions(ParseState *pstate,
if (header_specified)
errorConflictingDefElem(defel, pstate);
header_specified = true;
- opts_out->header_line = defGetCopyHeaderChoice(defel, is_from);
+ opts_out->header_line = defGetCopyHeaderOption(defel, is_from);
}
else if (strcmp(defel->defname, "quote") == 0)
{
@@ -769,7 +779,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f52f2477df1..b1ae97b833d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -771,21 +771,30 @@ static pg_attribute_always_inline bool
NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
{
int fldct;
- bool done;
+ bool done = false;
/* only available for text or csv input */
Assert(!cstate->opts.binary);
/* on input check that the header line is correct if needed */
- if (cstate->cur_lineno == 0 && cstate->opts.header_line)
+ if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
{
ListCell *cur;
TupleDesc tupDesc;
+ int lines_to_skip = cstate->opts.header_line;
+
+ /* If set to "match", one header line is skipped */
+ if (cstate->opts.header_line == COPY_HEADER_MATCH)
+ lines_to_skip = 1;
tupDesc = RelationGetDescr(cstate->rel);
- cstate->cur_lineno++;
- done = CopyReadLine(cstate, is_csv);
+ for (int i = 0; i < lines_to_skip; i++)
+ {
+ cstate->cur_lineno++;
+ if ((done = CopyReadLine(cstate, is_csv)))
+ break;
+ }
if (cstate->opts.header_line == COPY_HEADER_MATCH)
{
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea6f18f2c80..67b94b91cae 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -199,7 +199,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
cstate->file_encoding);
/* if a header has been requested send the line */
- if (cstate->opts.header_line)
+ if (cstate->opts.header_line == COPY_HEADER_TRUE)
{
ListCell *cur;
bool hdr_delim = false;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..541176e1980 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,15 +20,12 @@
#include "tcop/dest.h"
/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
+ * Represents whether a header line must match the actual names
+ * (which implies "true"), and whether it should be present.
*/
-typedef enum CopyHeaderChoice
-{
- COPY_HEADER_FALSE = 0,
- COPY_HEADER_TRUE,
- COPY_HEADER_MATCH,
-} CopyHeaderChoice;
+#define COPY_HEADER_MATCH -1
+#define COPY_HEADER_FALSE 0
+#define COPY_HEADER_TRUE 1
/*
* Represents where to save input processing errors. More values to be added
@@ -64,7 +61,8 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
- CopyHeaderChoice header_line; /* header line? */
+ int header_line; /* number of lines to skip or COPY_HEADER_XXX
+ * value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
char *null_print_client; /* same converted to file encoding */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 8d5a06563c4..ac66eb55aee 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -81,6 +81,29 @@ copy copytest4 to stdout (header);
c1 colname with tab: \t
1 a
2 b
+-- test multi-line header line feature
+create temp table copytest5 (c1 int);
+copy copytest5 from stdin (format csv, header 2);
+copy copytest5 to stdout (header);
+c1
+1
+2
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 4);
+select count(*) from copytest5;
+ count
+-------
+ 0
+(1 row)
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 5);
+select count(*) from copytest5;
+ count
+-------
+ 0
+(1 row)
+
-- test copy from with a partitioned table
create table parted_copytest (
a int,
@@ -224,7 +247,7 @@ alter table header_copytest add column c text;
copy header_copytest to stdout with (header match);
ERROR: cannot use "match" with HEADER in COPY TO
copy header_copytest from stdin with (header wrong_choice);
-ERROR: header requires a Boolean value or "match"
+ERROR: header requires a Boolean value, a non-negative integer, or the string "match"
-- works
copy header_copytest from stdin with (header match);
copy header_copytest (c, a, b) from stdin with (header match);
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..caa3c44f0d0 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -132,6 +132,12 @@ COPY x from stdin with (reject_limit 1);
ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin with (on_error ignore, reject_limit 0);
ERROR: REJECT_LIMIT (0) must be greater than zero
+COPY x from stdin with (header -1);
+ERROR: a negative integer value cannot be specified for header
+COPY x from stdin with (header 2.5);
+ERROR: header requires a Boolean value, a non-negative integer, or the string "match"
+COPY x to stdout with (header 2);
+ERROR: cannot use multi-line header in COPY TO
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f0b88a23db8..a1316c73bac 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -94,6 +94,36 @@ this is just a line full of junk that would error out if parsed
copy copytest4 to stdout (header);
+-- test multi-line header line feature
+
+create temp table copytest5 (c1 int);
+
+copy copytest5 from stdin (format csv, header 2);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+copy copytest5 to stdout (header);
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 4);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+select count(*) from copytest5;
+
+truncate copytest5;
+copy copytest5 from stdin (format csv, header 5);
+this is a first header line.
+this is a second header line.
+1
+2
+\.
+select count(*) from copytest5;
+
-- test copy from with a partitioned table
create table parted_copytest (
a int,
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..cef45868db5 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -90,6 +90,9 @@ COPY x to stdout (format BINARY, on_error unsupported);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
+COPY x from stdin with (header -1);
+COPY x from stdin with (header 2.5);
+COPY x to stdout with (header 2);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 66c5782688a..e7d1c48e1f2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -521,7 +521,6 @@ CopyFormatOptions
CopyFromRoutine
CopyFromState
CopyFromStateData
-CopyHeaderChoice
CopyInsertMethod
CopyLogVerbosityChoice
CopyMethod
--
2.39.3
On 2025/07/03 11:08, Shinya Kato wrote:
On Wed, Jul 2, 2025 at 4:48 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Regarding the documentation, how about explicitly stating that when MATCH is specified, only
the first line is skipped? While this may seem obvious, it’s worth clarifying, as the semantics
of the HEADER option have become a bit more complex with this change.Agreed. I have updated the documentation as follows:
+ lines are discarded. If the option is set to <literal>MATCH</literal>, + the number and names of the columns in the header line must exactly + match those of the table and, in order, after which the header line is + discarded; otherwise an error is raised. The <literal>MATCH</literal>How about making the wording a bit clearer? For example:
If set to MATCH, the first line is discarded, and it must contain column names that
exactly match the table's columns, in both number and order; otherwise, an error is raised.Thank you for the review. I fixed it.
Thanks for updating the patch! I've pushed the patch.
Also, the phrase "if this option is set to..." is repeated three times in the current text.
For the second and third instances, we could simplify it to just "if set to...".Agreed. However, for the sake of symmetry between "On output" and "On
input" and to maintain clarity between the paragraphs, I have omitted
"this option is" from the "On input" paragraph only.
Yes, I agree that's better.
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
On Thu, Jul 3, 2025 at 3:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2025/07/03 11:08, Shinya Kato wrote:
On Wed, Jul 2, 2025 at 4:48 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Regarding the documentation, how about explicitly stating that when MATCH is specified, only
the first line is skipped? While this may seem obvious, it’s worth clarifying, as the semantics
of the HEADER option have become a bit more complex with this change.Agreed. I have updated the documentation as follows:
+ lines are discarded. If the option is set to <literal>MATCH</literal>, + the number and names of the columns in the header line must exactly + match those of the table and, in order, after which the header line is + discarded; otherwise an error is raised. The <literal>MATCH</literal>How about making the wording a bit clearer? For example:
If set to MATCH, the first line is discarded, and it must contain column names that
exactly match the table's columns, in both number and order; otherwise, an error is raised.Thank you for the review. I fixed it.
Thanks for updating the patch! I've pushed the patch.
Also, the phrase "if this option is set to..." is repeated three times in the current text.
For the second and third instances, we could simplify it to just "if set to...".Agreed. However, for the sake of symmetry between "On output" and "On
input" and to maintain clarity between the paragraphs, I have omitted
"this option is" from the "On input" paragraph only.Yes, I agree that's better.
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
Thank you for pushing!
--
Best regards,
Shinya Kato
NTT OSS Center