New "raw" COPY format
Hi hackers,
This thread is about implementing a new "raw" COPY format.
This idea came up in a different thread [1], moved here.
[1]: /messages/by-id/47b5c6a7-5c0e-40aa-8ea2-c7b95ccf296f@app.fastmail.com
The main use case for the raw format is importing arbitrary
unstructured text files, such as log files, into a single text column
of a table.
The name "raw" is just a working title. Andrew had some other good name ideas:
WFM, so something like FORMAT {SIMPLE, RAW, FAST, SINGLE}?
Below is the draft of its description, sent previously [1],
adjusted thanks to feedback from Daniel Verite, who made me realize the
HEADER option should also be made available for this format.
--- START OF DESCRIPTION ---
Raw Format
The "raw" format is used for importing and exporting files containing
unstructured text, where each line is treated as a single field. This format
is ideal when dealing with data that doesn't conform to a structured,
tabular format and lacks delimiters.
Key Characteristics:
- No Field Delimiters:
  Each line is considered a complete value without any field separation.
- Single Column Requirement:
  The COPY command must specify exactly one column when using the raw format.
  Specifying multiple columns will result in an error.
- Literal Data Interpretation:
  All characters are taken literally.
  There is no special handling for quotes, backslashes, or escape sequences.
- No NULL Distinction:
  Empty lines are imported as empty strings, not as NULL values.
Notes:
- Error Handling:
  An error will occur if you use the raw format without specifying exactly
  one column or if the table has multiple columns and no column list is
  provided.
- Data Preservation:
  All characters, including whitespace and special characters, are preserved
  exactly as they appear in the file.
--- END OF DESCRIPTION ---
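To make the semantics in the description concrete, here is a minimal sketch in C of how a raw-format reader could hand out one field per line. It is only an illustration, not code from the attached patches: the function name raw_next_field is hypothetical, and it assumes '\n' is the only line terminator, which the description does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patches): return the next line of
 * buf as a single field.  Assumes '\n' is the only line terminator.
 *
 * Nothing is interpreted: quotes, backslashes and tabs are returned as-is.
 * An empty line yields a field of length 0 (an empty string), never NULL.
 */
bool
raw_next_field(const char *buf, size_t len, size_t *pos,
               const char **field, size_t *field_len)
{
    size_t      start;
    size_t      end;

    assert(buf != NULL && pos != NULL && field != NULL && field_len != NULL);

    start = *pos;
    if (start >= len)
        return false;           /* no more input */

    /* Scan to the end of the line without any escape processing. */
    end = start;
    while (end < len && buf[end] != '\n')
        end++;

    *field = buf + start;
    *field_len = end - start;

    /* Skip the terminator if present; the last line may lack one. */
    *pos = (end < len) ? end + 1 : end;
    return true;
}
```

Calling it repeatedly on the three-line input `a\tb`, an empty line, and `"q"` would yield the four characters of `a\tb` unchanged, then an empty string, then `"q"` including its quotes.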
After studying the code that will be affected, I would first like to
improve ProcessCopyOptions in terms of readability and maintainability,
before making any functional changes.
This seems possible by just reorganizing it a bit.
It is already organized quite nicely: the code is mostly grouped
per option, but not always, since some of it is spread across
different sections.
It seems possible to organize even more of it per-option,
which would make it easier to reason about each option separately.
This seems possible by organizing the checks per option,
under a single if-branch per option, and moving the setting
of defaults per option (when applicable) to the corresponding
else-branch.
This would also avoid setting defaults for options that are not applicable
for a given format, and instead let their initial NULL value remain untouched,
rather than setting unnecessary defaults.
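The per-option layout can be sketched roughly as follows. This is a simplified stand-alone illustration, not the patch itself: the SketchOptions struct, the SK_FORMAT_* names, and the convention of returning an error string instead of calling ereport() are all assumptions made so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
    SK_FORMAT_TEXT,
    SK_FORMAT_BINARY,
    SK_FORMAT_CSV
} SketchFormat;

typedef struct SketchOptions
{
    SketchFormat format;
    const char *delim;          /* NULL if not specified */
    const char *quote;          /* NULL if not specified */
    const char *escape;         /* NULL if not specified */
} SketchOptions;

/* Returns NULL on success, or an error message (stand-in for ereport). */
const char *
sketch_process_options(SketchOptions *opts)
{
    assert(opts != NULL);

    /* --- DELIMITER option: checks in the if-branch, default in the else --- */
    if (opts->delim)
    {
        if (opts->format == SK_FORMAT_BINARY)
            return "cannot specify DELIMITER in BINARY mode";
    }
    else if (opts->format != SK_FORMAT_BINARY)
        opts->delim = (opts->format == SK_FORMAT_CSV) ? "," : "\t";

    /* --- QUOTE option --- */
    if (opts->quote)
    {
        if (opts->format != SK_FORMAT_CSV)
            return "COPY QUOTE requires CSV mode";
    }
    else if (opts->format == SK_FORMAT_CSV)
        opts->quote = "\"";

    /* --- ESCAPE option --- */
    if (opts->escape)
    {
        if (opts->format != SK_FORMAT_CSV)
            return "COPY ESCAPE requires CSV mode";
    }
    else if (opts->format == SK_FORMAT_CSV)
    {
        assert(opts->quote != NULL);    /* specified or defaulted above */
        opts->escape = opts->quote;
    }

    /* --- Interdependent checks go last --- */
    if (opts->format == SK_FORMAT_CSV && opts->delim[0] == opts->quote[0])
        return "COPY delimiter and quote must be different";

    return NULL;
}
```

With this shape, a binary COPY leaves delim, quote and escape at NULL, and a text COPY leaves quote and escape at NULL, since no default is assigned for options that do not apply to the format.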
Some of the checks depend on multiple options in an interdependent way,
not belonging to a specific option more than another. I think such checks
would be nice to place at the end under a separate section.
I also think it would be more readable to use the existing bool variables
named [option]_specified, to determine if an option has been set,
rather than relying on the option's default enum value to evaluate to false.
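As a small illustration of the difference between the two tests, consider the sketch below. The type and function names are hypothetical and exist only for this example; the enum simply assumes that the "false" choice has the value 0, as a zero-initialized options struct would give.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    SK_HEADER_FALSE = 0,        /* also what zero-initialization yields */
    SK_HEADER_TRUE,
    SK_HEADER_MATCH
} SketchHeaderChoice;

typedef struct SketchHeaderOption
{
    SketchHeaderChoice header_line;
    bool        header_specified;
} SketchHeaderOption;

/* Record a user-supplied value and remember that it was supplied. */
void
sketch_set_header(SketchHeaderOption *opt, SketchHeaderChoice choice)
{
    assert(opt != NULL);
    opt->header_line = choice;
    opt->header_specified = true;
}

/* Test by value: relies on the default enum value evaluating to false. */
bool
sketch_header_given_by_value(const SketchHeaderOption *opt)
{
    assert(opt != NULL);
    return opt->header_line != SK_HEADER_FALSE;
}

/* Test by flag: true whenever the option appeared, whatever its value. */
bool
sketch_header_given_by_flag(const SketchHeaderOption *opt)
{
    assert(opt != NULL);
    return opt->header_specified;
}
```

The two tests agree except when the option is explicitly given its zero value: the value-based test then reports "not set" while the flag-based test reports "set". That case is worth checking whenever an existing check is switched from one form to the other.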
The attached patch implements the above ideas.
I think with these changes, it would be easier to hack on new and existing
copy options and formats.
/Joel
Attachments:
v1-0001-Replace-binary-flags-binary-and-csv_mode-with-format.patch
From d621bb2fd0d0d6079ec16a92f5c925fd9fa0baaa Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Thu, 10 Oct 2024 08:33:33 +0200
Subject: [PATCH 1/2] Replace binary flags `binary` and `csv_mode` with
`format` enum.
---
src/backend/commands/copy.c | 44 ++++++++++++++--------------
src/backend/commands/copyfrom.c | 10 +++----
src/backend/commands/copyfromparse.c | 34 ++++++++++-----------
src/backend/commands/copyto.c | 20 ++++++-------
src/include/commands/copy.h | 13 ++++++--
5 files changed, 65 insertions(+), 56 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 03eb7a4eba..2021300308 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -493,11 +493,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -650,36 +650,36 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these two before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -727,7 +727,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -735,43 +735,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -785,7 +785,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && opts_out->force_notnull != NIL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_notnull != NIL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -799,7 +799,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && opts_out->force_null != NIL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_null != NIL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -822,7 +822,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -858,7 +858,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 9139a40785..46a662465a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1576,7 +1576,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1627,7 +1627,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1768,14 +1768,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 6f64d97fdd..4b4079db95 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
--
2.45.1
v1-0002-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch
From d5c1a45ee48bfc0f14ea992589809b0da144d755 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Fri, 11 Oct 2024 21:26:22 +0200
Subject: [PATCH 2/2] Reorganize ProcessCopyOptions for clarity and consistent
option handling.
No changes to the function's signature or behavior; the refactoring solely
improves code structure and readability.
Changes:
* Refactored ProcessCopyOptions to improve readability and maintainability
by grouping per-option checks and default assignments into dedicated sections.
This enhances the logical flow and makes it easier to understand how each COPY
option is processed.
* Explicitly set the default format to COPY_FORMAT_TEXT when the FORMAT option
is not specified. Previously, the default was implied due to
zero-initialization, but making it explicit clarifies the default behavior.
* Consistently use boolean specified-variables to determine if an option has
been provided, rather than relying on default values from zero-initialization.
* Added assertions to ensure necessary options are set before performing
dependent checks, explicitly indicating that they have been assigned either
specified or default values.
* Relocated interdependent option validations to a dedicated section for
additional clarity.
---
src/backend/commands/copy.c | 433 ++++++++++++++++++++++--------------
1 file changed, 271 insertions(+), 162 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2021300308..b4f6d3ee93 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -647,200 +647,260 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Check for incompatible options (must do these two before inserting
- * defaults)
+ * Set default format if not specified.
+ * This isn't strictly necessary since COPY_FORMAT_TEXT is 0 and
+ * opts_out is palloc0'd, but do it for clarity.
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (!format_specified)
+ opts_out->format = COPY_FORMAT_TEXT;
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ /*
+ * Begin per-option checks and set defaults where necessary
+ */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ /* --- FORMAT option is always allowed; no additional checks needed --- */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ /* --- FREEZE option --- */
+ if (freeze_specified)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+ else
+ {
+ /* Default is false; no action needed */
+ }
- /* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
- if (opts_out->format == COPY_FORMAT_CSV)
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
+ }
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
+ /* --- HEADER option --- */
+ if (header_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is false; no action needed */
+ }
- if (opts_out->default_print)
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
{
- opts_out->default_print_len = strlen(opts_out->default_print);
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (strchr(opts_out->default_print, '\r') != NULL ||
- strchr(opts_out->default_print, '\n') != NULL)
+ if (strlen(opts_out->quote) != 1)
ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY default representation cannot use newline or carriage return")));
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
}
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_notnull != NIL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if (opts_out->force_notnull != NIL && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
+ /* --- ON_ERROR option --- */
+ if (on_error_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_null != NIL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+ }
+ else
+ {
+ /* Default is COPY_ON_ERROR_STOP */
+ opts_out->on_error = COPY_ON_ERROR_STOP;
+ }
- if (opts_out->force_null != NIL && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
+ /* --- DEFAULT option --- */
+ if (opts_out->default_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
+ opts_out->default_print_len = strlen(opts_out->default_print);
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
+ if (strchr(opts_out->default_print, '\r') != NULL ||
+ strchr(opts_out->default_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY default representation cannot use newline or carriage return")));
- if (opts_out->default_print)
- {
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -874,6 +934,55 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
+
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote character in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
--
2.45.1
On Fri, Oct 11, 2024, at 22:29, Joel Jacobson wrote:
Hi hackers,
This thread is about implementing a new "raw" COPY format.
...
The attached patch implements the above ideas.
I think with these changes, it would be easier to hack on new and existing
copy options and formats.
/Joel
Attachments:
* v1-0001-Replace-binary-flags-binary-and-csv_mode-with-format.patch
* v1-0002-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch
Oops, I see I failed to use the correct way to check whether
opts_out->force_notnull or opts_out->force_null
have been set, which is to compare them with != NIL.
However, because I did not just blindly copy/paste this code,
I see I actually fixed a bug in HEAD by also checking
opts_out->force_notnull_all and opts_out->force_null_all,
which HEAD currently fails to do:
joel=# copy t to '/tmp/t.csv' (format text, FORCE_NOT_NULL (c1));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
joel=# copy t to '/tmp/t.csv' (format text, FORCE_NOT_NULL *);
COPY 0
joel=# copy t to '/tmp/t.csv' (format text, FORCE_NULL (c1));
ERROR: COPY FORCE_NULL requires CSV mode
joel=# copy t to '/tmp/t.csv' (format text, FORCE_NULL *);
COPY 0
Fixed in new version:
joel=# copy t to '/tmp/t.csv' (format text, FORCE_NOT_NULL *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
joel=# copy t to '/tmp/t.csv' (format text, FORCE_NULL *);
ERROR: COPY FORCE_NULL requires CSV mode
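To make the difference concrete, here is a minimal standalone sketch of the two checks. It does not use the real PostgreSQL types: SketchFormat, SketchOptions and both function names are hypothetical stand-ins, and a NULL pointer plays the role of NIL. It assumes, as the sessions above suggest, that "FORCE_NOT_NULL *" only sets the _all flag and leaves the column list empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for CopyFormat / CopyFormatOptions; illustration only. */
typedef enum { FMT_TEXT, FMT_BINARY, FMT_CSV } SketchFormat;

typedef struct SketchOptions
{
    SketchFormat format;
    const char *const *force_notnull;   /* explicit column list; NULL stands in for NIL */
    bool        force_notnull_all;      /* true when "FORCE_NOT_NULL *" was given */
} SketchOptions;

/* HEAD-style check: only the explicit column list is consulted. */
static bool
csv_only_violation_list_only(const SketchOptions *opts)
{
    assert(opts != NULL);
    return opts->format != FMT_CSV && opts->force_notnull != NULL;
}

/* v2-style check: the "*" form is treated as "option was specified" too. */
static bool
csv_only_violation(const SketchOptions *opts)
{
    assert(opts != NULL);
    return opts->format != FMT_CSV &&
        (opts->force_notnull != NULL || opts->force_notnull_all);
}
```

With format text and only the _all flag set, the first function reports no violation (matching the "COPY 0" result on HEAD), while the second one does (matching the error in the new version). The same reasoning applies to FORCE_NULL.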
/Joel
Attachments:
* v2-0001-Replace-binary-flags-binary-and-csv_mode-with-format.patch
From d621bb2fd0d0d6079ec16a92f5c925fd9fa0baaa Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Thu, 10 Oct 2024 08:33:33 +0200
Subject: [PATCH 1/2] Replace binary flags `binary` and `csv_mode` with
`format` enum.
---
src/backend/commands/copy.c | 44 ++++++++++++++--------------
src/backend/commands/copyfrom.c | 10 +++----
src/backend/commands/copyfromparse.c | 34 ++++++++++-----------
src/backend/commands/copyto.c | 20 ++++++-------
src/include/commands/copy.h | 13 ++++++--
5 files changed, 65 insertions(+), 56 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 03eb7a4eba..2021300308 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -493,11 +493,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -650,36 +650,36 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these two before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -727,7 +727,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -735,43 +735,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -785,7 +785,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && opts_out->force_notnull != NIL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_notnull != NIL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -799,7 +799,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && opts_out->force_null != NIL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_null != NIL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -822,7 +822,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -858,7 +858,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 9139a40785..46a662465a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1576,7 +1576,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1627,7 +1627,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1768,14 +1768,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 6f64d97fdd..4b4079db95 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
--
2.45.1
* v2-0002-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch
From 5eb8d2e965ddcd4b4e0348c5295c5d5dadbe9c56 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Fri, 11 Oct 2024 21:26:22 +0200
Subject: [PATCH 2/2] Reorganize ProcessCopyOptions for clarity and consistent
option handling.
No changes to the function's signature or behavior; the refactoring solely
improves code structure and readability.
Changes:
* Refactored ProcessCopyOptions to improve readability and maintainability
by grouping per-option checks and default assignments into dedicated sections.
This enhances the logical flow and makes it easier to understand how each COPY
option is processed.
* Explicitly set the default format to COPY_FORMAT_TEXT when the FORMAT option
is not specified. Previously, the default was implied due to
zero-initialization, but making it explicit clarifies the default behavior.
* Consistently use boolean specified-variables to determine if an option has
been provided, rather than relying on default values from zero-initialization.
* Added assertions to ensure necessary options are set before performing
dependent checks, explicitly indicating that they have been assigned either
specified or default values.
* Relocated interdependent option validations to a dedicated section for
additional clarity.
---
src/backend/commands/copy.c | 433 ++++++++++++++++++++++--------------
1 file changed, 271 insertions(+), 162 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2021300308..856b878a91 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -647,200 +647,260 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Check for incompatible options (must do these two before inserting
- * defaults)
+ * Set default format if not specified.
+ * This isn't strictly necessary since COPY_FORMAT_TEXT is 0 and
+ * opts_out is palloc0'd, but do it for clarity.
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (!format_specified)
+ opts_out->format = COPY_FORMAT_TEXT;
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ /*
+ * Begin per-option checks and set defaults where necessary
+ */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ /* --- FORMAT option is always allowed; no additional checks needed --- */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ /* --- FREEZE option --- */
+ if (freeze_specified)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+ else
+ {
+ /* Default is false; no action needed */
+ }
- /* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
- if (opts_out->format == COPY_FORMAT_CSV)
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
+ }
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
+ /* --- HEADER option --- */
+ if (header_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is false; no action needed */
+ }
- if (opts_out->default_print)
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
{
- opts_out->default_print_len = strlen(opts_out->default_print);
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (strchr(opts_out->default_print, '\r') != NULL ||
- strchr(opts_out->default_print, '\n') != NULL)
+ if (strlen(opts_out->quote) != 1)
ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY default representation cannot use newline or carriage return")));
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
}
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_notnull != NIL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if (opts_out->force_notnull != NIL && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
+ /* --- ON_ERROR option --- */
+ if (on_error_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_null != NIL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+ }
+ else
+ {
+ /* Default is COPY_ON_ERROR_STOP */
+ opts_out->on_error = COPY_ON_ERROR_STOP;
+ }
- if (opts_out->force_null != NIL && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
+ /* --- DEFAULT option --- */
+ if (opts_out->default_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
+ opts_out->default_print_len = strlen(opts_out->default_print);
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
+ if (strchr(opts_out->default_print, '\r') != NULL ||
+ strchr(opts_out->default_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY default representation cannot use newline or carriage return")));
- if (opts_out->default_print)
- {
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -874,6 +934,55 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
+
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote character in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
--
2.45.1
On Sat, Oct 12, 2024 at 5:02 AM Joel Jacobson <joel@compiler.org> wrote:
On Fri, Oct 11, 2024, at 22:29, Joel Jacobson wrote:
Hi hackers,
This thread is about implementing a new "raw" COPY format.
...
The attached patch implements the above ideas.
I think with these changes, it would be easier to hack on new and existing
copy options and formats.
/Joel
Attachments:
* v1-0001-Replace-binary-flags-binary-and-csv_mode-with-format.patch
* v1-0002-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch
Oops, I see I failed to use the correct way to check whether
opts_out->force_notnull or opts_out->force_null
have been set, that is, using != NIL.
However, thanks to not just blindly copy/pasting this code,
I see I actually fixed a bug in HEAD, by also checking
opts_out->force_notnull_all or opts_out->force_null_all,
which HEAD currently fails to do:
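To illustrate the difference being described, here is a minimal standalone sketch, not the actual PostgreSQL code. The struct SketchCopyOptions and both functions are hypothetical stand-ins; only the field names force_notnull and force_notnull_all come from the patch, and a plain pointer stands in for the backend's List type:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of CopyFormatOptions. */
typedef struct SketchCopyOptions
{
	const char **force_notnull;		/* column list; NULL plays the role of NIL */
	bool		force_notnull_all;	/* assumed true for FORCE_NOT_NULL * */
} SketchCopyOptions;

/* HEAD-style test: only looks at the explicit column list. */
static bool
force_notnull_specified_head(const SketchCopyOptions *opts)
{
	return opts->force_notnull != NULL;
}

/* Patched-style test: the "*" form also counts as "option specified". */
static bool
force_notnull_specified_patched(const SketchCopyOptions *opts)
{
	return opts->force_notnull != NULL || opts->force_notnull_all;
}
```

With only the first test, an option given in its * form leaves the list empty, which per the message above is what lets it slip past the "requires CSV mode" check.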
git version 2.34.1
cannot do `git apply`
trying:
patch -p1 < $PATCHES/v2-0001-Replace-binary-flags-binary-and-csv_mode-with-format.patch
patch -p1 < $PATCHES/v2-0002-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch
After that, I still cannot apply.
typedef enum CopyFormat
{
COPY_FORMAT_TEXT,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV
} CopyFormat;
The last element should have a trailing comma.
CopyFormat should be added to
src/tools/pgindent/typedefs.list
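As a sketch of the trailing-comma point: the neighbouring enum CopyLogVerbosityChoice in copy.h already ends its last enumerator with a comma, so adding a value later touches only one line of the diff. The helper sketch_parse_copy_format below is hypothetical; it only mirrors the strcmp chain in ProcessCopyOptions and is not part of the patch:

```c
#include <stdbool.h>
#include <string.h>

/* Same enumerators as in the patch, with a comma after the last one. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: map a format name to the enum.  Returns false for an
 * unknown name, where the real code would ereport(ERROR).
 */
static bool
sketch_parse_copy_format(const char *fmt, CopyFormat *out)
{
	if (strcmp(fmt, "text") == 0)
		*out = COPY_FORMAT_TEXT;
	else if (strcmp(fmt, "csv") == 0)
		*out = COPY_FORMAT_CSV;
	else if (strcmp(fmt, "binary") == 0)
		*out = COPY_FORMAT_BINARY;
	else
		return false;
	return true;
}
```

The typedefs.list entry matters because pgindent takes its list of type names from that file, so a new typedef such as CopyFormat needs to be listed there to be indented as a type.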
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
Are these Asserts needed? Without them, if the conditions are not met, it
will still segfault.
There is no SQL example, like
copy the_table from :'filename' (format raw);
in the patch.
I thought you were going to implement something like that.
On Sat, Oct 12, 2024, at 02:48, jian he wrote:
git version 2.34.1
cannot do `git apply`
Sorry about that, fixed.
typedef enum CopyFormat
{
COPY_FORMAT_TEXT,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV
} CopyFormat;
Thanks, fixed.
CopyFormat should be added to
src/tools/pgindent/typedefs.list
Thanks, fixed.
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
Are these Asserts needed? Without them, if the conditions are not met, it
will still segfault.
The asserts are only there to indicate that at this point in the code,
we can be certain the delim, quote and null_print have been set,
since the format is COPY_FORMAT_CSV, otherwise it would be a bug.
If you don't think they add any documentation value, I'm OK with removing them.
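A small standalone sketch of that intent, using the C library's assert() in place of the backend's Assert() macro. The struct SketchOpts and the function sketch_delim_equals_quote are made up for illustration and do not appear in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields used by the CSV cross-checks. */
typedef struct SketchOpts
{
	const char *delim;
	const char *quote;
} SketchOpts;

/*
 * By the time the interdependent checks run, defaults have been applied,
 * so both pointers are expected to be non-NULL.  The assertions state that
 * expectation; in a build with assertions disabled they compile away, and a
 * NULL pointer would simply crash on the dereference below.
 */
static bool
sketch_delim_equals_quote(const SketchOpts *opts)
{
	assert(opts->delim != NULL);
	assert(opts->quote != NULL);

	return opts->delim[0] == opts->quote[0];
}
```

In an assert-enabled build a violated expectation is reported at the assertion itself rather than as a bare segfault further down, which is the documentation value referred to above.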
There is no SQL example, like
copy the_table from :'filename' (format raw);
in the patch.
I thought you were going to implement something like that.
Sorry if that was unclear, yes, that's the plan, but as I wrote:
After having studied the code that will be affected,
I feel that before making any changes, I would like to try to improve
ProcessCopyOptions, in terms of readability and maintainability, first.
So I just wanted to get some feedback on whether this reorganization of
ProcessCopyOptions would be OK to do first; I think it is needed to make the
function easily maintainable.
/Joel
Attachments:
* v3-0001-Replace-binary-flags-binary-and-csv_mode-with-format.patch
From ffd73a812052c6de67713fb23de2d8f1bd614ac8 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:02:49 +0200
Subject: [PATCH 1/2] Replace binary flags `binary` and `csv_mode` with
`format` enum.
---
src/backend/commands/copy.c | 45 ++++++++++++++--------------
src/backend/commands/copyfrom.c | 10 +++----
src/backend/commands/copyfromparse.c | 34 ++++++++++-----------
src/backend/commands/copyto.c | 20 ++++++-------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 67 insertions(+), 56 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0b093dbb2a..68340e534a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,7 +805,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && opts_out->force_notnull != NIL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_notnull != NIL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -819,7 +819,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && opts_out->force_null != NIL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_null != NIL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -842,7 +842,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -878,7 +878,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -895,7 +895,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..e700fd01b5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a65e1c07c5..87a4d1ce2d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
v3-0002-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch
From 83e2b9c24db94a251990b10d345949487edba194 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:29:51 +0200
Subject: [PATCH 2/2] Reorganize ProcessCopyOptions for clarity and consistent
option handling.
No changes to the function's signature or behavior; the refactoring solely
improves code structure and readability.
Changes:
* Refactored ProcessCopyOptions to improve readability and maintainability
by grouping per-option checks and default assignments into dedicated sections.
This enhances the logical flow and makes it easier to understand how each COPY
option is processed.
* Explicitly set the default format to COPY_FORMAT_TEXT when the FORMAT option
is not specified. Previously, the default was implied due to
zero-initialization, but making it explicit clarifies the default behavior.
* Consistently use boolean specified-variables to determine if an option has
been provided, rather than relying on default values from zero-initialization.
* Added assertions to ensure necessary options are set before performing
dependent checks, explicitly indicating that they have been assigned either
specified or default values.
* Relocated interdependent option validations to a dedicated section for
additional clarity.
---
src/backend/commands/copy.c | 452 ++++++++++++++++++++++--------------
1 file changed, 282 insertions(+), 170 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 68340e534a..493ca5f487 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -672,195 +672,272 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Check for incompatible options (must do these three before inserting
- * defaults)
+ * Set default format if not specified.
+ * This isn't strictly necessary since COPY_FORMAT_TEXT is 0 and
+ * opts_out is palloc0'd, but do it for clarity.
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (!format_specified)
+ opts_out->format = COPY_FORMAT_TEXT;
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
- /* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ /*
+ * Begin per-option checks and set defaults where necessary
+ */
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ /* --- FORMAT option is always allowed; no additional checks needed --- */
- if (opts_out->format == COPY_FORMAT_CSV)
+ /* --- FREEZE option --- */
+ if (freeze_specified)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+ else
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
+ /* Default is false; no action needed */
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
- if (opts_out->default_print)
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
{
- opts_out->default_print_len = strlen(opts_out->default_print);
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ }
- if (strchr(opts_out->default_print, '\r') != NULL ||
- strchr(opts_out->default_print, '\n') != NULL)
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY default representation cannot use newline or carriage return")));
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
}
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ /* --- HEADER option --- */
+ if (header_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is false; no action needed */
+ }
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_notnull != NIL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if (opts_out->force_notnull != NIL && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->force_null != NIL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if (opts_out->force_null != NIL && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
+ /* --- ON_ERROR option --- */
+ if (on_error_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
+ }
+ else
+ {
+ /* Default is COPY_ON_ERROR_STOP */
+ opts_out->on_error = COPY_ON_ERROR_STOP;
+ }
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
+ /* --- REJECT_LIMIT option --- */
+ if (reject_limit_specified)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->format != COPY_FORMAT_CSV || opts_out->quote);
+ Assert(opts_out->null_print);
+
+ opts_out->default_print_len = strlen(opts_out->default_print);
+
+ if (strchr(opts_out->default_print, '\r') != NULL ||
+ strchr(opts_out->default_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY default representation cannot use newline or carriage return")));
+
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -894,20 +971,55 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the NULL string */
+
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote character to appear in the NULL string */
+
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
--
2.45.1
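The per-option pattern that the reorganized ProcessCopyOptions follows (validate the option if it was specified, otherwise install the per-format default) can be sketched in isolation. This is only an illustration under stated assumptions: DemoCopyOptions and demo_process_delimiter are hypothetical names, and errors are returned as strings where the real code calls ereport(ERROR, ...).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the CopyFormat enum introduced by the preceding patch. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV
} CopyFormat;

/* Hypothetical, trimmed-down stand-in for CopyFormatOptions. */
typedef struct DemoCopyOptions
{
    CopyFormat  format;
    const char *delim;      /* NULL until specified or defaulted */
} DemoCopyOptions;

/*
 * Validate a user-supplied DELIMITER, or install the per-format default.
 * Returns NULL on success, or a static error message on failure.
 */
static const char *
demo_process_delimiter(DemoCopyOptions *opts)
{
    if (opts->delim != NULL)
    {
        /* Option was specified: run every check that concerns it. */
        if (opts->format == COPY_FORMAT_BINARY)
            return "cannot specify DELIMITER in BINARY mode";
        if (strlen(opts->delim) != 1)
            return "COPY delimiter must be a single one-byte character";
        if (strchr(opts->delim, '\r') != NULL ||
            strchr(opts->delim, '\n') != NULL)
            return "COPY delimiter cannot be newline or carriage return";
    }
    else if (opts->format != COPY_FORMAT_BINARY)
    {
        /* Option was omitted: apply the default for text-like formats. */
        opts->delim = (opts->format == COPY_FORMAT_CSV) ? "," : "\t";
    }
    return NULL;
}
```

Keeping the check and the default for one option side by side makes it visible that the delimiter stays NULL in binary mode, which is why later cross-option checks have to be guarded by format.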
> Hi hackers,
> This thread is about implementing a new "raw" COPY format.
> This idea came up in a different thread [1], moved here.
> [1] /messages/by-id/47b5c6a7-5c0e-40aa-8ea2-c7b95ccf296f@app.fastmail.com
> The main use-case for the raw format, is when needing to import arbitrary
> unstructured text files, such as log files, into a single text column
> of a table.
After COPY has imported the "unstructured text file" in the "raw" COPY format,
what is the column type? text? Or bytea? If it's text, how do you
handle encoding conversion if the "unstructured text file" is encoded
in a server-side-unsafe encoding such as SJIS?
> All characters are taken literally.
> There is no special handling for quotes, backslashes, or escape sequences.
If SJIS text is imported "literally" (i.e. no encoding conversion), it
should be rejected.
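As a rough illustration of why an encoding such as SJIS is a concern for byte-oriented parsing: the trail byte of a two-byte Shift_JIS character can be 0x5C, the same byte value as an ASCII backslash. The sketch below is not PostgreSQL code; the function names are made up for this example, and the lead-byte ranges are the commonly documented ones for Shift_JIS.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count bytes equal to 0x5C without any knowledge of the encoding,
 * the way a naive byte-oriented parser would look for backslashes.
 */
static size_t
count_backslash_bytes(const unsigned char *buf, size_t len)
{
    size_t      n = 0;

    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x5C)
            n++;
    return n;
}

/* Shift_JIS lead bytes are assumed to be 0x81-0x9F and 0xE0-0xFC. */
static int
sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Count only those 0x5C bytes that are standalone single-byte characters. */
static size_t
count_backslash_chars_sjis(const unsigned char *buf, size_t len)
{
    size_t      n = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (sjis_is_lead_byte(buf[i]) && i + 1 < len)
            i++;                /* skip the trail byte, even if it is 0x5C */
        else if (buf[i] == 0x5C)
            n++;
    }
    return n;
}
```

For the two-byte sequence 0x95 0x5C (one kanji in Shift_JIS), the naive scan reports one backslash while the encoding-aware scan reports none, so the data has to be converted to a server-safe encoding before any such scanning.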
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Sun, Oct 13, 2024, at 11:52, Tatsuo Ishii wrote:
> After COPY has imported the "unstructured text file" in the "raw" COPY format,
> what is the column type? text? Or bytea? If it's text, how do you
> handle encoding conversion if the "unstructured text file" is encoded
> in a server-side-unsafe encoding such as SJIS?
>> All characters are taken literally.
>> There is no special handling for quotes, backslashes, or escape sequences.
> If SJIS text is imported "literally" (i.e. no encoding conversion), it
> should be rejected.
I think encoding conversion is still necessary,
and should work the same as for the COPY formats "text" and "csv".
/Joel
On Sun, Oct 13, 2024, at 14:39, Joel Jacobson wrote:
> On Sun, Oct 13, 2024, at 11:52, Tatsuo Ishii wrote:
>> After COPY has imported the "unstructured text file" in the "raw" COPY format,
>> what is the column type? text? Or bytea? If it's text, how do you
>> handle encoding conversion if the "unstructured text file" is encoded
>> in a server-side-unsafe encoding such as SJIS?
>>> All characters are taken literally.
>>> There is no special handling for quotes, backslashes, or escape sequences.
>> If SJIS text is imported "literally" (i.e. no encoding conversion), it
>> should be rejected.
> I think encoding conversion is still necessary,
> and should work the same as for the COPY formats "text" and "csv".
Attached is a first draft implementation of the new proposed COPY "raw" format.
The first two patches are just the bug fixes for HEAD, reported separately:
https://commitfest.postgresql.org/50/5297/
* v4-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch
The first patch fixes a thinko in tests for COPY options force_not_null and force_null.
* v4-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch
The second patch fixes validation of FORCE_NOT_NULL/FORCE_NULL for all-columns case.
* v4-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch
The third patch introduces a new enum CopyFormat, with options for the three current formats.
* v4-0004-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch
The fourth patch reorganizes ProcessCopyOptions for clarity and consistent option handling.
* v4-0005-Add-raw-COPY-format-support-for-unstructured-text-da.patch
Finally, the fifth patch introduces the new "raw" COPY format.
Docs and tests updated.
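To illustrate what the third patch changes, here is a small sketch of how the old pair of boolean flags maps onto the CopyFormat enum. The enum matches the one added to copy.h; the two helper functions are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,           /* 0, so a zeroed struct defaults to text */
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV
} CopyFormat;

/* Hypothetical helper: map the old "binary"/"csv_mode" flags to the enum. */
static CopyFormat
copy_format_from_flags(bool binary, bool csv_mode)
{
    /* The two flags were mutually exclusive; the enum makes that explicit. */
    assert(!(binary && csv_mode));

    if (binary)
        return COPY_FORMAT_BINARY;
    if (csv_mode)
        return COPY_FORMAT_CSV;
    return COPY_FORMAT_TEXT;
}

/* Format code as computed in SendCopyBegin: 1 for binary, 0 otherwise. */
static int
copy_wire_format_code(CopyFormat format)
{
    return (format == COPY_FORMAT_BINARY) ? 1 : 0;
}
```

With a single enum field there is no representable "binary and CSV at once" state, and a further format can be added as one more enum value instead of a third boolean.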
The raw format currently goes through the same multiple stages
as the text and CSV formats. I'm not sure what the best approach would be
if we wanted to create a special fast parsing path for it.
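For discussion, a dedicated path could in principle be as simple as the sketch below. It is not what the attached patch does. raw_split_lines, LineLens and record_len are hypothetical names, the input is assumed to be already converted to the server encoding, and only "\n" line endings are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*raw_line_cb) (const char *line, size_t len, void *arg);

/*
 * Split buf into lines on '\n' and hand each one to cb as a single field.
 * No quote, backslash, or delimiter processing is done; an empty line
 * yields len == 0 (an empty string, not NULL). A final line without a
 * trailing newline is still emitted. Returns the number of lines found.
 */
static size_t
raw_split_lines(const char *buf, size_t len, raw_line_cb cb, void *arg)
{
    size_t      start = 0;
    size_t      nlines = 0;

    while (start < len)
    {
        const char *nl = memchr(buf + start, '\n', len - start);
        size_t      end = nl ? (size_t) (nl - buf) : len;

        cb(buf + start, end - start, arg);
        nlines++;
        start = end + 1;
    }
    return nlines;
}

/* Example callback: record the length of each line (up to 8 lines). */
typedef struct LineLens
{
    size_t      lens[8];
    size_t      count;
} LineLens;

static void
record_len(const char *line, size_t len, void *arg)
{
    LineLens   *ll = arg;

    (void) line;
    if (ll->count < 8)
        ll->lens[ll->count++] = len;
}
```

Since memchr is the only scanning primitive, there is no per-character state machine, but a real implementation would still need EOL detection, encoding conversion and header handling.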
/Joel
Attachments:
v4-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch
From b42a0f03d6fa942f9e785181589056a9b1897829 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:23:55 +0200
Subject: [PATCH 1/5] Fix thinko in tests for COPY options force_not_null and
force_null.
Use COPY FROM for the negative tests that check that FORMAT text
cannot be used with these options. Testing with COPY TO, which is
itself invalid for these two options, exercises two invalid options
at the same time, which doesn't seem intentional, since the other
tests seem to test invalid options one at a time.
In passing, consistently use "stdin" for COPY FROM and "stdout" for COPY TO.
Although this has no effect on the tests per se, being consistent
avoids confusion.
---
src/test/regress/expected/copy2.out | 20 ++++++++++----------
src/test/regress/sql/copy2.sql | 16 ++++++++--------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index ab449fa7b8..3f420db0bc 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -86,9 +86,9 @@ ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
^
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
@@ -96,22 +96,22 @@ COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
-COPY x to stdout (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
-COPY x to stdin (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
-COPY x to stdout (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
-COPY x to stdin (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
-LINE 1: COPY x to stdin (format BINARY, on_error unsupported);
- ^
+LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
+ ^
COPY x to stdout (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x to stdout (log_verbosity unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 1aa0e41b68..5790057e1c 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,17 +70,17 @@ COPY x from stdin (on_error ignore, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
-COPY x to stdout (format TEXT, force_not_null(a));
-COPY x to stdin (format CSV, force_not_null(a));
-COPY x to stdout (format TEXT, force_null(a));
-COPY x to stdin (format CSV, force_null(a));
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x from stdin (format TEXT, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
+COPY x from stdin (format TEXT, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
--
2.45.1
v4-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch
From 2778636e121380ee690446d2bcbfc34f08adc952 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:35:28 +0200
Subject: [PATCH 2/5] Fix validation of FORCE_NOT_NULL/FORCE_NULL for
all-columns case.
Add missing checks for FORCE_NOT_NULL and FORCE_NULL when applied to
all columns via "*". These options now correctly require CSV mode and
are disallowed in COPY TO as appropriate. Adjusted regression
tests to verify correct behavior for the all-columns case.
---
src/backend/commands/copy.c | 11 +++++++----
src/test/regress/expected/copy2.out | 12 ++++++++++++
src/test/regress/sql/copy2.sql | 6 ++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0b093dbb2a..e93ea3d627 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -805,12 +805,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && opts_out->force_notnull != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if (opts_out->force_notnull != NIL && !is_from)
+ if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
+ !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
@@ -819,13 +821,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && opts_out->force_null != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if (opts_out->force_null != NIL && !is_from)
+ if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 3f420db0bc..626a437d40 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -98,16 +98,28 @@ LINE 1: COPY x from stdin (on_error unsupported);
^
COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
+COPY x from stdin (format CSV, force_quote *);
+ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_null *);
+ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 5790057e1c..3458d287f2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -75,11 +75,17 @@ COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
+COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
--
2.45.1
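To illustrate what the 0002 patch above fixes, here is a minimal standalone sketch, not code from the patch. It assumes, as the patched conditions suggest, that `FORCE_NULL *` sets only the `force_null_all` flag and leaves the explicit column list empty, so a check that looks only at the list never fires. The `SketchOpts` struct, the `SketchResult` codes and both function names are hypothetical stand-ins for the real `CopyFormatOptions` fields and `ereport()` calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the relevant CopyFormatOptions fields. */
typedef struct SketchOpts
{
	bool		csv_mode;		/* FORMAT csv was requested */
	const char *force_null;		/* explicit column list; NULL plays the role of NIL */
	bool		force_null_all;	/* FORCE_NULL * was given */
} SketchOpts;

/* Hypothetical result codes standing in for the two ereport(ERROR) calls. */
typedef enum SketchResult
{
	SKETCH_OK,
	SKETCH_REQUIRES_CSV,
	SKETCH_NOT_WITH_COPY_TO
} SketchResult;

/* Pre-patch shape of the check: only the explicit column list is consulted. */
SketchResult
check_force_null_old(const SketchOpts *opts, bool is_from)
{
	assert(opts != NULL);

	if (!opts->csv_mode && opts->force_null != NULL)
		return SKETCH_REQUIRES_CSV;
	if (opts->force_null != NULL && !is_from)
		return SKETCH_NOT_WITH_COPY_TO;
	return SKETCH_OK;
}

/* Patched shape of the check: the "*" flag counts as using the option too. */
SketchResult
check_force_null_new(const SketchOpts *opts, bool is_from)
{
	bool		used;

	assert(opts != NULL);

	used = (opts->force_null != NULL || opts->force_null_all);
	if (!opts->csv_mode && used)
		return SKETCH_REQUIRES_CSV;
	if (used && !is_from)
		return SKETCH_NOT_WITH_COPY_TO;
	return SKETCH_OK;
}
```

With `csv_mode = false` and `force_null_all = true`, the old function reports no problem while the new one reports that CSV mode is required, which corresponds to the `force_null *` cases added to copy2.sql.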
Attachment: v4-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch (application/octet-stream)
From 04e662b93053e613afeaadff535a47d848d20329 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:02:49 +0200
Subject: [PATCH 3/5] Replace binary flags `binary` and `csv_mode` with
`format` enum.
---
src/backend/commands/copy.c | 48 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 10 +++---
src/backend/commands/copyfromparse.c | 34 ++++++++++----------
src/backend/commands/copyto.c | 20 ++++++------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 69 insertions(+), 57 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e93ea3d627..effe337229 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,7 +822,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -845,7 +846,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -881,7 +882,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,7 +899,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..e700fd01b5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a65e1c07c5..87a4d1ce2d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
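The idea behind the 0003 patch above can be sketched in isolation as follows. This is only an illustration under assumptions, not code from the patch: the `SketchCopyFormat` enum and the two helper functions are hypothetical names. It shows one value carrying the format where two independent booleans (`binary`, `csv_mode`) were used before, so the "both set" state cannot be represented and a further format would be one more enumerator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the CopyFormat enum; TEXT is 0, so zeroed memory means text. */
typedef enum SketchCopyFormat
{
	SKETCH_FORMAT_TEXT = 0,
	SKETCH_FORMAT_BINARY,
	SKETCH_FORMAT_CSV
} SketchCopyFormat;

/*
 * Map a FORMAT name to the enum.  Returns 0 on success and -1 for an
 * unknown name (where the real code would raise an error instead).
 */
int
sketch_parse_format(const char *fmt, SketchCopyFormat *out)
{
	assert(fmt != NULL && out != NULL);

	if (strcmp(fmt, "text") == 0)
		*out = SKETCH_FORMAT_TEXT;
	else if (strcmp(fmt, "csv") == 0)
		*out = SKETCH_FORMAT_CSV;
	else if (strcmp(fmt, "binary") == 0)
		*out = SKETCH_FORMAT_BINARY;
	else
		return -1;
	return 0;
}

/* Default delimiter, written the same way as in the patched copy.c. */
const char *
sketch_default_delim(SketchCopyFormat format)
{
	return format == SKETCH_FORMAT_CSV ? "," : "\t";
}
```

Every former `csv_mode` test becomes a comparison against the CSV enumerator and every former `binary` test a comparison against the BINARY enumerator, which is the mechanical substitution the diff performs throughout copy.c, copyfrom.c, copyfromparse.c and copyto.c.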
Attachment: v4-0004-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch (application/octet-stream)
From 5a19ec27879d5bf2e644846a315bcfc67a0a26da Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:29:51 +0200
Subject: [PATCH 4/5] Reorganize ProcessCopyOptions for clarity and consistent
option handling.
No changes to the function's signature or behavior; the refactoring solely
improves code structure and readability.
Changes:
* Refactored ProcessCopyOptions to improve readability and maintainability
by grouping per-option checks and default assignments into dedicated sections.
This enhances the logical flow and makes it easier to understand how each COPY
option is processed.
* Explicitly set the default format to COPY_FORMAT_TEXT when the FORMAT option
is not specified. Previously, the default was implied due to
zero-initialization, but making it explicit clarifies the default behavior.
* Consistently use boolean specified-variables to determine if an option has
been provided, rather than relying on default values from zero-initialization.
* Added assertions to ensure necessary options are set before performing
dependent checks, explicitly indicating that they have been assigned either
specified or default values.
* Relocated interdependent option validations to a dedicated section for
additional clarity.
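
The per-option layout described in this commit message can be sketched on two options as follows. This is a simplified illustration under assumptions, not the patch itself: `SkOpts`, `SkFormat` and `sk_process_options` are hypothetical names, and errors are returned as strings where the real code calls `ereport(ERROR, ...)`. Each option gets one section that validates a supplied value or else assigns the format-dependent default, and checks that involve several options come last, guarded by an assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum SkFormat
{
	SK_TEXT = 0,
	SK_BINARY,
	SK_CSV
} SkFormat;

/* Hypothetical subset of the COPY options; NULL means "not specified". */
typedef struct SkOpts
{
	SkFormat	format;
	const char *delim;
	const char *quote;
} SkOpts;

/* Returns NULL on success, or the error message that would be reported. */
const char *
sk_process_options(SkOpts *opts)
{
	assert(opts != NULL);

	/* --- DELIMITER option: validate if given, else set the default --- */
	if (opts->delim)
	{
		if (opts->format == SK_BINARY)
			return "cannot specify DELIMITER in BINARY mode";
		if (strlen(opts->delim) != 1)
			return "COPY delimiter must be a single one-byte character";
	}
	else if (opts->format != SK_BINARY)
		opts->delim = (opts->format == SK_CSV) ? "," : "\t";

	/* --- QUOTE option: validate if given, else set the default --- */
	if (opts->quote)
	{
		if (opts->format != SK_CSV)
			return "COPY QUOTE requires CSV mode";
		if (strlen(opts->quote) != 1)
			return "COPY quote must be a single one-byte character";
	}
	else if (opts->format == SK_CSV)
		opts->quote = "\"";

	/* --- Interdependent checks, run once every option has a value --- */
	if (opts->format == SK_CSV)
	{
		/* Both were either specified or defaulted in the sections above. */
		assert(opts->delim != NULL && opts->quote != NULL);
		if (opts->delim[0] == opts->quote[0])
			return "COPY delimiter and quote must be different";
	}

	return NULL;
}
```

In this sketch a binary-format call leaves `delim` unset, matching the `else if (opts_out->format != COPY_FORMAT_BINARY)` guard the patch puts around the default assignment.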
---
src/backend/commands/copy.c | 456 ++++++++++++++++++++++--------------
1 file changed, 282 insertions(+), 174 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index effe337229..493ca5f487 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -672,199 +672,272 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Check for incompatible options (must do these three before inserting
- * defaults)
+ * Set default format if not specified.
+ * This isn't strictly necessary since COPY_FORMAT_TEXT is 0 and
+ * opts_out is palloc0'd, but do it for clarity.
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (!format_specified)
+ opts_out->format = COPY_FORMAT_TEXT;
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
- /* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ /*
+ * Begin per-option checks and set defaults where necessary
+ */
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ /* --- FORMAT option is always allowed; no additional checks needed --- */
- if (opts_out->format == COPY_FORMAT_CSV)
+ /* --- FREEZE option --- */
+ if (freeze_specified)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+ else
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
+ /* Default is false; no action needed */
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
- if (opts_out->default_print)
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
{
- opts_out->default_print_len = strlen(opts_out->default_print);
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ }
- if (strchr(opts_out->default_print, '\r') != NULL ||
- strchr(opts_out->default_print, '\n') != NULL)
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY default representation cannot use newline or carriage return")));
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
}
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ /* --- HEADER option --- */
+ if (header_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is false; no action needed */
+ }
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV &&
- (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
+ /* --- ON_ERROR option --- */
+ if (on_error_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
+ }
+ else
+ {
+ /* Default is COPY_ON_ERROR_STOP */
+ opts_out->on_error = COPY_ON_ERROR_STOP;
+ }
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
+ /* --- REJECT_LIMIT option --- */
+ if (reject_limit_specified)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ opts_out->default_print_len = strlen(opts_out->default_print);
+
+ if (strchr(opts_out->default_print, '\r') != NULL ||
+ strchr(opts_out->default_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY default representation cannot use newline or carriage return")));
+
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,20 +971,55 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote character in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
--
2.45.1
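
For readers skimming the diff above: the restructured option handling follows one pattern per option. If the option was specified, validate it; otherwise apply the format's default, if any. Checks that span several options run last, once all defaults are in place. Below is a minimal standalone sketch of that layout. It is not the patch's code: DemoFormat, DemoOptions and demo_process_options are hypothetical names, and errors are returned as strings instead of going through ereport().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for CopyFormat and CopyFormatOptions. */
typedef enum
{
	DEMO_FORMAT_TEXT,
	DEMO_FORMAT_BINARY,
	DEMO_FORMAT_CSV
} DemoFormat;

typedef struct
{
	DemoFormat	format;
	const char *delim;			/* NULL means "not specified" */
	const char *quote;
	const char *escape;
} DemoOptions;

/*
 * Validate options and fill in defaults.  Returns NULL on success, or a
 * static message describing the first problem found.
 */
const char *
demo_process_options(DemoOptions *opts)
{
	assert(opts != NULL);

	/* --- DELIMITER option --- */
	if (opts->delim)
	{
		if (opts->format == DEMO_FORMAT_BINARY)
			return "cannot specify DELIMITER in BINARY mode";
		if (strlen(opts->delim) != 1)
			return "COPY delimiter must be a single one-byte character";
	}
	else if (opts->format == DEMO_FORMAT_CSV)
		opts->delim = ",";
	else if (opts->format == DEMO_FORMAT_TEXT)
		opts->delim = "\t";

	/* --- QUOTE option --- */
	if (opts->quote)
	{
		if (opts->format != DEMO_FORMAT_CSV)
			return "COPY QUOTE requires CSV mode";
		if (strlen(opts->quote) != 1)
			return "COPY quote must be a single one-byte character";
	}
	else if (opts->format == DEMO_FORMAT_CSV)
		opts->quote = "\"";

	/* --- ESCAPE option --- */
	if (opts->escape)
	{
		if (opts->format != DEMO_FORMAT_CSV)
			return "COPY ESCAPE requires CSV mode";
		if (strlen(opts->escape) != 1)
			return "COPY escape must be a single one-byte character";
	}
	else if (opts->format == DEMO_FORMAT_CSV)
		opts->escape = opts->quote;	/* default escape is the quote char */

	/* Checks spanning several options run last, once defaults are set. */
	if (opts->format == DEMO_FORMAT_CSV)
	{
		assert(opts->delim && opts->quote);
		if (opts->delim[0] == opts->quote[0])
			return "COPY delimiter and quote must be different";
	}

	return NULL;
}
```

With this layout, a format that has no use for an option simply never gets a default for it, so the field stays NULL, and the interdependent checks can assume the fields they need are set for the formats they cover.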
Attachment: v4-0005-Add-raw-COPY-format-support-for-unstructured-text-da.patch
From ef7976e2e618e2f580d6354fbdf9cc8a5b802258 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sun, 13 Oct 2024 21:01:53 +0200
Subject: [PATCH 5/5] Add "raw" COPY format support for unstructured text data.
This commit introduces a new format option to the COPY command, enabling
the import and export of unstructured text data where each line is treated as a
single field without any delimiters.
---
doc/src/sgml/ref/copy.sgml | 57 +++++++-
src/backend/commands/copy.c | 39 +++--
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 204 ++++++++++++++++++++++++++-
src/backend/commands/copyto.c | 70 ++++++++-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 108 ++++++++++++++
src/test/regress/expected/copy2.out | 62 ++++++++
src/test/regress/sql/copy.sql | 67 +++++++++
src/test/regress/sql/copy2.sql | 56 ++++++++
10 files changed, 645 insertions(+), 26 deletions(-)
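
As a rough illustration of the line splitting that CopyReadLineRawText performs in this patch: only \n, \r and \r\n are special, the newline style must stay consistent within a file, and every other byte (quotes, backslashes, even "\.") is ordinary data. The sketch below works on a whole in-memory string and leaves out the buffer refilling done by the real function. DemoEol, DemoReader, demo_reader_init and demo_next_line are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counterpart of the backend's EolType. */
typedef enum
{
	DEMO_EOL_UNKNOWN,
	DEMO_EOL_NL,
	DEMO_EOL_CR,
	DEMO_EOL_CRNL
} DemoEol;

typedef struct
{
	const char *buf;			/* complete input */
	size_t		len;			/* number of bytes in buf */
	size_t		pos;			/* next unread byte */
	DemoEol		eol;			/* newline style seen so far */
} DemoReader;

void
demo_reader_init(DemoReader *r, const char *input)
{
	r->buf = input;
	r->len = strlen(input);
	r->pos = 0;
	r->eol = DEMO_EOL_UNKNOWN;
}

/*
 * Fetch the next line.  Returns 1 and sets *line and *line_len (line
 * terminator excluded) on success, 0 at end of input, and -1 if the
 * newline style is inconsistent with earlier lines.
 */
int
demo_next_line(DemoReader *r, const char **line, size_t *line_len)
{
	size_t		start;
	size_t		i;

	assert(r != NULL && line != NULL && line_len != NULL);

	if (r->pos >= r->len)
		return 0;

	start = r->pos;
	for (i = start; i < r->len; i++)
	{
		char		c = r->buf[i];
		size_t		consumed;	/* length of the line terminator */

		if (c == '\n')
		{
			if (r->eol == DEMO_EOL_CR || r->eol == DEMO_EOL_CRNL)
				return -1;
			r->eol = DEMO_EOL_NL;
			consumed = 1;
		}
		else if (c == '\r')
		{
			int			crnl = (i + 1 < r->len && r->buf[i + 1] == '\n');

			if (r->eol == DEMO_EOL_NL)
				return -1;
			if (crnl && r->eol == DEMO_EOL_CR)
				return -1;
			if (!crnl && r->eol == DEMO_EOL_CRNL)
				return -1;
			r->eol = crnl ? DEMO_EOL_CRNL : DEMO_EOL_CR;
			consumed = crnl ? 2 : 1;
		}
		else
			continue;			/* every other byte is ordinary data */

		*line = r->buf + start;
		*line_len = i - start;
		r->pos = i + consumed;
		return 1;
	}

	/* Final line without a terminator. */
	*line = r->buf + start;
	*line_len = r->len - start;
	r->pos = r->len;
	return 1;
}
```

A caller would initialize a reader with demo_reader_init() and call demo_next_line() until it returns 0. An empty line comes back with a length of zero, which corresponds to the raw format storing an empty string rather than NULL, and a missing terminator on the last line does not produce an extra row.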
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index f493ddb371..e6f7a26016 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
</para>
</listitem>
@@ -256,7 +257,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -270,7 +272,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -293,7 +296,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -399,7 +402,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -892,6 +896,47 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. The text is nevertheless interpreted
+ according to the specified <literal>ENCODING</literal> option, or the
+ current client encoding, on input, and is encoded using the specified
+ <literal>ENCODING</literal> or the current client encoding on output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
<refsect2>
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 493ca5f487..b71161ad99 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -710,6 +712,12 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "DELIMITER")));
+
/* Only single-byte delimiter strings are supported. */
if (strlen(opts_out->delim) != 1)
ereport(ERROR,
@@ -740,11 +748,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default delimiter */
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
- }
+ /* Set default delimiter */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
/* --- NULL option --- */
if (opts_out->null_print)
@@ -754,6 +762,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
/* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
@@ -761,11 +774,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default null_print */
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- }
+ /* Set default null_print */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
@@ -925,6 +939,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
/* Assert options have been set (defaults applied if not specified) */
Assert(opts_out->delim);
Assert(opts_out->quote);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 50bb4b7750..2528c6f111 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -143,8 +143,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -732,7 +734,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -748,7 +750,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -768,8 +770,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -823,8 +832,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1096,7 +1112,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1462,6 +1481,138 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+
+ /*
+ * The objective of this loop is to transfer the entire next input line
+ * into line_buf. We only care for detecting newlines (\r and/or \n).
+ * All other characters are treated as regular data.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * For a little extra speed within the loop, we copy input_buf and
+ * input_buf_len into local variables.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+ char c;
+
+ /*
+ * Load more data if needed.
+ */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* update our local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /*
+ * If we are completely out of data, break out of the loop,
+ * reporting EOF.
+ */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* OK to fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+ c = copy_input_buf[input_buf_ptr++];
+
+ /* Process \r */
+ if (c == '\r')
+ {
+ /* Check for \r\n on first line, _and_ handle \r\n. */
+ if (cstate->eol_type == EOL_UNKNOWN ||
+ cstate->eol_type == EOL_CRNL)
+ {
+ /*
+ * If need more data, go back to loop top to load it.
+ *
+ * Note that if we are at EOF, c will wind up as '\0' because
+ * of the guaranteed pad of input_buf.
+ */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+ /* get next char */
+ c = copy_input_buf[input_buf_ptr];
+
+ if (c == '\n')
+ {
+ input_buf_ptr++; /* eat newline */
+ cstate->eol_type = EOL_CRNL; /* in case not set yet */
+ }
+ else
+ {
+ if (cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /*
+ * if we got here, it is the first line and we didn't find
+ * \n, so don't consume the peeked character
+ */
+ cstate->eol_type = EOL_CR;
+ }
+ }
+ else if (cstate->eol_type == EOL_NL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* Process \n */
+ if (c == '\n')
+ {
+ if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ cstate->eol_type = EOL_NL; /* in case not set yet */
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* All other characters are treated as regular data */
+ } /* end of outer loop */
+
+ /*
+ * Transfer any still-uncopied data to line_buf.
+ */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1938,6 +2089,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' requires exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 78531ae846..99fd68a483 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -570,6 +571,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -835,8 +843,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -917,7 +927,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -945,7 +956,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -965,6 +976,37 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Ensure only one column is being copied */
+ if (list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1219,6 +1261,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e700fd01b5..04f7548ef4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..120f7c3b6d 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,111 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), (E'\n'), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ | f
+ | f
+ test | f
+(8 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\r'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 626a437d40..617ee4dad0 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -929,3 +929,65 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+-- Test inconsistent newline style
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\qecho -n line3
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..7ec41fa16e 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,70 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), (E'\n'), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\r'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 3458d287f2..018764102d 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -707,3 +707,59 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+
+-- Test inconsistent newline style
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\qecho -n line3
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
--
2.45.1
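The newline tests in the patch above expect a file to use a single line-terminator style throughout. That behavior can be sketched in standalone C as follows. This is a simplified illustration, not the logic in copyfromparse.c: the helper name count_raw_lines and the DemoEolType enum are hypothetical, and the real code reports the problem through ereport() rather than a return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the EOL bookkeeping; names are illustrative. */
typedef enum DemoEolType
{
    DEMO_EOL_UNKNOWN,
    DEMO_EOL_NL,                /* "\n" */
    DEMO_EOL_CR,                /* "\r" */
    DEMO_EOL_CRNL               /* "\r\n" */
} DemoEolType;

/*
 * Count the lines in buf[0..len).  The first terminator seen fixes the
 * style for the whole input.  If a later terminator differs, return -1
 * and store the 1-based number of the offending line in *bad_line.
 * A final line without a terminator still counts as a line.
 */
static int
count_raw_lines(const char *buf, size_t len, int *bad_line)
{
    DemoEolType style = DEMO_EOL_UNKNOWN;
    int         lines = 0;
    size_t      i = 0;
    size_t      line_start = 0;

    assert(buf != NULL && bad_line != NULL);

    while (i < len)
    {
        DemoEolType seen;
        size_t      eol_len = 1;

        if (buf[i] == '\n')
            seen = DEMO_EOL_NL;
        else if (buf[i] == '\r')
        {
            /* Look one byte ahead to tell "\r\n" from a lone "\r". */
            if (i + 1 < len && buf[i + 1] == '\n')
            {
                seen = DEMO_EOL_CRNL;
                eol_len = 2;
            }
            else
                seen = DEMO_EOL_CR;
        }
        else
        {
            i++;                /* ordinary data byte, taken literally */
            continue;
        }

        lines++;
        if (style == DEMO_EOL_UNKNOWN)
            style = seen;       /* first terminator decides the style */
        else if (style != seen)
        {
            *bad_line = lines;
            return -1;          /* inconsistent newline style */
        }

        i += eol_len;
        line_start = i;
    }

    if (line_start < len)
        lines++;                /* trailing line with no terminator */

    return lines;
}
```

Called on the inputs the regression tests write with \qecho, this sketch accepts the all-"\n", all-"\r\n", and all-"\r" files (with or without a final terminator) and rejects each mixed file at line 2, which matches the CONTEXT lines in the expected output.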
On Mon, Oct 14, 2024, at 10:07, Joel Jacobson wrote:
Attached is a first draft implementation of the new proposed COPY "raw" format.
The first two patches are just the bug fix in HEAD, reported separately:
https://commitfest.postgresql.org/50/5297/
I forgot to add support for the old syntax.
Fixed in the new version; only the fifth patch is updated.
Before, only the new syntax worked:
COPY ... (FORMAT raw);
Now this also works:
COPY ... RAW;
/Joel
Attachments:
v5-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch (application/octet-stream)
From b42a0f03d6fa942f9e785181589056a9b1897829 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:23:55 +0200
Subject: [PATCH 1/5] Fix thinko in tests for COPY options force_not_null and
force_null.
Use COPY FROM for the negative tests that check that FORMAT text
cannot be used with these options. COPY TO is itself invalid for
these two options, so testing with COPY TO exercises two invalid
options at the same time, which doesn't seem intentional, since the
other tests seem to be testing invalid options one by one.
In passing, consistently use "stdin" for COPY FROM and "stdout" for
COPY TO. This has no effect on the tests per se, but being
consistent avoids confusion.
---
src/test/regress/expected/copy2.out | 20 ++++++++++----------
src/test/regress/sql/copy2.sql | 16 ++++++++--------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index ab449fa7b8..3f420db0bc 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -86,9 +86,9 @@ ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
^
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
@@ -96,22 +96,22 @@ COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
-COPY x to stdout (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
-COPY x to stdin (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
-COPY x to stdout (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
-COPY x to stdin (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
-LINE 1: COPY x to stdin (format BINARY, on_error unsupported);
- ^
+LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
+ ^
COPY x to stdout (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x to stdout (log_verbosity unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 1aa0e41b68..5790057e1c 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,17 +70,17 @@ COPY x from stdin (on_error ignore, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
-COPY x to stdout (format TEXT, force_not_null(a));
-COPY x to stdin (format CSV, force_not_null(a));
-COPY x to stdout (format TEXT, force_null(a));
-COPY x to stdin (format CSV, force_null(a));
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x from stdin (format TEXT, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
+COPY x from stdin (format TEXT, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
--
2.45.1
v5-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch (application/octet-stream)
From 2778636e121380ee690446d2bcbfc34f08adc952 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:35:28 +0200
Subject: [PATCH 2/5] Fix validation of FORCE_NOT_NULL/FORCE_NULL for
all-columns case.
Add missing checks for FORCE_NOT_NULL and FORCE_NULL when applied to
all columns via "*". These options now correctly require CSV mode and
are disallowed in COPY TO as appropriate. Adjusted regression
tests to verify correct behavior for the all-columns case.
---
src/backend/commands/copy.c | 11 +++++++----
src/test/regress/expected/copy2.out | 12 ++++++++++++
src/test/regress/sql/copy2.sql | 6 ++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0b093dbb2a..e93ea3d627 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -805,12 +805,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && opts_out->force_notnull != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if (opts_out->force_notnull != NIL && !is_from)
+ if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
+ !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
@@ -819,13 +821,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && opts_out->force_null != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if (opts_out->force_null != NIL && !is_from)
+ if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 3f420db0bc..626a437d40 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -98,16 +98,28 @@ LINE 1: COPY x from stdin (on_error unsupported);
^
COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
+COPY x from stdin (format CSV, force_quote *);
+ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_null *);
+ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 5790057e1c..3458d287f2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -75,11 +75,17 @@ COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
+COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
--
2.45.1
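The third patch in the series, which follows, replaces the two boolean flags with a single format enum. A minimal standalone sketch of why that helps when a fourth format is added is given here. All names prefixed demo_ or DEMO_ are hypothetical and are not part of the patch; the choice to return NULL as the default delimiter for the binary and raw formats is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative counterpart of the CopyFormat enum; not the real type. */
typedef enum DemoCopyFormat
{
    DEMO_FORMAT_TEXT = 0,
    DEMO_FORMAT_BINARY,
    DEMO_FORMAT_CSV,
    DEMO_FORMAT_RAW
} DemoCopyFormat;

/*
 * Map a format name to the enum.  Returns false for an unknown name,
 * where the real option parser would raise an error instead.
 */
static bool
demo_parse_format(const char *name, DemoCopyFormat *out)
{
    assert(name != NULL && out != NULL);

    if (strcmp(name, "text") == 0)
        *out = DEMO_FORMAT_TEXT;
    else if (strcmp(name, "csv") == 0)
        *out = DEMO_FORMAT_CSV;
    else if (strcmp(name, "binary") == 0)
        *out = DEMO_FORMAT_BINARY;
    else if (strcmp(name, "raw") == 0)
        *out = DEMO_FORMAT_RAW;
    else
        return false;
    return true;
}

/*
 * With one enum, every format is an explicit case.  With two booleans,
 * "neither flag set" had to mean text, leaving no room for a new format.
 * Assumption: formats without field delimiters yield NULL here.
 */
static const char *
demo_default_delim(DemoCopyFormat format)
{
    switch (format)
    {
        case DEMO_FORMAT_CSV:
            return ",";
        case DEMO_FORMAT_TEXT:
            return "\t";
        case DEMO_FORMAT_BINARY:
        case DEMO_FORMAT_RAW:
            return NULL;
    }
    return NULL;                /* unreachable for valid enum values */
}
```

A switch over the enum also lets the compiler warn when a newly added format is not handled, which a pair of independent booleans cannot offer.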
v5-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch (application/octet-stream)
From 04e662b93053e613afeaadff535a47d848d20329 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:02:49 +0200
Subject: [PATCH 3/5] Replace binary flags `binary` and `csv_mode` with
`format` enum.
---
src/backend/commands/copy.c | 48 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 10 +++---
src/backend/commands/copyfromparse.c | 34 ++++++++++----------
src/backend/commands/copyto.c | 20 ++++++------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 69 insertions(+), 57 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e93ea3d627..effe337229 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,7 +822,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -845,7 +846,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -881,7 +882,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,7 +899,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..e700fd01b5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a65e1c07c5..87a4d1ce2d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
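
For readers skimming the diff above, here is a minimal standalone sketch of what the refactoring amounts to: the two booleans (binary, csv_mode) collapse into the single CopyFormat enum, and each old test maps to an enum comparison. The LegacyFlags struct and both helper functions are hypothetical names used only for illustration; they are not part of the patch. The sketch assumes the old flags were never both set.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the enum this patch adds to src/include/commands/copy.h. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical: the pair of booleans the old CopyFormatOptions carried. */
typedef struct LegacyFlags
{
	bool		binary;
	bool		csv_mode;
} LegacyFlags;

/* Hypothetical helper: translate the old flag pair into the new enum. */
static CopyFormat
format_from_legacy(LegacyFlags f)
{
	/* Assumption: the old code never set both flags at once. */
	assert(!(f.binary && f.csv_mode));

	if (f.binary)
		return COPY_FORMAT_BINARY;
	if (f.csv_mode)
		return COPY_FORMAT_CSV;
	return COPY_FORMAT_TEXT;
}

/*
 * Hypothetical helper: the old test "!opts.csv_mode" becomes
 * "opts.format != COPY_FORMAT_CSV".  Both are true for text and binary.
 */
static bool
is_not_csv(CopyFormat format)
{
	return format != COPY_FORMAT_CSV;
}
```

Since COPY_FORMAT_TEXT is the first enumerator, a zero-initialized options struct still defaults to text format, which is the behavior the old booleans gave implicitly.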
Attachment: v5-0004-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch (application/octet-stream)
From 5a19ec27879d5bf2e644846a315bcfc67a0a26da Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:29:51 +0200
Subject: [PATCH 4/5] Reorganize ProcessCopyOptions for clarity and consistent
option handling.
No changes to the function's signature or behavior; the refactoring solely
improves code structure and readability.
Changes:
* Refactored ProcessCopyOptions to improve readability and maintainability
by grouping per-option checks and default assignments into dedicated sections.
This enhances the logical flow and makes it easier to understand how each COPY
option is processed.
* Explicitly set the default format to COPY_FORMAT_TEXT when the FORMAT option
is not specified. Previously, the default was implied due to
zero-initialization, but making it explicit clarifies the default behavior.
* Consistently use boolean "*_specified" variables to determine if an option has
been provided, rather than relying on default values from zero-initialization.
* Added assertions to ensure necessary options are set before performing
dependent checks, explicitly indicating that they have been assigned either
specified or default values.
* Relocated interdependent option validations to a dedicated section for
additional clarity.
---
src/backend/commands/copy.c | 456 ++++++++++++++++++++++--------------
1 file changed, 282 insertions(+), 174 deletions(-)
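
To illustrate the layout described in the commit message, here is a rough standalone sketch of the "validate if specified, otherwise apply the format's default, then run interdependent checks" pattern. MiniCopyOptions and mini_process_options are hypothetical, trimmed-down stand-ins (three options only, error strings instead of ereport); the real function handles many more options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum CopyFormat
{
	COPY_FORMAT_TEXT,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical stand-in for CopyFormatOptions; NULL means "not specified". */
typedef struct MiniCopyOptions
{
	CopyFormat	format;
	const char *delim;
	const char *null_print;
	const char *quote;
} MiniCopyOptions;

/* Returns NULL on success, or an error message (real code uses ereport). */
static const char *
mini_process_options(MiniCopyOptions *opts)
{
	/* --- DELIMITER option: check if given, else set per-format default --- */
	if (opts->delim)
	{
		if (opts->format == COPY_FORMAT_BINARY)
			return "cannot specify DELIMITER in BINARY mode";
		if (strlen(opts->delim) != 1)
			return "COPY delimiter must be a single one-byte character";
	}
	else if (opts->format != COPY_FORMAT_BINARY)
		opts->delim = (opts->format == COPY_FORMAT_CSV) ? "," : "\t";

	/* --- NULL option --- */
	if (opts->null_print)
	{
		if (opts->format == COPY_FORMAT_BINARY)
			return "cannot specify NULL in BINARY mode";
	}
	else if (opts->format != COPY_FORMAT_BINARY)
		opts->null_print = (opts->format == COPY_FORMAT_CSV) ? "" : "\\N";

	/* --- QUOTE option --- */
	if (opts->quote)
	{
		if (opts->format != COPY_FORMAT_CSV)
			return "COPY QUOTE requires CSV mode";
		if (strlen(opts->quote) != 1)
			return "COPY quote must be a single one-byte character";
	}
	else if (opts->format == COPY_FORMAT_CSV)
		opts->quote = "\"";

	/* Interdependent checks run last, once every default is in place. */
	if (opts->format != COPY_FORMAT_BINARY)
	{
		assert(opts->delim && opts->null_print);
		if (strchr(opts->null_print, opts->delim[0]) != NULL)
			return "COPY delimiter character must not appear in the NULL specification";
	}
	if (opts->format == COPY_FORMAT_CSV)
	{
		assert(opts->quote);
		if (opts->delim[0] == opts->quote[0])
			return "COPY delimiter and quote must be different";
	}
	return NULL;
}
```

The point of the ordering is that a cross-option check can assert its inputs are non-NULL rather than silently depend on an earlier block having filled them in.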
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index effe337229..493ca5f487 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -672,199 +672,272 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Check for incompatible options (must do these three before inserting
- * defaults)
+ * Set default format if not specified.
+ * This isn't strictly necessary since COPY_FORMAT_TEXT is 0 and
+ * opts_out is palloc0'd, but do it for clarity.
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (!format_specified)
+ opts_out->format = COPY_FORMAT_TEXT;
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
- /* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ /*
+ * Begin per-option checks and set defaults where necessary
+ */
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ /* --- FORMAT option is always allowed; no additional checks needed --- */
- if (opts_out->format == COPY_FORMAT_CSV)
+ /* --- FREEZE option --- */
+ if (freeze_specified)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+ else
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
+ /* Default is false; no action needed */
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
- if (opts_out->default_print)
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
{
- opts_out->default_print_len = strlen(opts_out->default_print);
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ }
- if (strchr(opts_out->default_print, '\r') != NULL ||
- strchr(opts_out->default_print, '\n') != NULL)
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY default representation cannot use newline or carriage return")));
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
}
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ /* --- HEADER option --- */
+ if (header_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is false; no action needed */
+ }
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV &&
- (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
+ /* --- ON_ERROR option --- */
+ if (on_error_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
+ }
+ else
+ {
+ /* Default is COPY_ON_ERROR_STOP */
+ opts_out->on_error = COPY_ON_ERROR_STOP;
+ }
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
+ /* --- REJECT_LIMIT option --- */
+ if (reject_limit_specified)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ opts_out->default_print_len = strlen(opts_out->default_print);
+
+ if (strchr(opts_out->default_print, '\r') != NULL ||
+ strchr(opts_out->default_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY default representation cannot use newline or carriage return")));
+
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,20 +971,55 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote character in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
--
2.45.1
Attachment: v5-0005-Add-raw-COPY-format-support-for-unstructured-text-da.patch (application/octet-stream)
From c11bbce90beab950fadaaf8f2ecd52daffd92db9 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sun, 13 Oct 2024 21:01:53 +0200
Subject: [PATCH 5/5] Add "raw" COPY format support for unstructured text data.
This commit introduces a new format option to the COPY command, enabling
the import and export of unstructured text data where each line is treated as a
single field without any delimiters.
---
doc/src/sgml/ref/copy.sgml | 57 +++++++-
src/backend/commands/copy.c | 39 +++--
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 204 ++++++++++++++++++++++++++-
src/backend/commands/copyto.c | 70 ++++++++-
src/backend/parser/gram.y | 8 +-
src/include/commands/copy.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy.out | 123 ++++++++++++++++
src/test/regress/expected/copy2.out | 82 ++++++++++-
src/test/regress/sql/copy.sql | 70 +++++++++
src/test/regress/sql/copy2.sql | 67 ++++++++-
12 files changed, 700 insertions(+), 29 deletions(-)
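
As a rough illustration of the semantics described in the commit message (one line equals one field, nothing is unescaped, an empty line is an empty string rather than NULL), here is a small standalone sketch. raw_next_line is a hypothetical helper written for this note and is not the patch's CopyReadLineRawText; it also assumes "\n" as the only line terminator, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fetch the next line of buf, starting at *pos, as a
 * (field, fieldlen) pair pointing into buf.  The bytes are returned as-is:
 * backslashes, quotes and tabs get no special treatment.  Returns false
 * once the input is exhausted.
 */
static bool
raw_next_line(const char *buf, size_t buflen, size_t *pos,
			  const char **field, size_t *fieldlen)
{
	const char *start;
	const char *nl;

	if (*pos >= buflen)
		return false;

	start = buf + *pos;
	nl = memchr(start, '\n', buflen - *pos);
	*field = start;

	if (nl != NULL)
	{
		/* Line ends at the newline; the newline itself is not data. */
		*fieldlen = (size_t) (nl - start);
		*pos += *fieldlen + 1;
	}
	else
	{
		/* Last line without a trailing newline. */
		*fieldlen = buflen - *pos;
		*pos = buflen;
	}
	return true;
}
```

With this shape an empty line simply yields a zero-length field, so there is no way to produce a NULL, which matches the "No NULL Distinction" point in the format description.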
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index f493ddb371..e6f7a26016 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
</para>
</listitem>
@@ -256,7 +257,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -270,7 +272,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -293,7 +296,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -399,7 +402,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -892,6 +896,47 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
<refsect2>
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 493ca5f487..b71161ad99 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -710,6 +712,12 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "DELIMITER")));
+
/* Only single-byte delimiter strings are supported. */
if (strlen(opts_out->delim) != 1)
ereport(ERROR,
@@ -740,11 +748,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default delimiter */
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
- }
+ /* Set default delimiter */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
/* --- NULL option --- */
if (opts_out->null_print)
@@ -754,6 +762,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
/* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
@@ -761,11 +774,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default null_print */
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- }
+ /* Set default null_print */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
@@ -925,6 +939,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
/* Assert options have been set (defaults applied if not specified) */
Assert(opts_out->delim);
Assert(opts_out->quote);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 50bb4b7750..2528c6f111 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -143,8 +143,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -732,7 +734,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -748,7 +750,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -768,8 +770,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -823,8 +832,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1096,7 +1112,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1462,6 +1481,138 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+
+ /*
+ * The objective of this loop is to transfer the entire next input line
+ * into line_buf. We only care for detecting newlines (\r and/or \n).
+ * All other characters are treated as regular data.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * For a little extra speed within the loop, we copy input_buf and
+ * input_buf_len into local variables.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+ char c;
+
+ /*
+ * Load more data if needed.
+ */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* update our local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /*
+ * If we are completely out of data, break out of the loop,
+ * reporting EOF.
+ */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* OK to fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+ c = copy_input_buf[input_buf_ptr++];
+
+ /* Process \r */
+ if (c == '\r')
+ {
+ /* Check for \r\n on first line, _and_ handle \r\n. */
+ if (cstate->eol_type == EOL_UNKNOWN ||
+ cstate->eol_type == EOL_CRNL)
+ {
+ /*
+ * If need more data, go back to loop top to load it.
+ *
+ * Note that if we are at EOF, c will wind up as '\0' because
+ * of the guaranteed pad of input_buf.
+ */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+ /* get next char */
+ c = copy_input_buf[input_buf_ptr];
+
+ if (c == '\n')
+ {
+ input_buf_ptr++; /* eat newline */
+ cstate->eol_type = EOL_CRNL; /* in case not set yet */
+ }
+ else
+ {
+ if (cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /*
+ * if we got here, it is the first line and we didn't find
+ * \n, so don't consume the peeked character
+ */
+ cstate->eol_type = EOL_CR;
+ }
+ }
+ else if (cstate->eol_type == EOL_NL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* Process \n */
+ if (c == '\n')
+ {
+ if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ cstate->eol_type = EOL_NL; /* in case not set yet */
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* All other characters are treated as regular data */
+ } /* end of outer loop */
+
+ /*
+ * Transfer any still-uncopied data to line_buf.
+ */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1938,6 +2089,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' requires exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 78531ae846..99fd68a483 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -570,6 +571,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -835,8 +843,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -917,7 +927,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -945,7 +956,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -965,6 +976,37 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Ensure only one column is being copied */
+ if (list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("Unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1219,6 +1261,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4aa8646af7..0d0a3ad7ff 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -768,7 +768,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
QUOTE QUOTES
- RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
+ RANGE RAW READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
@@ -3513,6 +3513,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | RAW
+ {
+ $$ = makeDefElem("format", (Node *) makeString("raw"), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17771,6 +17775,7 @@ unreserved_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REASSIGN
| RECURSIVE
@@ -18398,6 +18403,7 @@ bare_label_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REAL
| REASSIGN
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e700fd01b5..04f7548ef4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 899d64ad55..02cd28c750 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -360,6 +360,7 @@ PG_KEYWORD("publication", PUBLICATION, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quotes", QUOTES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("raw", RAW, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("read", READ, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("real", REAL, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("reassign", REASSIGN, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..b11cabd993 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,126 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), (E'\n'), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ | f
+ | f
+ test | f
+(8 rows)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ | f
+ | f
+ test | f
+(8 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\r'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 626a437d40..f38c1d6b00 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -88,8 +88,12 @@ LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
+COPY x to stdout (format RAW, delimiter ',');
+ERROR: cannot specify DELIMITER in RAW mode
COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdout (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
@@ -100,6 +104,10 @@ COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x to stdout (format TEXT, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format CSV, force_quote *);
@@ -108,6 +116,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +128,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +874,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +947,65 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+-- Test inconsistent newline style
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\qecho -n line3
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..6333af3a90 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,73 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), (E'\n'), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\r'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 3458d287f2..790793b9b8 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -71,19 +71,27 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format RAW, delimiter ',');
COPY x to stdout (format BINARY, null 'x');
+COPY x to stdout (format RAW, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +644,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +716,59 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+
+-- Test inconsistent newline style
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\qecho -n line3
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
--
2.45.1
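For readers who want to experiment with the newline rules outside the backend, here is a minimal standalone sketch of the line-splitting logic that CopyReadLineRawText applies. It is not part of the patch. The names RawLineReader, raw_next_line and raw_count_lines are made up for illustration, the whole input is assumed to be in memory (so there is no buffer refill), and the returned line is assumed to exclude its terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same states as the eol_type field used by COPY FROM. */
typedef enum { EOL_UNKNOWN, EOL_NL, EOL_CR, EOL_CRNL } EolType;

/* Hypothetical in-memory reader; the real code refills input_buf instead. */
typedef struct RawLineReader
{
    const char *buf;
    size_t      len;
    size_t      pos;
    EolType     eol_type;
} RawLineReader;

/*
 * Fetch the next line. Returns 1 if a line was produced, 0 at end of input,
 * and -1 on "inconsistent newline style". Only \r and \n are special; every
 * other byte is plain data.
 */
static int
raw_next_line(RawLineReader *r, const char **line, size_t *line_len)
{
    size_t      start = r->pos;

    assert(line != NULL && line_len != NULL);

    if (r->pos >= r->len)
        return 0;               /* completely out of data */

    while (r->pos < r->len)
    {
        char        c = r->buf[r->pos++];

        if (c == '\r')
        {
            if (r->eol_type == EOL_UNKNOWN || r->eol_type == EOL_CRNL)
            {
                /* peek at the next byte to tell \r\n from a bare \r */
                if (r->pos < r->len && r->buf[r->pos] == '\n')
                {
                    r->pos++;   /* eat newline */
                    r->eol_type = EOL_CRNL;
                    *line = r->buf + start;
                    *line_len = r->pos - start - 2;
                    return 1;
                }
                if (r->eol_type == EOL_CRNL)
                    return -1;  /* bare \r in a \r\n file */
                r->eol_type = EOL_CR;   /* first line ended with bare \r */
            }
            else if (r->eol_type == EOL_NL)
                return -1;      /* \r in a \n file */
            *line = r->buf + start;
            *line_len = r->pos - start - 1;
            return 1;
        }

        if (c == '\n')
        {
            if (r->eol_type == EOL_CR || r->eol_type == EOL_CRNL)
                return -1;      /* bare \n in a \r or \r\n file */
            r->eol_type = EOL_NL;
            *line = r->buf + start;
            *line_len = r->pos - start - 1;
            return 1;
        }
    }

    /* last line without a terminator */
    *line = r->buf + start;
    *line_len = r->pos - start;
    return 1;
}

/* Count the lines in a NUL-terminated string, or return -1 on mixed newlines. */
static int
raw_count_lines(const char *text)
{
    RawLineReader r = {text, strlen(text), 0, EOL_UNKNOWN};
    const char *line;
    size_t      line_len;
    int         rc;
    int         n = 0;

    while ((rc = raw_next_line(&r, &line, &line_len)) == 1)
        n++;
    return rc < 0 ? -1 : n;
}
```

Calling raw_count_lines() on the inputs built with \qecho in the regression tests should give two lines for each consistent file and -1 for each mixed-newline file. An empty line comes back with length zero, which corresponds to the empty string (not NULL) on import.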
On Mon, Oct 14, 2024, at 10:51, Joel Jacobson wrote:
On Mon, Oct 14, 2024, at 10:07, Joel Jacobson wrote:
Attached is a first draft implementation of the new proposed COPY "raw" format.
The first two patches are just the bug fix in HEAD, reported separately:
https://commitfest.postgresql.org/50/5297/
Rebase only.
/Joel
Attachments:
v6-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch (application/octet-stream)
From 08d7a1986cb2369247c825667b34d8c3fe0cd287 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:23:55 +0200
Subject: [PATCH 1/5] Fix thinko in tests for COPY options force_not_null and
force_null.
Use COPY FROM for the negative tests that check that FORMAT text
cannot be used with these options. COPY TO is itself invalid for
these two options, so with COPY TO each test exercised two invalid
options at the same time. That doesn't seem intentional, since the
other tests check invalid options one at a time.
In passing, consistently use "stdin" for COPY FROM and "stdout" for COPY TO.
This has no effect on the tests per se, but being consistent
avoids confusion.
---
src/test/regress/expected/copy2.out | 20 ++++++++++----------
src/test/regress/sql/copy2.sql | 16 ++++++++--------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index ab449fa7b8..3f420db0bc 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -86,9 +86,9 @@ ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
^
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
@@ -96,22 +96,22 @@ COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
-COPY x to stdout (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
-COPY x to stdin (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
-COPY x to stdout (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
-COPY x to stdin (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
-LINE 1: COPY x to stdin (format BINARY, on_error unsupported);
- ^
+LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
+ ^
COPY x to stdout (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x to stdout (log_verbosity unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 1aa0e41b68..5790057e1c 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,17 +70,17 @@ COPY x from stdin (on_error ignore, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
-COPY x to stdout (format TEXT, force_not_null(a));
-COPY x to stdin (format CSV, force_not_null(a));
-COPY x to stdout (format TEXT, force_null(a));
-COPY x to stdin (format CSV, force_null(a));
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x from stdin (format TEXT, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
+COPY x from stdin (format TEXT, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
--
2.45.1
Attachment: v6-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch (application/octet-stream)
From 22661dfdcfaedc7216e6ee9ffb7be9091bc644b7 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:35:28 +0200
Subject: [PATCH 2/5] Fix validation of FORCE_NOT_NULL/FORCE_NULL for
all-columns case.
Add missing checks for FORCE_NOT_NULL and FORCE_NULL when applied to
all columns via "*". These options now correctly require CSV mode and
are disallowed in COPY TO as appropriate. Adjusted regression
tests to verify correct behavior for the all-columns case.
---
src/backend/commands/copy.c | 11 +++++++----
src/test/regress/expected/copy2.out | 12 ++++++++++++
src/test/regress/sql/copy2.sql | 6 ++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0b093dbb2a..e93ea3d627 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -805,12 +805,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && opts_out->force_notnull != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if (opts_out->force_notnull != NIL && !is_from)
+ if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
+ !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
@@ -819,13 +821,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && opts_out->force_null != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if (opts_out->force_null != NIL && !is_from)
+ if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 3f420db0bc..626a437d40 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -98,16 +98,28 @@ LINE 1: COPY x from stdin (on_error unsupported);
^
COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
+COPY x from stdin (format CSV, force_quote *);
+ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_null *);
+ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 5790057e1c..3458d287f2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -75,11 +75,17 @@ COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
+COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
--
2.45.1
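For readers skimming the 0002 patch, the check it fixes can be sketched in isolation. This is only an illustration, not the code in copy.c: DemoCopyOptions, DemoCheckResult and demo_check_force_notnull are hypothetical names, and a plain string pointer stands in for the List that the real force_notnull field holds. The point is that the option counts as specified both when a column list is given and when "*" is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; the real code calls ereport(ERROR, ...). */
typedef enum DemoCheckResult
{
    DEMO_OK,
    DEMO_REQUIRES_CSV,          /* "COPY FORCE_NOT_NULL requires CSV mode" */
    DEMO_NOT_WITH_COPY_TO       /* "... cannot be used with COPY TO" */
} DemoCheckResult;

/* Hypothetical stand-in for the relevant CopyFormatOptions fields. */
typedef struct DemoCopyOptions
{
    bool        csv_mode;           /* CSV format requested? */
    const char *force_notnull;      /* non-NULL if a column list was given */
    bool        force_notnull_all;  /* true if FORCE_NOT_NULL * was given */
} DemoCopyOptions;

DemoCheckResult
demo_check_force_notnull(const DemoCopyOptions *opts, bool is_from)
{
    bool        specified;

    assert(opts != NULL);

    /* Either form, a column list or "*", means the option was given. */
    specified = (opts->force_notnull != NULL || opts->force_notnull_all);

    if (!opts->csv_mode && specified)
        return DEMO_REQUIRES_CSV;
    if (specified && !is_from)
        return DEMO_NOT_WITH_COPY_TO;
    return DEMO_OK;
}
```

Judging from the lines the patch changes, the old conditions looked only at the list, so the "*" form was not rejected by these two checks.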
Attachment: v6-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch (application/octet-stream)
From ec61dea196d74777cf0e169f8815822c9a8089ec Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:02:49 +0200
Subject: [PATCH 3/5] Replace binary flags `binary` and `csv_mode` with
`format` enum.
---
src/backend/commands/copy.c | 48 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 10 +++---
src/backend/commands/copyfromparse.c | 34 ++++++++++----------
src/backend/commands/copyto.c | 20 ++++++------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 69 insertions(+), 57 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e93ea3d627..effe337229 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,7 +822,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -845,7 +846,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -881,7 +882,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,7 +899,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..e700fd01b5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57de1acff3..59433d120e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
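The 0003 patch is a mechanical translation, and the mapping can be sketched outside the tree. All Demo* and demo_* names below are hypothetical; only the enum ordering and the two expressions are taken from the diff. The old tests "binary" and "!csv_mode" become "format == COPY_FORMAT_BINARY" and "format != COPY_FORMAT_CSV" respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the ordering of the CopyFormat enum added by the patch. */
typedef enum DemoCopyFormat
{
    DEMO_FORMAT_TEXT,
    DEMO_FORMAT_BINARY,
    DEMO_FORMAT_CSV
} DemoCopyFormat;

/* Map the two old boolean flags onto the single enum value. */
DemoCopyFormat
demo_format_from_flags(bool binary, bool csv_mode)
{
    /* A single FORMAT value can never set both flags at once. */
    assert(!(binary && csv_mode));

    if (binary)
        return DEMO_FORMAT_BINARY;
    if (csv_mode)
        return DEMO_FORMAT_CSV;
    return DEMO_FORMAT_TEXT;
}

/* Same expression as in ReceiveCopyBegin/SendCopyBegin after the patch. */
int
demo_wire_format_code(DemoCopyFormat format)
{
    return (format == DEMO_FORMAT_BINARY ? 1 : 0);
}

/* Default delimiter as written in 0003: "," for CSV, tab otherwise. */
const char *
demo_default_delim(DemoCopyFormat format)
{
    return format == DEMO_FORMAT_CSV ? "," : "\t";
}
```

One possible reading of the design choice: with two booleans, the combination binary plus csv_mode is representable although it is never meant to occur, while the enum has no such state, which should make it simpler to add a further format value later.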
Attachment: v6-0004-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch (application/octet-stream)
From b348513a577f8820fb30d9ebc48bb29d99d2cb19 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:29:51 +0200
Subject: [PATCH 4/5] Reorganize ProcessCopyOptions for clarity and consistent
option handling.
No changes to the function's signature or behavior; the refactoring solely
improves code structure and readability.
Changes:
* Refactored ProcessCopyOptions to improve readability and maintainability
by grouping per-option checks and default assignments into dedicated sections.
This enhances the logical flow and makes it easier to understand how each COPY
option is processed.
* Explicitly set the default format to COPY_FORMAT_TEXT when the FORMAT option
is not specified. Previously, the default was implied due to
zero-initialization, but making it explicit clarifies the default behavior.
* Consistently use boolean specified-variables to determine if an option has
been provided, rather than relying on default values from zero-initialization.
* Added assertions to ensure necessary options are set before performing
dependent checks, explicitly indicating that they have been assigned either
specified or default values.
* Relocated interdependent option validations to a dedicated section for
additional clarity.
---
src/backend/commands/copy.c | 456 ++++++++++++++++++++++--------------
1 file changed, 282 insertions(+), 174 deletions(-)
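Note for readers (not part of the patch): the per-option layout described in the commit message follows one pattern throughout, "if the option was given, validate it; otherwise set a default where the format needs one". Below is a minimal standalone sketch of that pattern for DELIMITER. The Demo* and demo_* names are hypothetical, errors are returned as strings instead of going through ereport(), and the check for unsafe delimiter characters in non-CSV mode is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum DemoCopyFormat
{
    DEMO_FORMAT_TEXT,
    DEMO_FORMAT_BINARY,
    DEMO_FORMAT_CSV
} DemoCopyFormat;

typedef struct DemoOpts
{
    DemoCopyFormat format;
    const char *delim;          /* NULL if DELIMITER was not specified */
} DemoOpts;

/* Returns NULL on success, otherwise the error message text. */
const char *
demo_process_delimiter(DemoOpts *opts)
{
    assert(opts != NULL);

    if (opts->delim)
    {
        /* Option was given: run every check that concerns it. */
        if (opts->format == DEMO_FORMAT_BINARY)
            return "cannot specify DELIMITER in BINARY mode";
        if (strlen(opts->delim) != 1)
            return "COPY delimiter must be a single one-byte character";
        if (strchr(opts->delim, '\r') != NULL ||
            strchr(opts->delim, '\n') != NULL)
            return "COPY delimiter cannot be newline or carriage return";
    }
    else if (opts->format != DEMO_FORMAT_BINARY)
    {
        /* Option was omitted: set the default for text-like formats. */
        opts->delim = opts->format == DEMO_FORMAT_CSV ? "," : "\t";
    }
    return NULL;
}
```

With this shape the delimiter simply stays unset in binary mode, so no "check before inserting defaults" ordering constraint is needed for this option.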
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index effe337229..493ca5f487 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -672,199 +672,272 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Check for incompatible options (must do these three before inserting
- * defaults)
+ * Set default format if not specified.
+ * This isn't strictly necessary since COPY_FORMAT_TEXT is 0 and
+ * opts_out is palloc0'd, but do it for clarity.
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (!format_specified)
+ opts_out->format = COPY_FORMAT_TEXT;
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
- /* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ /*
+ * Begin per-option checks and set defaults where necessary
+ */
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ /* --- FORMAT option is always allowed; no additional checks needed --- */
- if (opts_out->format == COPY_FORMAT_CSV)
+ /* --- FREEZE option --- */
+ if (freeze_specified)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+ else
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
+ /* Default is false; no action needed */
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
- if (opts_out->default_print)
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
{
- opts_out->default_print_len = strlen(opts_out->default_print);
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ }
- if (strchr(opts_out->default_print, '\r') != NULL ||
- strchr(opts_out->default_print, '\n') != NULL)
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY default representation cannot use newline or carriage return")));
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
}
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ /* --- HEADER option --- */
+ if (header_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is false; no action needed */
+ }
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV &&
- (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+ else
+ {
+ /* No default action needed */
+ }
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
+ /* --- ON_ERROR option --- */
+ if (on_error_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
+ }
+ else
+ {
+ /* Default is COPY_ON_ERROR_STOP */
+ opts_out->on_error = COPY_ON_ERROR_STOP;
+ }
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
+ /* --- REJECT_LIMIT option --- */
+ if (reject_limit_specified)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ opts_out->default_print_len = strlen(opts_out->default_print);
+
+ if (strchr(opts_out->default_print, '\r') != NULL ||
+ strchr(opts_out->default_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY default representation cannot use newline or carriage return")));
+
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,20 +971,55 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote character in the NULL or DEFAULT strings */
+
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
--
2.45.1
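
For readers skimming the refactoring above, the per-option pattern it introduces (validate the option if it was specified, otherwise apply the format's default, and run the interdependent checks only afterwards) can be sketched in isolation. This is a minimal standalone sketch, not code from the patch: SketchFormat, SketchOpts and sketch_process_opts are hypothetical names, errors are returned as strings instead of going through ereport(), and only DELIMITER and NULL are modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the COPY format enum and options struct. */
typedef enum { FMT_TEXT, FMT_CSV, FMT_BINARY } SketchFormat;

typedef struct
{
    SketchFormat format;
    const char *delim;       /* NULL means "not specified" */
    const char *null_print;  /* NULL means "not specified" */
} SketchOpts;

/* Returns NULL on success, or a static error message on failure. */
static const char *
sketch_process_opts(SketchOpts *o)
{
    /* --- DELIMITER option: validate if given, else apply the default --- */
    if (o->delim)
    {
        if (o->format == FMT_BINARY)
            return "cannot specify DELIMITER in BINARY mode";
        if (strlen(o->delim) != 1)
            return "COPY delimiter must be a single one-byte character";
    }
    else if (o->format != FMT_BINARY)
        o->delim = (o->format == FMT_CSV) ? "," : "\t";

    /* --- NULL option: same shape as above --- */
    if (o->null_print)
    {
        if (o->format == FMT_BINARY)
            return "cannot specify NULL in BINARY mode";
        if (strchr(o->null_print, '\r') != NULL ||
            strchr(o->null_print, '\n') != NULL)
            return "COPY null representation cannot use newline or carriage return";
    }
    else if (o->format != FMT_BINARY)
        o->null_print = (o->format == FMT_CSV) ? "" : "\\N";

    /* --- Interdependent check, run only after defaults are in place --- */
    if (o->format != FMT_BINARY &&
        strchr(o->null_print, o->delim[0]) != NULL)
        return "COPY delimiter character must not appear in the NULL specification";

    return NULL;
}
```

Keeping the cross-option checks in a final block is what lets them assume that delim and null_print are non-NULL for the text and CSV formats, which is the property the Assert() calls in the patch rely on.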
v6-0005-Add-raw-COPY-format-support-for-unstructured-text-da.patch
From ce27697f0b15b6b06a658128ba1e88c11bd0512c Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sun, 13 Oct 2024 21:01:53 +0200
Subject: [PATCH 5/5] Add "raw" COPY format support for unstructured text data.
This commit introduces a new format option to the COPY command, enabling
the import and export of unstructured text data where each line is treated as a
single field without any delimiters.
---
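
As an aid to reviewing the line-splitting rules in this patch, here is a minimal standalone sketch of how a raw-format line could be delimited: only \n, \r and \r\n end a line, every other byte (backslashes, tabs, quotes) is data, and mixing newline styles is an error. It is not the patch's code. SketchEol and sketch_next_raw_line are hypothetical names, and the sketch assumes the whole input is already in memory, so it has no refill logic and treats a \r at the very end of the buffer as a bare CR.

```c
#include <stddef.h>

/* Hypothetical mirror of the end-of-line states tracked during COPY FROM. */
typedef enum { SK_EOL_UNKNOWN, SK_EOL_NL, SK_EOL_CR, SK_EOL_CRNL } SketchEol;

/*
 * Find the end of the first line in buf[0..len).
 * On success returns 0 and stores the number of data bytes in *line_len and
 * the number of bytes consumed (data plus terminator) in *consumed.
 * Returns -1 when the terminator contradicts the style recorded in *eol.
 * If no terminator is found, the rest of the buffer is the line.
 */
static int
sketch_next_raw_line(const char *buf, size_t len, SketchEol *eol,
                     size_t *line_len, size_t *consumed)
{
    size_t i;

    for (i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == '\r')
        {
            if (*eol == SK_EOL_UNKNOWN || *eol == SK_EOL_CRNL)
            {
                /* Peek at the next byte to tell \r\n from a bare \r. */
                if (i + 1 < len && buf[i + 1] == '\n')
                {
                    *eol = SK_EOL_CRNL;
                    *line_len = i;
                    *consumed = i + 2;
                    return 0;
                }
                if (*eol == SK_EOL_CRNL)
                    return -1;      /* inconsistent newline style */
                *eol = SK_EOL_CR;   /* first line ended with a bare \r */
            }
            else if (*eol == SK_EOL_NL)
                return -1;          /* inconsistent newline style */

            *line_len = i;
            *consumed = i + 1;
            return 0;
        }

        if (c == '\n')
        {
            if (*eol == SK_EOL_CR || *eol == SK_EOL_CRNL)
                return -1;          /* inconsistent newline style */
            *eol = SK_EOL_NL;
            *line_len = i;
            *consumed = i + 1;
            return 0;
        }

        /* Any other byte is ordinary data, including backslashes. */
    }

    *line_len = len;
    *consumed = len;
    return 0;
}
```

An empty line yields a line length of zero, which corresponds to the empty string (not NULL) that the raw format is documented to produce.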
doc/src/sgml/ref/copy.sgml | 98 ++++++++++++-
src/backend/commands/copy.c | 39 +++--
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 204 ++++++++++++++++++++++++++-
src/backend/commands/copyto.c | 70 ++++++++-
src/backend/parser/gram.y | 8 +-
src/include/commands/copy.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy.out | 123 ++++++++++++++++
src/test/regress/expected/copy2.out | 82 ++++++++++-
src/test/regress/sql/copy.sql | 70 +++++++++
src/test/regress/sql/copy2.sql | 67 ++++++++-
12 files changed, 741 insertions(+), 29 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..06ca632ee3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -257,7 +258,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -271,7 +273,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +297,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -400,7 +403,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +897,88 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 493ca5f487..b71161ad99 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -710,6 +712,12 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "DELIMITER")));
+
/* Only single-byte delimiter strings are supported. */
if (strlen(opts_out->delim) != 1)
ereport(ERROR,
@@ -740,11 +748,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default delimiter */
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
- }
+ /* Set default delimiter */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
/* --- NULL option --- */
if (opts_out->null_print)
@@ -754,6 +762,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
/* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
@@ -761,11 +774,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default null_print */
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- }
+ /* Set default null_print */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
@@ -925,6 +939,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
/* Assert options have been set (defaults applied if not specified) */
Assert(opts_out->delim);
Assert(opts_out->quote);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 50bb4b7750..2528c6f111 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -143,8 +143,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -732,7 +734,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -748,7 +750,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -768,8 +770,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -823,8 +832,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1096,7 +1112,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1462,6 +1481,138 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+
+ /*
+ * The objective of this loop is to transfer the entire next input line
+ * into line_buf. We only care about detecting newlines (\r and/or \n).
+ * All other characters are treated as regular data.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * For a little extra speed within the loop, we copy input_buf and
+ * input_buf_len into local variables.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+ char c;
+
+ /*
+ * Load more data if needed.
+ */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* update our local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /*
+ * If we are completely out of data, break out of the loop,
+ * reporting EOF.
+ */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* OK to fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+ c = copy_input_buf[input_buf_ptr++];
+
+ /* Process \r */
+ if (c == '\r')
+ {
+ /* Check for \r\n on first line, _and_ handle \r\n. */
+ if (cstate->eol_type == EOL_UNKNOWN ||
+ cstate->eol_type == EOL_CRNL)
+ {
+ /*
+ * If need more data, go back to loop top to load it.
+ *
+ * Note that if we are at EOF, c will wind up as '\0' because
+ * of the guaranteed pad of input_buf.
+ */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+ /* get next char */
+ c = copy_input_buf[input_buf_ptr];
+
+ if (c == '\n')
+ {
+ input_buf_ptr++; /* eat newline */
+ cstate->eol_type = EOL_CRNL; /* in case not set yet */
+ }
+ else
+ {
+ if (cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /*
+ * if we got here, it is the first line and we didn't find
+ * \n, so don't consume the peeked character
+ */
+ cstate->eol_type = EOL_CR;
+ }
+ }
+ else if (cstate->eol_type == EOL_NL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* Process \n */
+ if (c == '\n')
+ {
+ if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ cstate->eol_type = EOL_NL; /* in case not set yet */
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* All other characters are treated as regular data */
+ } /* end of outer loop */
+
+ /*
+ * Transfer any still-uncopied data to line_buf.
+ */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1938,6 +2089,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' requires exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 78531ae846..99fd68a483 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -570,6 +571,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -835,8 +843,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -917,7 +927,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -945,7 +956,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -965,6 +976,37 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Ensure only one column is being copied */
+ if (list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1219,6 +1261,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4aa8646af7..0d0a3ad7ff 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -768,7 +768,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
QUOTE QUOTES
- RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
+ RANGE RAW READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
@@ -3513,6 +3513,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | RAW
+ {
+ $$ = makeDefElem("format", (Node *) makeString("raw"), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17771,6 +17775,7 @@ unreserved_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REASSIGN
| RECURSIVE
@@ -18398,6 +18403,7 @@ bare_label_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REAL
| REASSIGN
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e700fd01b5..04f7548ef4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 899d64ad55..02cd28c750 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -360,6 +360,7 @@ PG_KEYWORD("publication", PUBLICATION, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quotes", QUOTES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("raw", RAW, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("read", READ, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("real", REAL, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("reassign", REASSIGN, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..b11cabd993 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,126 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), (E'\n'), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ | f
+ | f
+ test | f
+(8 rows)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ | f
+ | f
+ test | f
+(8 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\r'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 626a437d40..f38c1d6b00 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -88,8 +88,12 @@ LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
+COPY x to stdout (format RAW, delimiter ',');
+ERROR: cannot specify DELIMITER in RAW mode
COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdout (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
@@ -100,6 +104,10 @@ COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x to stdout (format TEXT, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format CSV, force_quote *);
@@ -108,6 +116,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +128,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +874,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +947,65 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+-- Test inconsistent newline style
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\qecho -n line3
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..6333af3a90 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,73 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), (E'\n'), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\r'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 3458d287f2..790793b9b8 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -71,19 +71,27 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format RAW, delimiter ',');
COPY x to stdout (format BINARY, null 'x');
+COPY x to stdout (format RAW, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +644,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +716,59 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+
+-- Test inconsistent newline style
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\qecho -n line3
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
--
2.45.1
On Mon, Oct 14, 2024, at 21:59, Joel Jacobson wrote:
On Mon, Oct 14, 2024, at 10:51, Joel Jacobson wrote:
On Mon, Oct 14, 2024, at 10:07, Joel Jacobson wrote:
Attached is a first draft implementation of the new proposed COPY "raw" format.
The first two patches are just the bug fix in HEAD, reported separately:
https://commitfest.postgresql.org/50/5297/
I noticed tests were failing in cfbot,
which surprised me since tests were passing locally,
but it was due to me not running the full test suite.
Sorry about the noise. I'm now running the full test suite,
with tap and `meson test --num-processes 32`,
so hopefully I won't cause cfbot failures as often any longer.
(The bug was due to an invalid assert; I wrongly assumed
Assert(opts_out->quote) would be sane inside the
if (opts_out->default_print) branch, but it is of course
wrong, since quote is NULL for the text format,
and the quote check was only performed if format was CSV,
so the assert was unnecessary and invalid).
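
To illustrate the invalid assert described above, here is a minimal standalone sketch. The names SketchFormat, SketchCopyOptions and sketch_default_ok are hypothetical stand-ins that only mimic the relevant fields; this is not the actual ProcessCopyOptions code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real format enum and options. */
typedef enum SketchFormat
{
	SKETCH_TEXT,
	SKETCH_BINARY,
	SKETCH_CSV
} SketchFormat;

typedef struct SketchCopyOptions
{
	SketchFormat format;
	const char *quote;			/* NULL unless the format is CSV */
	const char *default_print;	/* NULL if DEFAULT was not specified */
} SketchCopyOptions;

/*
 * Returns true if the DEFAULT marker is acceptable.
 *
 * The quote character is only consulted when the format is CSV, so an
 * assertion on opts->quote inside the default_print branch would be wrong:
 * for the text format, quote is legitimately NULL at this point.
 */
static bool
sketch_default_ok(const SketchCopyOptions *opts)
{
	if (opts->default_print == NULL)
		return true;

	/* An assert(opts->quote != NULL) here would fail for the text format. */
	if (opts->format == SKETCH_CSV &&
		strchr(opts->default_print, opts->quote[0]) != NULL)
		return false;

	return true;
}
```

With the text format and a NULL quote, the function simply skips the CSV-only check, which is why no assertion on quote is needed there.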
I've also changed the :filename I'm using in copy2.sql
for the raw format tests, to not be the same as in copy.sql,
since I guess this could cause problems with tests
running in parallel.
I've now split the reorganization of ProcessCopyOptions,
into separate small steps to make it easier to review.
Each step breaks out the validation of a COPY option,
into its own section, except for DELIMITER and NULL
that had to be changed together in a single commit.
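
As a rough illustration of what "one section per option" could look like, here is a standalone sketch. All names (DemoFormat, DemoOptions, demo_check_delimiter, demo_check_null, demo_validate) are hypothetical, and the sketch returns message strings instead of calling ereport(). It splits DELIMITER and NULL into two helpers purely for readability; it does not mirror how the actual commits are divided.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical format enum, modelled on the one the patch series introduces. */
typedef enum DemoFormat
{
	DEMO_TEXT,
	DEMO_BINARY,
	DEMO_CSV,
	DEMO_RAW
} DemoFormat;

typedef struct DemoOptions
{
	DemoFormat	format;
	const char *delim;			/* NULL if DELIMITER was not specified */
	const char *null_print;		/* NULL if NULL was not specified */
} DemoOptions;

/* Section for DELIMITER: returns an error message, or NULL if valid. */
static const char *
demo_check_delimiter(const DemoOptions *o)
{
	if (o->delim == NULL)
		return NULL;
	if (o->format == DEMO_BINARY)
		return "cannot specify DELIMITER in BINARY mode";
	if (o->format == DEMO_RAW)
		return "cannot specify DELIMITER in RAW mode";
	return NULL;
}

/* Section for NULL: returns an error message, or NULL if valid. */
static const char *
demo_check_null(const DemoOptions *o)
{
	if (o->null_print == NULL)
		return NULL;
	if (o->format == DEMO_BINARY)
		return "cannot specify NULL in BINARY mode";
	if (o->format == DEMO_RAW)
		return "cannot specify NULL in RAW mode";
	return NULL;
}

/* Run each option's section in turn and report the first problem found. */
static const char *
demo_validate(const DemoOptions *o)
{
	const char *err;

	if ((err = demo_check_delimiter(o)) != NULL)
		return err;
	if ((err = demo_check_null(o)) != NULL)
		return err;
	return NULL;
}
```

Keeping each option's checks together means a new format only needs one extra branch per section, rather than edits scattered across the function.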
/Joel
Attachments:
v7-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch
From b1400cf2f6021e9740d9fc6cbf7aa633eda9597d Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:23:55 +0200
Subject: [PATCH 01/16] Fix thinko in tests for COPY options force_not_null and
force_null.
Use COPY FROM for the negative tests that check that FORMAT text
cannot be used for these options, since if testing COPY TO,
which is invalid for these two options, we're testing two
invalid options at the same time, which doesn't seem intentional,
since the other tests seem to be testing invalid options one by one.
In passing, consistently use "stdin" for COPY FROM and "stdout" for COPY TO,
even though it has no effect on the tests per se, it seems
better to be consistent, to avoid confusion.
---
src/test/regress/expected/copy2.out | 20 ++++++++++----------
src/test/regress/sql/copy2.sql | 16 ++++++++--------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index ab449fa7b8..3f420db0bc 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -86,9 +86,9 @@ ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
^
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
@@ -96,22 +96,22 @@ COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
-COPY x to stdout (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
-COPY x to stdin (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
-COPY x to stdout (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
-COPY x to stdin (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
-LINE 1: COPY x to stdin (format BINARY, on_error unsupported);
- ^
+LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
+ ^
COPY x to stdout (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x to stdout (log_verbosity unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 1aa0e41b68..5790057e1c 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,17 +70,17 @@ COPY x from stdin (on_error ignore, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
-COPY x to stdout (format TEXT, force_not_null(a));
-COPY x to stdin (format CSV, force_not_null(a));
-COPY x to stdout (format TEXT, force_null(a));
-COPY x to stdin (format CSV, force_null(a));
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x from stdin (format TEXT, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
+COPY x from stdin (format TEXT, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
--
2.45.1
v7-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch
From 67b162ba1e52291c6d3254eb47316aac0dc847cf Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:35:28 +0200
Subject: [PATCH 02/16] Fix validation of FORCE_NOT_NULL/FORCE_NULL for
all-columns case.
Add missing checks for FORCE_NOT_NULL and FORCE_NULL when applied to
all columns via "*". These options now correctly require CSV mode and
are disallowed in COPY TO as appropriate. Adjusted regression
tests to verify correct behavior for the all-columns case.
---
src/backend/commands/copy.c | 11 +++++++----
src/test/regress/expected/copy2.out | 12 ++++++++++++
src/test/regress/sql/copy2.sql | 6 ++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0b093dbb2a..e93ea3d627 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -805,12 +805,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && opts_out->force_notnull != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if (opts_out->force_notnull != NIL && !is_from)
+ if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
+ !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
@@ -819,13 +821,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && opts_out->force_null != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if (opts_out->force_null != NIL && !is_from)
+ if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 3f420db0bc..626a437d40 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -98,16 +98,28 @@ LINE 1: COPY x from stdin (on_error unsupported);
^
COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
+COPY x from stdin (format CSV, force_quote *);
+ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_null *);
+ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 5790057e1c..3458d287f2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -75,11 +75,17 @@ COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
+COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
--
2.45.1
v7-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch
From a374c260cc82369c96d0781e9d3851c45951bd0d Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:02:49 +0200
Subject: [PATCH 03/16] Replace binary flags `binary` and `csv_mode` with
`format` enum.
---
src/backend/commands/copy.c | 48 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 10 +++---
src/backend/commands/copyfromparse.c | 34 ++++++++++----------
src/backend/commands/copyto.c | 20 ++++++------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 69 insertions(+), 57 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e93ea3d627..effe337229 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,7 +822,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -845,7 +846,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -881,7 +882,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,7 +899,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..e700fd01b5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57de1acff3..59433d120e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
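For readers following along outside the backend tree, the core idea of the patch above can be sketched standalone: a single enum value replaces the two independent booleans, so an impossible combination such as "binary and csv at once" can no longer be represented. The enum below mirrors the one the patch adds to copy.h. The helper name parse_copy_format and its bool return convention are hypothetical and exist only for this illustration; the real code reports an unrecognized format through ereport(ERROR, ...).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mirrors the CopyFormat enum added to copy.h by the patch. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper (not part of the patch): translate a FORMAT option
 * string into a CopyFormat.  Returns false for an unrecognized name, where
 * ProcessCopyOptions() would raise an error instead.
 */
static bool
parse_copy_format(const char *fmt, CopyFormat *out)
{
    assert(fmt != NULL && out != NULL);

    if (strcmp(fmt, "text") == 0)
        *out = COPY_FORMAT_TEXT;
    else if (strcmp(fmt, "csv") == 0)
        *out = COPY_FORMAT_CSV;
    else if (strcmp(fmt, "binary") == 0)
        *out = COPY_FORMAT_BINARY;
    else
        return false;
    return true;
}
```

With this shape, a later format only needs one more enumerator and one more strcmp branch, and every existing `format == COPY_FORMAT_CSV` test keeps its meaning.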
Attachment: v7-0004-Set-default-format-if-not-specified.patch (application/octet-stream)
From 78c2db05c50fb43c0af2e9d2326820750ef470a3 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 01:32:56 +0200
Subject: [PATCH 04/16] Set default format if not specified.
---
src/backend/commands/copy.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index effe337229..c068c61bcc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -671,6 +671,14 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
+ /*
+ * Set default format if not specified.
+ * This isn't strictly necessary since COPY_FORMAT_TEXT is 0 and
+ * opts_out is palloc0'd, but do it for clarity.
+ */
+ if (!format_specified)
+ opts_out->format = COPY_FORMAT_TEXT;
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
--
2.45.1
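The comment in the patch above notes that the explicit assignment is redundant because COPY_FORMAT_TEXT is 0 and the options struct is zero-filled. A minimal standalone sketch of that reasoning follows. It assumes calloc() as a stand-in for palloc0(), and DemoCopyOptions is a hypothetical, trimmed-down substitute for CopyFormatOptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,           /* first enumerator, so its value is 0 */
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical stand-in for CopyFormatOptions, reduced to two fields. */
typedef struct DemoCopyOptions
{
    CopyFormat  format;
    char       *delim;
} DemoCopyOptions;

/*
 * Allocate zero-filled options (calloc() stands in for palloc0()) and
 * report whether the untouched format field already reads as TEXT.
 */
static bool
zero_filled_options_default_to_text(void)
{
    DemoCopyOptions *opts = calloc(1, sizeof(DemoCopyOptions));
    bool        is_text;

    assert(opts != NULL);
    is_text = (opts->format == COPY_FORMAT_TEXT);
    free(opts);
    return is_text;
}
```

The implicit default only holds while COPY_FORMAT_TEXT stays the first enumerator, which is one argument for setting it explicitly as the patch does.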
Attachment: v7-0005-Separate-DELIMITER-and-NULL-option-validation-into-t.patch (application/octet-stream)
From fd08c082034b0d8f54a23bbffd4a6df9fae9d65e Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:00:31 +0200
Subject: [PATCH 05/16] Separate DELIMITER and NULL option validation into
their own sections.
* Move binary format validations under respective option checks
* Introduce specific validations for CSV and TEXT formats
---
src/backend/commands/copy.c | 179 +++++++++++++++++++++---------------
1 file changed, 105 insertions(+), 74 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c068c61bcc..6b2d6e7a57 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -679,34 +679,84 @@ ProcessCopyOptions(ParseState *pstate,
if (!format_specified)
opts_out->format = COPY_FORMAT_TEXT;
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ }
+
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
+ }
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
-
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
-
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
-
if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
@@ -715,25 +765,6 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->escape = opts_out->quote;
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
-
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
-
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
-
if (opts_out->default_print)
{
opts_out->default_print_len = strlen(opts_out->default_print);
@@ -745,23 +776,6 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY default representation cannot use newline or carriage return")));
}
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
/* Check header */
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
@@ -781,11 +795,6 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
-
/* Check escape */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
@@ -845,22 +854,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
"COPY TO")));
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
/* Check freeze */
if (opts_out->freeze && !is_from)
--
2.45.1
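The DELIMITER section introduced by the patch above runs its checks in a fixed order whenever the user supplied a delimiter. The following standalone sketch restates that order as a pure function, which may make it easier to reason about. The helper name check_delimiter and the convention of returning a message string are assumptions made for this illustration; the backend raises the errors with ereport(), and the last message here is a paraphrase because the original formats the offending character into the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: validate a user-specified delimiter.  Returns NULL
 * if it is acceptable for the given format, otherwise a message describing
 * the first failed check, in the same order as the patched code.
 */
static const char *
check_delimiter(CopyFormat format, const char *delim)
{
    assert(delim != NULL);

    /* Binary format has no delimiter at all. */
    if (format == COPY_FORMAT_BINARY)
        return "cannot specify DELIMITER in BINARY mode";

    /* Only single-byte delimiter strings are supported. */
    if (strlen(delim) != 1)
        return "COPY delimiter must be a single one-byte character";

    /* End-of-line characters would be ambiguous with row terminators. */
    if (delim[0] == '\r' || delim[0] == '\n')
        return "COPY delimiter cannot be newline or carriage return";

    /* Outside CSV, these characters clash with backslash escapes. */
    if (format != COPY_FORMAT_CSV &&
        strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789", delim[0]) != NULL)
        return "COPY delimiter cannot be this character (paraphrased)";

    return NULL;
}
```

For example, check_delimiter(COPY_FORMAT_CSV, "a") passes while check_delimiter(COPY_FORMAT_TEXT, "a") does not, matching the "non-CSV mode" restriction in the comment carried over by the patch.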
Attachment: v7-0006-Separate-QUOTE-option-validation-into-its-own-sectio.patch (application/octet-stream)
From eb00600ce1427a3df3aa167de35f1d52a25af5d2 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:12:20 +0200
Subject: [PATCH 06/16] Separate QUOTE option validation into its own section.
---
src/backend/commands/copy.c | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b2d6e7a57..873e149c00 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -747,6 +747,26 @@ ProcessCopyOptions(ParseState *pstate,
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -759,8 +779,6 @@ ProcessCopyOptions(ParseState *pstate,
/* Set defaults for omitted options */
if (opts_out->format == COPY_FORMAT_CSV)
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
if (!opts_out->escape)
opts_out->escape = opts_out->quote;
}
@@ -783,18 +801,6 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
-
/* Check escape */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
--
2.45.1
Attachment: v7-0007-Separate-ESCAPE-option-validation-into-its-own-secti.patch (application/octet-stream)
From f948f876be70e1fe5c906c3247c35a17eb8c673e Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:17:21 +0200
Subject: [PATCH 07/16] Separate ESCAPE option validation into its own section.
---
src/backend/commands/copy.c | 39 +++++++++++++++++++------------------
1 file changed, 20 insertions(+), 19 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 873e149c00..ad897e98f3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -767,6 +767,26 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->quote = "\"";
}
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -776,13 +796,6 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
- /* Set defaults for omitted options */
- if (opts_out->format == COPY_FORMAT_CSV)
- {
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
- }
-
if (opts_out->default_print)
{
opts_out->default_print_len = strlen(opts_out->default_print);
@@ -801,18 +814,6 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
-
/* Check force_quote */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
opts_out->force_quote_all))
--
2.45.1
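One consequence of the QUOTE and ESCAPE sections introduced by the two patches above is an ordering dependency: the ESCAPE default is copied from the quote character, so the QUOTE section has to run first. A minimal standalone sketch of just the defaulting logic is shown below. DemoCopyOptions and apply_quote_escape_defaults are hypothetical names for this illustration, and the validation branches (CSV-only, single-byte) are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical stand-in for the relevant CopyFormatOptions fields. */
typedef struct DemoCopyOptions
{
    CopyFormat  format;
    const char *quote;          /* NULL if QUOTE was not specified */
    const char *escape;         /* NULL if ESCAPE was not specified */
} DemoCopyOptions;

/*
 * Apply the CSV defaults in the same order as the patched code:
 * QUOTE first, then ESCAPE, which falls back to the quote character.
 */
static void
apply_quote_escape_defaults(DemoCopyOptions *opts)
{
    assert(opts != NULL);

    /* QUOTE: CSV defaults to a double quote. */
    if (opts->quote == NULL && opts->format == COPY_FORMAT_CSV)
        opts->quote = "\"";

    /* ESCAPE: CSV defaults to whatever the quote character now is. */
    if (opts->escape == NULL && opts->format == COPY_FORMAT_CSV)
        opts->escape = opts->quote;
}
```

Swapping the two blocks would leave escape as NULL whenever neither option is given, which is why the sections appear in this order in ProcessCopyOptions().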
Attachment: v7-0008-Separate-DEFAULT-option-validation-into-its-own-sect.patch (application/octet-stream)
From 460293cc00df040c7819725499748d08787c8da0 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:22:09 +0200
Subject: [PATCH 08/16] Separate DEFAULT option validation into its own
section.
---
src/backend/commands/copy.c | 96 ++++++++++++++++++++-----------------
1 file changed, 52 insertions(+), 44 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index ad897e98f3..cd80548324 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -787,17 +787,18 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->escape = opts_out->quote;
}
- /*
- * Check for incompatible options (must do these three before inserting
- * defaults)
- */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
opts_out->default_print_len = strlen(opts_out->default_print);
if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -805,8 +806,50 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY default representation cannot use newline or carriage return")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "DEFAULT",
+ "COPY TO")));
+
+ /* Don't allow the delimiter to appear in the default string. */
+ if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "DEFAULT")));
+
+ /* Don't allow the CSV quote char to appear in the default string. */
+ if (opts_out->format == COPY_FORMAT_CSV &&
+ strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "DEFAULT")));
+
+ /* Don't allow the NULL and DEFAULT string to be the same */
+ if (opts_out->null_print_len == opts_out->default_print_len &&
+ strncmp(opts_out->null_print, opts_out->default_print,
+ opts_out->null_print_len) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("NULL specification and DEFAULT specification cannot be the same")));
+ }
+ else
+ {
+ /* No default for default_print; remains NULL */
}
+ /*
+ * Check for incompatible options (must do these three before inserting
+ * defaults)
+ */
+
/* Check header */
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
@@ -909,41 +952,6 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
- if (opts_out->default_print)
- {
- if (!is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "DEFAULT",
- "COPY TO")));
-
- /* Don't allow the delimiter to appear in the default string. */
- if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "DEFAULT")));
-
- /* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "DEFAULT")));
-
- /* Don't allow the NULL and DEFAULT string to be the same */
- if (opts_out->null_print_len == opts_out->default_print_len &&
- strncmp(opts_out->null_print, opts_out->default_print,
- opts_out->null_print_len) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("NULL specification and DEFAULT specification cannot be the same")));
- }
/* Check on_error */
if (opts_out->format == COPY_FORMAT_BINARY &&
opts_out->on_error != COPY_ON_ERROR_STOP)
--
2.45.1
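To make the consolidated DEFAULT section easier to follow outside the diff, here is a minimal standalone sketch of the same checks in the same order. SketchOpts, SketchFormat and check_default_option() are hypothetical names invented for this illustration; the real code works on CopyFormatOptions and reports errors through ereport(). The sketch assumes, as the Assert() calls in the patch do, that delimiter and NULL defaults have already been applied by the time the DEFAULT checks run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the relevant CopyFormatOptions fields. */
typedef enum { FMT_TEXT, FMT_CSV, FMT_BINARY } SketchFormat;

typedef struct
{
    SketchFormat format;
    const char *delim;          /* single-byte delimiter, default already applied */
    const char *quote;          /* CSV quote character; assumed set in CSV mode */
    const char *null_print;     /* NULL marker, default already applied */
    const char *default_print;  /* DEFAULT marker, or NULL if not specified */
} SketchOpts;

/*
 * Returns NULL if the DEFAULT option is acceptable, otherwise a message
 * modelled on the errmsg() strings in the patch.  The checks run in the
 * same order as the consolidated section.
 */
static const char *
check_default_option(const SketchOpts *o, bool is_from)
{
    if (o->default_print == NULL)
        return NULL;            /* option not specified; nothing to validate */

    if (o->format == FMT_BINARY)
        return "cannot specify DEFAULT in BINARY mode";

    /* The remaining checks rely on defaults having been applied earlier. */
    assert(o->delim != NULL);
    assert(o->null_print != NULL);

    if (strchr(o->default_print, '\r') != NULL ||
        strchr(o->default_print, '\n') != NULL)
        return "COPY default representation cannot use newline or carriage return";

    if (!is_from)
        return "COPY DEFAULT cannot be used with COPY TO";

    if (strchr(o->default_print, o->delim[0]) != NULL)
        return "COPY delimiter character must not appear in the DEFAULT specification";

    if (o->format == FMT_CSV &&
        strchr(o->default_print, o->quote[0]) != NULL)
        return "CSV quote character must not appear in the DEFAULT specification";

    /* strcmp() is equivalent to the length + strncmp() test for C strings. */
    if (strcmp(o->null_print, o->default_print) == 0)
        return "NULL specification and DEFAULT specification cannot be the same";

    return NULL;
}
```

Because the binary-format check returns before the asserts, a binary COPY with DEFAULT specified still gets the user-facing error rather than tripping an assertion on the unset delimiter.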
v7-0009-Separate-HEADER-option-validation-into-its-own-secti.patch
From bd6d573f9c0b80f1332565d802992aa0ed4783b6 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:29:11 +0200
Subject: [PATCH 09/16] Separate HEADER option validation into its own section.
---
src/backend/commands/copy.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cd80548324..025a4da15d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -845,18 +845,25 @@ ProcessCopyOptions(ParseState *pstate,
/* No default for default_print; remains NULL */
}
+ /* --- HEADER option --- */
+ if (header_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is no header; no action needed */
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
/* Check force_quote */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
opts_out->force_quote_all))
--
2.45.1
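One detail in the HEADER hunk may deserve a second look: the old test looked at opts_out->header_line, while the new one looks at header_specified. The sketch below compares the two conditions under two assumptions that are not visible in this patch: that header_specified is set whenever the HEADER keyword appears in the option list (including HEADER false), and that the "false" choice is the zero value of the header enum. All names here are hypothetical and only model the conditions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the format and header-choice enums. */
typedef enum { FMT_TEXT, FMT_CSV, FMT_BINARY } SketchFormat;
typedef enum { HEADER_FALSE = 0, HEADER_TRUE, HEADER_MATCH } SketchHeader;

/* Old condition: binary format and a header value other than "false". */
static bool
header_rejected_old(SketchFormat format, SketchHeader header_line)
{
    return format == FMT_BINARY && header_line != HEADER_FALSE;
}

/* New condition: binary format and the HEADER keyword given at all. */
static bool
header_rejected_new(SketchFormat format, bool header_specified)
{
    if (header_specified)
        return format == FMT_BINARY;

    return false;               /* default is no header; nothing to check */
}
```

If those assumptions hold, the two forms agree except for an explicit HEADER false in BINARY mode, which the old condition lets through and the new one rejects. Whether that difference is intended is a question for the patch author, not something this sketch can settle.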
v7-0010-Separate-FORCE_QUOTE-option-validation-into-its-own-.patch
From 942f76e595543ace5b912fd050b5961b5b3bbbf2 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:33:35 +0200
Subject: [PATCH 10/16] Separate FORCE_QUOTE option validation into its own
section.
---
src/backend/commands/copy.c | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 025a4da15d..90c5cb6b0f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -859,26 +859,29 @@ ProcessCopyOptions(ParseState *pstate,
/* Default is no header; no action needed */
}
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
-
/* Check force_notnull */
if (opts_out->format != COPY_FORMAT_CSV &&
(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
--
2.45.1
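The FORCE_QUOTE change (and the FORCE_NOT_NULL and FORCE_NULL patches that follow the same pattern) only moves two flat conditions under a shared "option was specified" guard. A small standalone sketch can check that the flat and nested forms give the same result for every input. The names below are hypothetical, and force_quote is simplified to a bool even though the real field is a list.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real code raises errors via ereport(). */
typedef enum { FMT_TEXT, FMT_CSV, FMT_BINARY } SketchFormat;
typedef enum { FQ_OK, FQ_REQUIRES_CSV, FQ_NOT_WITH_COPY_FROM } ForceQuoteResult;

/* Shape of the checks before the patch: two independent conditions. */
static ForceQuoteResult
force_quote_flat(SketchFormat format, bool force_quote,
                 bool force_quote_all, bool is_from)
{
    if (format != FMT_CSV && (force_quote || force_quote_all))
        return FQ_REQUIRES_CSV;
    if ((force_quote || force_quote_all) && is_from)
        return FQ_NOT_WITH_COPY_FROM;
    return FQ_OK;
}

/* Shape of the checks after the patch: one guard, two nested checks. */
static ForceQuoteResult
force_quote_nested(SketchFormat format, bool force_quote,
                   bool force_quote_all, bool is_from)
{
    if (force_quote || force_quote_all)
    {
        if (format != FMT_CSV)
            return FQ_REQUIRES_CSV;
        if (is_from)
            return FQ_NOT_WITH_COPY_FROM;
    }
    return FQ_OK;
}

/* Returns true when both forms agree for every combination of inputs. */
static bool
force_quote_forms_agree(void)
{
    for (int f = FMT_TEXT; f <= FMT_BINARY; f++)
        for (int fq = 0; fq <= 1; fq++)
            for (int all = 0; all <= 1; all++)
                for (int from = 0; from <= 1; from++)
                    if (force_quote_flat((SketchFormat) f, fq, all, from) !=
                        force_quote_nested((SketchFormat) f, fq, all, from))
                        return false;
    return true;
}
```

The input space is only 24 combinations, so an exhaustive comparison is cheap. It also confirms that the "requires CSV mode" error still takes precedence over the direction error.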
v7-0011-Separate-FORCE_NOT_NULL-option-validation-into-its-o.patch
From 1284d2499c5d799ae80b1c82338d067d4fbfe5ab Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:34:51 +0200
Subject: [PATCH 11/16] Separate FORCE_NOT_NULL option validation into its own
section.
---
src/backend/commands/copy.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 90c5cb6b0f..57a1c6046a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -877,27 +877,29 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
}
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV &&
- (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
-
/* Check force_null */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
--
2.45.1
v7-0012-Separate-FORCE_NULL-option-validation-into-its-own-s.patch
From 946de7ad6e092d9542394e3765f424b8d996099c Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:36:43 +0200
Subject: [PATCH 12/16] Separate FORCE_NULL option validation into its own
section.
---
src/backend/commands/copy.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 57a1c6046a..b5e224ee6b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -895,27 +895,29 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
}
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
- if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
-
/* Checks specific to the CSV and TEXT formats */
if (opts_out->format == COPY_FORMAT_TEXT ||
opts_out->format == COPY_FORMAT_CSV)
--
2.45.1
v7-0013-Separate-FREEZE-option-validation-into-its-own-secti.patch
From 1d5e0a79ee33d1580f31d673f5b3cdb0046576d2 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:38:39 +0200
Subject: [PATCH 13/16] Separate FREEZE option validation into its own section.
---
src/backend/commands/copy.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b5e224ee6b..484da6fd85 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -913,6 +913,18 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
}
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -957,15 +969,6 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
}
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
-
/* Check on_error */
if (opts_out->format == COPY_FORMAT_BINARY &&
opts_out->on_error != COPY_ON_ERROR_STOP)
--
2.45.1
v7-0014-Separate-ON_ERROR-option-validation-into-its-own-sec.patch
From 3b319c82675f9edf11de138199b7c278d2c4b49d Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:40:27 +0200
Subject: [PATCH 14/16] Separate ON_ERROR option validation into its own
section.
---
src/backend/commands/copy.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 484da6fd85..e631a70577 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -925,6 +925,15 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
}
+ /* --- ON_ERROR option --- */
+ if (opts_out->on_error != COPY_ON_ERROR_STOP)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -969,13 +978,6 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
}
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
if (opts_out->reject_limit && !opts_out->on_error)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
--
2.45.1
v7-0015-Separate-REJECT_LIMIT-option-validation-into-its-own.patch
From a3b802f76ee7f3aa3e27cb10f772fcbf5e1ad518 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:44:22 +0200
Subject: [PATCH 15/16] Separate REJECT_LIMIT option validation into its own
section.
For clarity, explicitly check `on_error != COPY_ON_ERROR_IGNORE`
instead of `!on_error`.
Also update the comment for the section of code at the end,
which is now dedicated to additional checks for interdependent options.
---
src/backend/commands/copy.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e631a70577..cde46bbe2b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -934,9 +934,20 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
}
+ /* --- REJECT_LIMIT option --- */
+ if (opts_out->reject_limit)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+
/*
- * Check for incompatible options (must do these three before inserting
- * defaults)
+ * Additional checks for interdependent options
*/
/* Checks specific to the CSV and TEXT formats */
@@ -977,14 +988,6 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("CSV quote character must not appear in the %s specification",
"NULL")));
}
-
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
}
/*
--
2.45.1
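The commit message for the REJECT_LIMIT patch says the explicit comparison against COPY_ON_ERROR_IGNORE is for clarity. A short sketch can show why the explicit form is also more robust. It assumes an enum in which "stop" is the zero value, and it adds a third, purely hypothetical value that does not exist in PostgreSQL, only to show where the two conditions would part ways.

```c
#include <stdbool.h>

/* Hypothetical enum; ON_ERROR_HYPOTHETICAL is not a real PostgreSQL value. */
typedef enum
{
    ON_ERROR_STOP = 0,
    ON_ERROR_IGNORE,
    ON_ERROR_HYPOTHETICAL
} SketchOnError;

/* Old condition: relies on "stop" being the zero value of the enum. */
static bool
reject_limit_rejected_old(long long reject_limit, SketchOnError on_error)
{
    return reject_limit != 0 && !on_error;
}

/* New condition: states the requirement (ON_ERROR IGNORE) directly. */
static bool
reject_limit_rejected_new(long long reject_limit, SketchOnError on_error)
{
    return reject_limit != 0 && on_error != ON_ERROR_IGNORE;
}
```

With only "stop" and "ignore" the two conditions are equivalent. If a further ON_ERROR value were ever added, the old test would silently accept REJECT_LIMIT with it, while the new test would keep requiring IGNORE.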
v7-0016-Add-raw-COPY-format-support-for-unstructured-text-da.patch
From 4ea35a44eec3e27bfb6f5a6fc46707449faf92a7 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 03:03:09 +0200
Subject: [PATCH 16/16] Add "raw" COPY format support for unstructured text
data.
This commit introduces a new format option to the COPY command, enabling
the import and export of unstructured text data where each line is treated as a
single field without any delimiters.
---
doc/src/sgml/ref/copy.sgml | 98 ++++++++++++-
src/backend/commands/copy.c | 45 ++++--
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 204 ++++++++++++++++++++++++++-
src/backend/commands/copyto.c | 70 ++++++++-
src/backend/parser/gram.y | 8 +-
src/include/commands/copy.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy.out | 123 ++++++++++++++++
src/test/regress/expected/copy2.out | 82 ++++++++++-
src/test/regress/sql/copy.sql | 70 +++++++++
src/test/regress/sql/copy2.sql | 67 ++++++++-
12 files changed, 744 insertions(+), 32 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..06ca632ee3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -257,7 +258,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -271,7 +273,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +297,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -400,7 +403,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +897,88 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cde46bbe2b..74d6ebb78d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -688,6 +690,12 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "DELIMITER")));
+
/* Only single-byte delimiter strings are supported. */
if (strlen(opts_out->delim) != 1)
ereport(ERROR,
@@ -718,11 +726,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default delimiter */
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
- }
+ /* Set default delimiter */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
/* --- NULL option --- */
if (opts_out->null_print)
@@ -732,6 +740,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
/* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
@@ -739,11 +752,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default null_print */
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- }
+ /* Set default null_print */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
@@ -795,6 +809,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
/* Assert options have been set (defaults applied if not specified) */
Assert(opts_out->delim);
Assert(opts_out->null_print);
@@ -941,8 +960,8 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
}
@@ -985,7 +1004,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
+ errmsg("CSV quote character must not appear in the %s specification",
"NULL")));
}
}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 50bb4b7750..2528c6f111 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -143,8 +143,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -732,7 +734,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -748,7 +750,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -768,8 +770,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -823,8 +832,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1096,7 +1112,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1462,6 +1481,138 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+
+ /*
+ * The objective of this loop is to transfer the entire next input line
+ * into line_buf. We only care for detecting newlines (\r and/or \n).
+ * All other characters are treated as regular data.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * For a little extra speed within the loop, we copy input_buf and
+ * input_buf_len into local variables.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+ char c;
+
+ /*
+ * Load more data if needed.
+ */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* update our local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /*
+ * If we are completely out of data, break out of the loop,
+ * reporting EOF.
+ */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* OK to fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+ c = copy_input_buf[input_buf_ptr++];
+
+ /* Process \r */
+ if (c == '\r')
+ {
+ /* Check for \r\n on first line, _and_ handle \r\n. */
+ if (cstate->eol_type == EOL_UNKNOWN ||
+ cstate->eol_type == EOL_CRNL)
+ {
+ /*
+ * If need more data, go back to loop top to load it.
+ *
+ * Note that if we are at EOF, c will wind up as '\0' because
+ * of the guaranteed pad of input_buf.
+ */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+ /* get next char */
+ c = copy_input_buf[input_buf_ptr];
+
+ if (c == '\n')
+ {
+ input_buf_ptr++; /* eat newline */
+ cstate->eol_type = EOL_CRNL; /* in case not set yet */
+ }
+ else
+ {
+ if (cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /*
+ * if we got here, it is the first line and we didn't find
+ * \n, so don't consume the peeked character
+ */
+ cstate->eol_type = EOL_CR;
+ }
+ }
+ else if (cstate->eol_type == EOL_NL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* Process \n */
+ if (c == '\n')
+ {
+ if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ cstate->eol_type = EOL_NL; /* in case not set yet */
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* All other characters are treated as regular data */
+ } /* end of outer loop */
+
+ /*
+ * Transfer any still-uncopied data to line_buf.
+ */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1938,6 +2089,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' requires exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 78531ae846..99fd68a483 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -570,6 +571,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -835,8 +843,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -917,7 +927,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -945,7 +956,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -965,6 +976,37 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Ensure only one column is being copied */
+ if (list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("Unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1219,6 +1261,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4aa8646af7..0d0a3ad7ff 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -768,7 +768,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
QUOTE QUOTES
- RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
+ RANGE RAW READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
@@ -3513,6 +3513,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | RAW
+ {
+ $$ = makeDefElem("format", (Node *) makeString("raw"), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17771,6 +17775,7 @@ unreserved_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REASSIGN
| RECURSIVE
@@ -18398,6 +18403,7 @@ bare_label_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REAL
| REASSIGN
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e700fd01b5..04f7548ef4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 899d64ad55..02cd28c750 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -360,6 +360,7 @@ PG_KEYWORD("publication", PUBLICATION, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quotes", QUOTES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("raw", RAW, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("read", READ, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("real", REAL, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("reassign", REASSIGN, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..b11cabd993 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,126 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), (E'\n'), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ | f
+ | f
+ test | f
+(8 rows)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ | f
+ | f
+ test | f
+(8 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\r'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 626a437d40..ae14cb3d33 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -88,8 +88,12 @@ LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
+COPY x to stdout (format RAW, delimiter ',');
+ERROR: cannot specify DELIMITER in RAW mode
COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdout (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
@@ -100,6 +104,10 @@ COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x to stdout (format TEXT, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format CSV, force_quote *);
@@ -108,6 +116,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +128,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +874,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +947,65 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+-- Test inconsistent newline style
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\qecho -n line3
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..6333af3a90 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,73 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), (E'\n'), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\r'
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\o
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 3458d287f2..f46870e252 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -71,19 +71,27 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format RAW, delimiter ',');
COPY x to stdout (format BINARY, null 'x');
+COPY x to stdout (format RAW, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +644,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +716,59 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+
+-- Test inconsistent newline style
+\o :filename
+\qecho -n line1
+\qecho -n '\r'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\r\n'
+\qecho -n line2
+\qecho -n '\r'
+\qecho -n line3
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\o :filename
+\qecho -n line1
+\qecho -n '\n'
+\qecho -n line2
+\qecho -n '\r\n'
+\o
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
--
2.45.1
On Tue, Oct 15, 2024, at 03:35, Joel Jacobson wrote:
On Mon, Oct 14, 2024, at 21:59, Joel Jacobson wrote:
On Mon, Oct 14, 2024, at 10:51, Joel Jacobson wrote:
On Mon, Oct 14, 2024, at 10:07, Joel Jacobson wrote:
Attached is a first draft implementation of the new proposed COPY "raw" format.
The first two patches are just the bug fix in HEAD, reported separately:
https://commitfest.postgresql.org/50/5297/
...
Sorry about the noise. I'm not running the full test suite,
with tap and `meson test --num-processes 32`,
so hopefully I won't cause cfbot failures as often any longer.
Oops, that should have said:
"Sorry about the noise. I'm *now* running the full test suite"
However, I see Windows still failed on copy2.sql,
and I think the reason could be the use of \qecho -n
to create files with inconsistent newline style, e.g.:
\o :filename
\qecho -n line1
\qecho -n '\n'
\qecho -n line2
\qecho -n '\r\n'
\o
COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
Maybe Windows automatically translates \n into \r\n, and vice versa?
If so, this would explain why this test failed on Windows.
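To make the suspected failure mode concrete, here is a minimal standalone sketch of the newline-consistency rule that these tests exercise. It is not the patch's code: `count_lines` and its return convention are hypothetical, the `EolType` enum only mirrors the EOL_* states by name, and it scans an in-memory buffer rather than COPY's input buffers.

```c
#include <stddef.h>

/* Illustrative stand-in for the EOL_* states tracked during COPY FROM. */
typedef enum
{
	EOL_UNKNOWN,
	EOL_NL,
	EOL_CR,
	EOL_CRNL
} EolType;

/*
 * Count the lines in buf[0..len). The style of the first line terminator
 * is locked in; any later terminator of a different style is treated as
 * "inconsistent newline style" and reported by returning -1.
 * A final line without a terminator still counts as a line.
 */
int
count_lines(const char *buf, size_t len)
{
	EolType		eol = EOL_UNKNOWN;
	int			nlines = 0;
	size_t		i = 0;
	size_t		line_start = 0;

	while (i < len)
	{
		char		c = buf[i++];
		EolType		seen;

		if (c == '\r')
		{
			if (i < len && buf[i] == '\n')
			{
				i++;			/* eat the \n of a \r\n pair */
				seen = EOL_CRNL;
			}
			else
				seen = EOL_CR;
		}
		else if (c == '\n')
			seen = EOL_NL;
		else
			continue;			/* all other bytes are plain data */

		if (eol == EOL_UNKNOWN)
			eol = seen;			/* first terminator decides the style */
		else if (eol != seen)
			return -1;			/* inconsistent newline style */

		nlines++;
		line_start = i;
	}

	if (line_start < len)
		nlines++;				/* unterminated last line */

	return nlines;
}
```

Under the (unconfirmed) assumption that each \n written by \qecho is expanded to \r\n on Windows, a file meant to be consistently \r\n would arrive as \r\r\n, and a check like the one above would reject input that was expected to load cleanly.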
Btw, does anyone know if it's possible to download the "regression.diffs" file from the CI task?
I've downloaded all the crashlog, meson_log, testrun zip files from
https://cirrus-ci.com/task/4564405273231360
but none of these contained the "regression.diffs" mentioned here:
[02:09:42.431] # The differences that caused some tests to fail can be viewed in the file
"C:/cirrus/build/testrun/regress/regress/regression.diffs".
Anyhow, I think I've fixed the problem now, in a cross-platform safe way,
by shipping src/test/regress/data/newline*.data files:
newlines_cr.data
newlines_cr_lr.data
newlines_cr_lr_nolast.data
newlines_cr_nolast.data
newlines_lr.data
newlines_lr_nolast.data
newlines_mixed_1.data
newlines_mixed_2.data
newlines_mixed_3.data
newlines_mixed_4.data
newlines_mixed_5.data
These are then used in copy.sql and copy2.sql, e.g.:
copy.sql:
\set filename :abs_srcdir '/data/newlines_lr.data'
TRUNCATE copy_raw_test;
COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
copy2.sql:
-- Test inconsistent newline style
\set filename :abs_srcdir '/data/newlines_mixed_1.data'
COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
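For reference, a minimal sketch of how such byte-exact fixture files could be produced without any newline translation. This is an assumption about one possible way to create them, not how the shipped files were actually made; `write_exact` and `write_fixture` are hypothetical helper names, and the byte sequence in the usage note is taken from the \qecho example earlier, not from any shipped file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write len bytes to fp exactly as given. Returns 0 on success, -1 on
 * failure. The stream must be in binary mode; in text mode some
 * platforms may rewrite '\n' on output.
 */
int
write_exact(FILE *fp, const char *data, size_t len)
{
	if (fwrite(data, 1, len, fp) != len)
		return -1;
	return (fflush(fp) == 0) ? 0 : -1;
}

/*
 * Hypothetical helper: create one fixture file at 'path' holding exactly
 * the given bytes.
 */
int
write_fixture(const char *path, const char *data, size_t len)
{
	FILE	   *fp = fopen(path, "wb");	/* "b": no newline translation */
	int			rc;

	if (fp == NULL)
		return -1;

	rc = write_exact(fp, data, len);
	if (fclose(fp) != 0)
		rc = -1;

	return rc;
}
```

For example, `write_fixture("mixed.data", "line1\nline2\r\n", 14)` would write the 14 bytes of the mixed-style input unchanged on any platform; the file name here is made up for illustration.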
Attaching new version. It's only patch 0016 that has been updated.
/Joel
Attachments:
v8-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch (application/octet-stream)
From 6657609bc1c570ebf1922ec281c7182baedac184 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:23:55 +0200
Subject: [PATCH 01/16] Fix thinko in tests for COPY options force_not_null and
force_null.
Use COPY FROM for the negative tests that check that FORMAT text
cannot be used with these options. With COPY TO, which is itself
invalid for these two options, we would be testing two invalid
options at the same time. That doesn't seem intentional, since the
other tests seem to test invalid options one by one.
In passing, consistently use "stdin" for COPY FROM and "stdout" for COPY TO.
This has no effect on the tests per se, but it seems better to be
consistent, to avoid confusion.
---
src/test/regress/expected/copy2.out | 20 ++++++++++----------
src/test/regress/sql/copy2.sql | 16 ++++++++--------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index ab449fa7b8..3f420db0bc 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -86,9 +86,9 @@ ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
^
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
@@ -96,22 +96,22 @@ COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
-COPY x to stdout (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
-COPY x to stdin (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
-COPY x to stdout (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
-COPY x to stdin (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
-LINE 1: COPY x to stdin (format BINARY, on_error unsupported);
- ^
+LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
+ ^
COPY x to stdout (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x to stdout (log_verbosity unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 1aa0e41b68..5790057e1c 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,17 +70,17 @@ COPY x from stdin (on_error ignore, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
-COPY x to stdout (format TEXT, force_not_null(a));
-COPY x to stdin (format CSV, force_not_null(a));
-COPY x to stdout (format TEXT, force_null(a));
-COPY x to stdin (format CSV, force_null(a));
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x from stdin (format TEXT, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
+COPY x from stdin (format TEXT, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
--
2.45.1
Attachment: v8-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch
From a227d7c81cd8ba7d17a35ef7dd00a0ed55b0ffe7 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:35:28 +0200
Subject: [PATCH 02/16] Fix validation of FORCE_NOT_NULL/FORCE_NULL for
all-columns case.
Add missing checks for FORCE_NOT_NULL and FORCE_NULL when applied to
all columns via "*". These options now correctly require CSV mode and
are disallowed in COPY TO as appropriate. Adjusted regression
tests to verify correct behavior for the all-columns case.
---
src/backend/commands/copy.c | 11 +++++++----
src/test/regress/expected/copy2.out | 12 ++++++++++++
src/test/regress/sql/copy2.sql | 6 ++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0b093dbb2a..e93ea3d627 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -805,12 +805,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && opts_out->force_notnull != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if (opts_out->force_notnull != NIL && !is_from)
+ if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
+ !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
@@ -819,13 +821,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && opts_out->force_null != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if (opts_out->force_null != NIL && !is_from)
+ if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 3f420db0bc..626a437d40 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -98,16 +98,28 @@ LINE 1: COPY x from stdin (on_error unsupported);
^
COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
+COPY x from stdin (format CSV, force_quote *);
+ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_null *);
+ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 5790057e1c..3458d287f2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -75,11 +75,17 @@ COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
+COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
--
2.45.1
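As a rough illustration of the rule the patch above enforces, here is a minimal standalone sketch; it is not PostgreSQL code. The `ForceOpt` struct, the `CheckResult` enum, and `check_force_not_null()` are hypothetical stand-ins for the `force_notnull` list and `force_notnull_all` flag in `CopyFormatOptions`. The point is only that the column-list form and the `*` form should pass through the same "requires CSV mode" and "cannot be used with COPY TO" checks, in that order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two ways a FORCE_* option can be given. */
typedef struct ForceOpt
{
    const char *const *columns; /* explicit column list, or NULL if absent */
    bool        all;            /* true when the option was given as "*" */
} ForceOpt;

/* Hypothetical result codes; the real code calls ereport(ERROR, ...). */
typedef enum CheckResult
{
    CHECK_OK,
    CHECK_REQUIRES_CSV,         /* "COPY %s requires CSV mode" */
    CHECK_NOT_WITH_COPY_TO      /* "COPY %s cannot be used with COPY TO" */
} CheckResult;

/* The option counts as "set" for either the list form or the "*" form. */
bool
force_opt_is_set(const ForceOpt *opt)
{
    assert(opt != NULL);
    return opt->columns != NULL || opt->all;
}

/* Apply the CSV-only check first, then the COPY FROM-only check. */
CheckResult
check_force_not_null(const ForceOpt *opt, bool csv_mode, bool is_from)
{
    if (!force_opt_is_set(opt))
        return CHECK_OK;
    if (!csv_mode)
        return CHECK_REQUIRES_CSV;
    if (!is_from)
        return CHECK_NOT_WITH_COPY_TO;
    return CHECK_OK;
}
```

Folding both forms into one "is set" predicate is a design choice of this sketch; the patch itself repeats the `!= NIL || ..._all` test inline at each check.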
Attachment: v8-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch
From 09e66eb6d63707265a72f8c4f80716165ce3d213 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:02:49 +0200
Subject: [PATCH 03/16] Replace binary flags `binary` and `csv_mode` with
`format` enum.
---
src/backend/commands/copy.c | 48 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 10 +++---
src/backend/commands/copyfromparse.c | 34 ++++++++++----------
src/backend/commands/copyto.c | 20 ++++++------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 69 insertions(+), 57 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e93ea3d627..effe337229 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,7 +822,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -845,7 +846,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -881,7 +882,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,7 +899,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..e700fd01b5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57de1acff3..59433d120e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
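The refactoring in the patch above can be sketched outside the backend as follows. The `CopyFormat` enum is copied from the patch; `parse_copy_format()` and `copy_format_name()` are hypothetical helpers written for this illustration only (the real code does the string comparison inline in `ProcessCopyOptions` and reports unknown names via `ereport`). One benefit a single enum seems to give over two booleans is that a contradictory state such as "binary and CSV at once" can no longer be represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same enumerators as the patch adds to copy.h. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: map a FORMAT option value to the enum.
 * Returns false for an unrecognized name instead of raising an error.
 */
bool
parse_copy_format(const char *fmt, CopyFormat *out)
{
    assert(fmt != NULL && out != NULL);

    if (strcmp(fmt, "text") == 0)
        *out = COPY_FORMAT_TEXT;
    else if (strcmp(fmt, "csv") == 0)
        *out = COPY_FORMAT_CSV;
    else if (strcmp(fmt, "binary") == 0)
        *out = COPY_FORMAT_BINARY;
    else
        return false;
    return true;
}

/* Hypothetical helper: a switch over the enum covers every format once. */
const char *
copy_format_name(CopyFormat format)
{
    switch (format)
    {
        case COPY_FORMAT_TEXT:
            return "text";
        case COPY_FORMAT_BINARY:
            return "binary";
        case COPY_FORMAT_CSV:
            return "csv";
    }
    return "unknown";           /* not reached for valid enum values */
}
```

With an enum, adding a further format later means adding one enumerator and letting the compiler point out switches that do not handle it, rather than introducing a third boolean.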
Attachment: v8-0004-Set-default-format-if-not-specified.patch
From da376dc8528f55c3dc4c8a9dc5aa9cce72b89a47 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 01:32:56 +0200
Subject: [PATCH 04/16] Set default format if not specified.
---
src/backend/commands/copy.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index effe337229..c068c61bcc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -671,6 +671,14 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
+ /*
+ * Set default format if not specified.
+ * This isn't strictly necessary since COPY_FORMAT_TEXT is 0 and
+ * opts_out is palloc0'd, but do it for clarity.
+ */
+ if (!format_specified)
+ opts_out->format = COPY_FORMAT_TEXT;
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
--
2.45.1
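The comment in the patch above relies on `COPY_FORMAT_TEXT` being the first enumerator (value 0) and on the options struct being zero-filled. A small standalone sketch of that reasoning follows; `Opts` and `new_opts()` are hypothetical, reduced stand-ins, and `calloc` is used here in place of `palloc0`.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,           /* first enumerator, so its value is 0 */
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV
} CopyFormat;

/* Make the assumption explicit so that reordering the enum fails to compile. */
_Static_assert(COPY_FORMAT_TEXT == 0, "text must be the zero-valued format");

/* Hypothetical, heavily reduced version of the options struct. */
typedef struct Opts
{
    CopyFormat  format;
} Opts;

/* Allocate a zero-filled struct; calloc stands in for palloc0 here. */
Opts *
new_opts(void)
{
    Opts       *o = calloc(1, sizeof(Opts));

    assert(o != NULL);
    return o;
}
```

A freshly allocated `Opts` therefore already reads as text format. The explicit assignment in the patch makes that default visible to readers rather than leaving it implied by the enum order.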
Attachment: v8-0005-Separate-DELIMITER-and-NULL-option-validation-into-t.patch
From 2252559dde69a528e712225e8f80287ef24fc500 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:00:31 +0200
Subject: [PATCH 05/16] Separate DELIMITER and NULL option validation into
their own sections.
* Move binary format validations under respective option checks
* Introduce specific validations for CSV and TEXT formats
---
src/backend/commands/copy.c | 179 +++++++++++++++++++++---------------
1 file changed, 105 insertions(+), 74 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c068c61bcc..6b2d6e7a57 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -679,34 +679,84 @@ ProcessCopyOptions(ParseState *pstate,
if (!format_specified)
opts_out->format = COPY_FORMAT_TEXT;
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ }
+
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
+ }
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
-
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
-
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
-
if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
@@ -715,25 +765,6 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->escape = opts_out->quote;
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
-
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
-
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
-
if (opts_out->default_print)
{
opts_out->default_print_len = strlen(opts_out->default_print);
@@ -745,23 +776,6 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY default representation cannot use newline or carriage return")));
}
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
/* Check header */
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
@@ -781,11 +795,6 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
-
/* Check escape */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
@@ -845,22 +854,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
"COPY TO")));
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
/* Check freeze */
if (opts_out->freeze && !is_from)
--
2.45.1
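
For readers skimming the series, the grouping that the patch above ends with can be sketched outside the backend. This is only an illustration, not the patch's code: SketchFormat and sketch_check_delim_null() are hypothetical names, and the function returns the error text instead of calling ereport().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the COPY_FORMAT_* values used in the patch. */
typedef enum { FMT_TEXT, FMT_CSV, FMT_BINARY } SketchFormat;

/*
 * Returns NULL when the options are acceptable, otherwise the message the
 * matching ereport() call in the patch would raise.
 */
static const char *
sketch_check_delim_null(SketchFormat format, const char *delim,
                        const char *quote, const char *null_print)
{
    /* Checks shared by the TEXT and CSV formats */
    if (format == FMT_TEXT || format == FMT_CSV)
    {
        assert(delim != NULL && null_print != NULL);

        if (strchr(null_print, delim[0]) != NULL)
            return "COPY delimiter character must not appear in the NULL specification";
    }

    /* Checks that only make sense for CSV */
    if (format == FMT_CSV)
    {
        assert(quote != NULL);

        if (delim[0] == quote[0])
            return "COPY delimiter and quote must be different";

        if (strchr(null_print, quote[0]) != NULL)
            return "CSV quote character must not appear in the NULL specification";
    }

    return NULL;
}
```

Because the delimiter and NULL-string checks are only reached for TEXT and CSV, a format that has neither option (binary, or the proposed raw format) never dereferences them.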
v8-0006-Separate-QUOTE-option-validation-into-its-own-sectio.patch (application/octet-stream)
From 26a849eabd64f98ea19b45004855d8afdbd51a96 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:12:20 +0200
Subject: [PATCH 06/16] Separate QUOTE option validation into its own section.
---
src/backend/commands/copy.c | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b2d6e7a57..873e149c00 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -747,6 +747,26 @@ ProcessCopyOptions(ParseState *pstate,
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -759,8 +779,6 @@ ProcessCopyOptions(ParseState *pstate,
/* Set defaults for omitted options */
if (opts_out->format == COPY_FORMAT_CSV)
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
if (!opts_out->escape)
opts_out->escape = opts_out->quote;
}
@@ -783,18 +801,6 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
-
/* Check escape */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
--
2.45.1
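
The "validate if specified, otherwise apply the default" shape that this patch gives the QUOTE section can be sketched in isolation as follows. SketchOpts and sketch_process_quote() are hypothetical names used only for illustration, and errors are returned as strings rather than raised with ereport().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the options struct. */
typedef struct
{
    int         is_csv;     /* nonzero when FORMAT csv was requested */
    const char *quote;      /* NULL when QUOTE was not specified */
} SketchOpts;

/* Returns NULL on success, otherwise the error message. */
static const char *
sketch_process_quote(SketchOpts *opts)
{
    if (opts->quote != NULL)
    {
        /* Option was given explicitly: validate it. */
        if (!opts->is_csv)
            return "COPY QUOTE requires CSV mode";

        if (strlen(opts->quote) != 1)
            return "COPY quote must be a single one-byte character";
    }
    else if (opts->is_csv)
    {
        /* Option was omitted: apply the CSV default. */
        opts->quote = "\"";
    }

    return NULL;
}
```

Keeping validation and defaulting in one block means later sections can assume the quote is set whenever the format is CSV, which is presumably why the later checks in the series use Assert() on it.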
v8-0007-Separate-ESCAPE-option-validation-into-its-own-secti.patch (application/octet-stream)
From 6e4e4f3d055edc70393d83c1626b6c98f0af9a6f Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:17:21 +0200
Subject: [PATCH 07/16] Separate ESCAPE option validation into its own section.
---
src/backend/commands/copy.c | 39 +++++++++++++++++++------------------
1 file changed, 20 insertions(+), 19 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 873e149c00..ad897e98f3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -767,6 +767,26 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->quote = "\"";
}
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -776,13 +796,6 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
- /* Set defaults for omitted options */
- if (opts_out->format == COPY_FORMAT_CSV)
- {
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
- }
-
if (opts_out->default_print)
{
opts_out->default_print_len = strlen(opts_out->default_print);
@@ -801,18 +814,6 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
-
/* Check force_quote */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
opts_out->force_quote_all))
--
2.45.1
v8-0008-Separate-DEFAULT-option-validation-into-its-own-sect.patch (application/octet-stream)
From 50828d2c5ba1a28b85a09eefdb9ede87d0d5a991 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:22:09 +0200
Subject: [PATCH 08/16] Separate DEFAULT option validation into its own
section.
---
src/backend/commands/copy.c | 96 ++++++++++++++++++++-----------------
1 file changed, 52 insertions(+), 44 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index ad897e98f3..cd80548324 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -787,17 +787,18 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->escape = opts_out->quote;
}
- /*
- * Check for incompatible options (must do these three before inserting
- * defaults)
- */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
opts_out->default_print_len = strlen(opts_out->default_print);
if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -805,8 +806,50 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY default representation cannot use newline or carriage return")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "DEFAULT",
+ "COPY TO")));
+
+ /* Don't allow the delimiter to appear in the default string. */
+ if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "DEFAULT")));
+
+ /* Don't allow the CSV quote char to appear in the default string. */
+ if (opts_out->format == COPY_FORMAT_CSV &&
+ strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "DEFAULT")));
+
+ /* Don't allow the NULL and DEFAULT string to be the same */
+ if (opts_out->null_print_len == opts_out->default_print_len &&
+ strncmp(opts_out->null_print, opts_out->default_print,
+ opts_out->null_print_len) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("NULL specification and DEFAULT specification cannot be the same")));
+ }
+ else
+ {
+ /* No default for default_print; remains NULL */
}
+ /*
+ * Check for incompatible options (must do these three before inserting
+ * defaults)
+ */
+
/* Check header */
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
@@ -909,41 +952,6 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
- if (opts_out->default_print)
- {
- if (!is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "DEFAULT",
- "COPY TO")));
-
- /* Don't allow the delimiter to appear in the default string. */
- if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "DEFAULT")));
-
- /* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "DEFAULT")));
-
- /* Don't allow the NULL and DEFAULT string to be the same */
- if (opts_out->null_print_len == opts_out->default_print_len &&
- strncmp(opts_out->null_print, opts_out->default_print,
- opts_out->null_print_len) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("NULL specification and DEFAULT specification cannot be the same")));
- }
/* Check on_error */
if (opts_out->format == COPY_FORMAT_BINARY &&
opts_out->on_error != COPY_ON_ERROR_STOP)
--
2.45.1
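
The NULL-versus-DEFAULT comparison moved by this patch compares lengths first and then the bytes. A minimal standalone sketch of that test, with the hypothetical helper name sketch_null_equals_default() and lengths computed locally instead of read from null_print_len and default_print_len:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * True when the NULL and DEFAULT marker strings are identical, which is
 * the case the patch rejects.
 */
static bool
sketch_null_equals_default(const char *null_print, const char *default_print)
{
    size_t      null_len = strlen(null_print);
    size_t      default_len = strlen(default_print);

    /* Equal lengths first, so a common prefix alone does not count. */
    return null_len == default_len &&
        strncmp(null_print, default_print, null_len) == 0;
}
```

The length comparison is what keeps a pair such as \N and \NN from being treated as equal.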
v8-0009-Separate-HEADER-option-validation-into-its-own-secti.patch (application/octet-stream)
From 482d0703409d497391856f4820a9ec1b5af1d314 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:29:11 +0200
Subject: [PATCH 09/16] Separate HEADER option validation into its own section.
---
src/backend/commands/copy.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cd80548324..025a4da15d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -845,18 +845,25 @@ ProcessCopyOptions(ParseState *pstate,
/* No default for default_print; remains NULL */
}
+ /* --- HEADER option --- */
+ if (header_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is no header; no action needed */
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
/* Check force_quote */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
opts_out->force_quote_all))
--
2.45.1
v8-0010-Separate-FORCE_QUOTE-option-validation-into-its-own-.patch (application/octet-stream)
From 7128e8773a0afbdb011f3780016eee0636bba066 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:33:35 +0200
Subject: [PATCH 10/16] Separate FORCE_QUOTE option validation into its own
section.
---
src/backend/commands/copy.c | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 025a4da15d..90c5cb6b0f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -859,26 +859,29 @@ ProcessCopyOptions(ParseState *pstate,
/* Default is no header; no action needed */
}
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
-
/* Check force_notnull */
if (opts_out->format != COPY_FORMAT_CSV &&
(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
--
2.45.1
v8-0011-Separate-FORCE_NOT_NULL-option-validation-into-its-o.patch (application/octet-stream)
From fb96c46adaed5590650dfbbec40b47e9de382568 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:34:51 +0200
Subject: [PATCH 11/16] Separate FORCE_NOT_NULL option validation into its own
section.
---
src/backend/commands/copy.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 90c5cb6b0f..57a1c6046a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -877,27 +877,29 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
}
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV &&
- (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
-
/* Check force_null */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
--
2.45.1
v8-0012-Separate-FORCE_NULL-option-validation-into-its-own-s.patch (application/octet-stream)
From f68caa202341fbdabe2a8a6858f471b3d06cfbaf Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:36:43 +0200
Subject: [PATCH 12/16] Separate FORCE_NULL option validation into its own
section.
---
src/backend/commands/copy.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 57a1c6046a..b5e224ee6b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -895,27 +895,29 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
}
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
- if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
-
/* Checks specific to the CSV and TEXT formats */
if (opts_out->format == COPY_FORMAT_TEXT ||
opts_out->format == COPY_FORMAT_CSV)
--
2.45.1
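
Patches 10 through 12 give the three FORCE_* options the same two-step shape: a format check followed by a direction check. A compact sketch of those rules, using the hypothetical names SketchForceOpt and sketch_check_force_opt() and shortened messages that are not the exact errmsg() strings:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum
{
    OPT_FORCE_QUOTE,
    OPT_FORCE_NOT_NULL,
    OPT_FORCE_NULL
} SketchForceOpt;

/* Returns NULL when the option is allowed, otherwise a short reason. */
static const char *
sketch_check_force_opt(SketchForceOpt opt, bool is_csv, bool is_from)
{
    /* All three options require CSV mode. */
    if (!is_csv)
        return "requires CSV mode";

    if (opt == OPT_FORCE_QUOTE)
    {
        /* FORCE_QUOTE affects output only. */
        if (is_from)
            return "cannot be used with COPY FROM";
    }
    else if (!is_from)
    {
        /* FORCE_NOT_NULL and FORCE_NULL affect input only. */
        return "cannot be used with COPY TO";
    }

    return NULL;
}
```

As in the patches, the format check comes first, so a non-CSV COPY reports the CSV requirement even when the direction is also wrong.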
v8-0013-Separate-FREEZE-option-validation-into-its-own-secti.patch (application/octet-stream)
From e300aa532a8544a60dd67e8342aef3311c6a1601 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:38:39 +0200
Subject: [PATCH 13/16] Separate FREEZE option validation into its own section.
---
src/backend/commands/copy.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b5e224ee6b..484da6fd85 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -913,6 +913,18 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
}
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -957,15 +969,6 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
}
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
-
/* Check on_error */
if (opts_out->format == COPY_FORMAT_BINARY &&
opts_out->on_error != COPY_ON_ERROR_STOP)
--
2.45.1
v8-0014-Separate-ON_ERROR-option-validation-into-its-own-sec.patch (application/octet-stream)
From 3e0fb606e32a1f5a71fb5f451f8abe649469de55 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:40:27 +0200
Subject: [PATCH 14/16] Separate ON_ERROR option validation into its own
section.
---
src/backend/commands/copy.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 484da6fd85..e631a70577 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -925,6 +925,15 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
}
+ /* --- ON_ERROR option --- */
+ if (opts_out->on_error != COPY_ON_ERROR_STOP)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -969,13 +978,6 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
}
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
if (opts_out->reject_limit && !opts_out->on_error)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
--
2.45.1
v8-0015-Separate-REJECT_LIMIT-option-validation-into-its-own.patch (application/octet-stream)
From d2da8d3d7ac27515d91e597cb06336bbccfad1fa Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:44:22 +0200
Subject: [PATCH 15/16] Separate REJECT_LIMIT option validation into its own
section.
For clarity, explicitly check `on_error != COPY_ON_ERROR_IGNORE`
instead of `!on_error`.
Also update comment for the section of code at the end,
that now is dedicated to additional checks for interdependent options.
---
src/backend/commands/copy.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e631a70577..cde46bbe2b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -934,9 +934,20 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
}
+ /* --- REJECT_LIMIT option --- */
+ if (opts_out->reject_limit)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+
/*
- * Check for incompatible options (must do these three before inserting
- * defaults)
+ * Additional checks for interdependent options
*/
/* Checks specific to the CSV and TEXT formats */
@@ -977,14 +988,6 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("CSV quote character must not appear in the %s specification",
"NULL")));
}
-
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
}
/*
--
2.45.1
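
The commit message of patch 15 says the REJECT_LIMIT check now tests `on_error != COPY_ON_ERROR_IGNORE` instead of `!on_error`. The sketch below shows why the explicit form is more robust. It assumes an enum whose STOP member is zero, and it adds a purely hypothetical third member (SKETCH_ON_ERROR_OTHER) that does not exist in the patch, only to show where the two forms would diverge.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    SKETCH_ON_ERROR_STOP = 0,
    SKETCH_ON_ERROR_IGNORE,
    SKETCH_ON_ERROR_OTHER       /* hypothetical future value */
} SketchOnError;

/* Old form: rejects only when on_error is the zero-valued member. */
static bool
sketch_old_check_rejects(int reject_limit, SketchOnError on_error)
{
    return reject_limit && !on_error;
}

/* New form: rejects unless on_error is exactly IGNORE. */
static bool
sketch_new_check_rejects(int reject_limit, SketchOnError on_error)
{
    return reject_limit && on_error != SKETCH_ON_ERROR_IGNORE;
}
```

With only STOP and IGNORE the two forms agree. They differ only for a value that is neither, where the old form would silently accept REJECT_LIMIT.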
v8-0016-Add-raw-COPY-format-support-for-unstructured-text-da.patch (application/octet-stream)
From ba28df5b6bb2dfe6028a34e119d6e916582f0d50 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 03:03:09 +0200
Subject: [PATCH 16/16] Add "raw" COPY format support for unstructured text
data.
This commit introduces a new format option to the COPY command, enabling
the import and export of unstructured text data where each line is treated as a
single field without any delimiters.
---
doc/src/sgml/ref/copy.sgml | 98 ++++++++-
src/backend/commands/copy.c | 45 ++--
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 204 +++++++++++++++++-
src/backend/commands/copyto.c | 70 +++++-
src/backend/parser/gram.y | 8 +-
src/include/commands/copy.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/regress/data/newlines_cr.data | 1 +
src/test/regress/data/newlines_cr_lr.data | 2 +
.../regress/data/newlines_cr_lr_nolast.data | 2 +
src/test/regress/data/newlines_cr_nolast.data | 1 +
src/test/regress/data/newlines_lr.data | 2 +
src/test/regress/data/newlines_lr_nolast.data | 2 +
src/test/regress/data/newlines_mixed_1.data | 1 +
src/test/regress/data/newlines_mixed_2.data | 2 +
src/test/regress/data/newlines_mixed_3.data | 2 +
src/test/regress/data/newlines_mixed_4.data | 2 +
src/test/regress/data/newlines_mixed_5.data | 2 +
src/test/regress/expected/copy.out | 96 +++++++++
src/test/regress/expected/copy2.out | 57 ++++-
src/test/regress/sql/copy.sql | 43 ++++
src/test/regress/sql/copy2.sql | 43 +++-
23 files changed, 660 insertions(+), 32 deletions(-)
create mode 100644 src/test/regress/data/newlines_cr.data
create mode 100644 src/test/regress/data/newlines_cr_lr.data
create mode 100644 src/test/regress/data/newlines_cr_lr_nolast.data
create mode 100644 src/test/regress/data/newlines_cr_nolast.data
create mode 100644 src/test/regress/data/newlines_lr.data
create mode 100644 src/test/regress/data/newlines_lr_nolast.data
create mode 100644 src/test/regress/data/newlines_mixed_1.data
create mode 100644 src/test/regress/data/newlines_mixed_2.data
create mode 100644 src/test/regress/data/newlines_mixed_3.data
create mode 100644 src/test/regress/data/newlines_mixed_4.data
create mode 100644 src/test/regress/data/newlines_mixed_5.data
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..06ca632ee3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -257,7 +258,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -271,7 +273,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +297,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -400,7 +403,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +897,47 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. Note, however, that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cde46bbe2b..74d6ebb78d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -688,6 +690,12 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "DELIMITER")));
+
/* Only single-byte delimiter strings are supported. */
if (strlen(opts_out->delim) != 1)
ereport(ERROR,
@@ -718,11 +726,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default delimiter */
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
- }
+ /* Set default delimiter */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
/* --- NULL option --- */
if (opts_out->null_print)
@@ -732,6 +740,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
/* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
@@ -739,11 +752,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default null_print */
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- }
+ /* Set default null_print */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
@@ -795,6 +809,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
/* Assert options have been set (defaults applied if not specified) */
Assert(opts_out->delim);
Assert(opts_out->null_print);
@@ -941,8 +960,8 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
}
@@ -985,7 +1004,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
+ errmsg("CSV quote character must not appear in the %s specification",
"NULL")));
}
}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 50bb4b7750..2528c6f111 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -143,8 +143,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -732,7 +734,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -748,7 +750,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -768,8 +770,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -823,8 +832,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1096,7 +1112,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1462,6 +1481,138 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+
+ /*
+ * The objective of this loop is to transfer the entire next input line
+ * into line_buf. We only care for detecting newlines (\r and/or \n).
+ * All other characters are treated as regular data.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * For a little extra speed within the loop, we copy input_buf and
+ * input_buf_len into local variables.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+ char c;
+
+ /*
+ * Load more data if needed.
+ */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* update our local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /*
+ * If we are completely out of data, break out of the loop,
+ * reporting EOF.
+ */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* OK to fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+ c = copy_input_buf[input_buf_ptr++];
+
+ /* Process \r */
+ if (c == '\r')
+ {
+ /* Check for \r\n on first line, _and_ handle \r\n. */
+ if (cstate->eol_type == EOL_UNKNOWN ||
+ cstate->eol_type == EOL_CRNL)
+ {
+ /*
+ * If need more data, go back to loop top to load it.
+ *
+ * Note that if we are at EOF, c will wind up as '\0' because
+ * of the guaranteed pad of input_buf.
+ */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+ /* get next char */
+ c = copy_input_buf[input_buf_ptr];
+
+ if (c == '\n')
+ {
+ input_buf_ptr++; /* eat newline */
+ cstate->eol_type = EOL_CRNL; /* in case not set yet */
+ }
+ else
+ {
+ if (cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /*
+ * if we got here, it is the first line and we didn't find
+ * \n, so don't consume the peeked character
+ */
+ cstate->eol_type = EOL_CR;
+ }
+ }
+ else if (cstate->eol_type == EOL_NL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* Process \n */
+ if (c == '\n')
+ {
+ if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ cstate->eol_type = EOL_NL; /* in case not set yet */
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* All other characters are treated as regular data */
+ } /* end of outer loop */
+
+ /*
+ * Transfer any still-uncopied data to line_buf.
+ */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1938,6 +2089,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 78531ae846..99fd68a483 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -570,6 +571,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -835,8 +843,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -917,7 +927,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -945,7 +956,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -965,6 +976,37 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Ensure only one column is being copied */
+ if (list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1219,6 +1261,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4aa8646af7..0d0a3ad7ff 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -768,7 +768,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
QUOTE QUOTES
- RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
+ RANGE RAW READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
@@ -3513,6 +3513,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | RAW
+ {
+ $$ = makeDefElem("format", (Node *) makeString("raw"), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17771,6 +17775,7 @@ unreserved_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REASSIGN
| RECURSIVE
@@ -18398,6 +18403,7 @@ bare_label_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REAL
| REASSIGN
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e700fd01b5..04f7548ef4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 899d64ad55..02cd28c750 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -360,6 +360,7 @@ PG_KEYWORD("publication", PUBLICATION, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quotes", QUOTES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("raw", RAW, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("read", READ, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("real", REAL, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("reassign", REASSIGN, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/data/newlines_cr.data b/src/test/regress/data/newlines_cr.data
new file mode 100644
index 0000000000..5397a14fca
--- /dev/null
+++ b/src/test/regress/data/newlines_cr.data
@@ -0,0 +1 @@
+line1
line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_cr_lr.data b/src/test/regress/data/newlines_cr_lr.data
new file mode 100644
index 0000000000..8561d5d6dc
--- /dev/null
+++ b/src/test/regress/data/newlines_cr_lr.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/data/newlines_cr_lr_nolast.data b/src/test/regress/data/newlines_cr_lr_nolast.data
new file mode 100644
index 0000000000..3a1bd7a527
--- /dev/null
+++ b/src/test/regress/data/newlines_cr_lr_nolast.data
@@ -0,0 +1,2 @@
+line1
+line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_cr_nolast.data b/src/test/regress/data/newlines_cr_nolast.data
new file mode 100644
index 0000000000..d9dce6c5ea
--- /dev/null
+++ b/src/test/regress/data/newlines_cr_nolast.data
@@ -0,0 +1 @@
+line1
line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_lr.data b/src/test/regress/data/newlines_lr.data
new file mode 100644
index 0000000000..c0d0fb45c3
--- /dev/null
+++ b/src/test/regress/data/newlines_lr.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/data/newlines_lr_nolast.data b/src/test/regress/data/newlines_lr_nolast.data
new file mode 100644
index 0000000000..f8be7bb828
--- /dev/null
+++ b/src/test/regress/data/newlines_lr_nolast.data
@@ -0,0 +1,2 @@
+line1
+line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_mixed_1.data b/src/test/regress/data/newlines_mixed_1.data
new file mode 100644
index 0000000000..d20e511549
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_1.data
@@ -0,0 +1 @@
+line1
line2
diff --git a/src/test/regress/data/newlines_mixed_2.data b/src/test/regress/data/newlines_mixed_2.data
new file mode 100644
index 0000000000..fe03b64cc3
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_2.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/data/newlines_mixed_3.data b/src/test/regress/data/newlines_mixed_3.data
new file mode 100644
index 0000000000..d2772944d6
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_3.data
@@ -0,0 +1,2 @@
+line1
+line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_mixed_4.data b/src/test/regress/data/newlines_mixed_4.data
new file mode 100644
index 0000000000..7afb2406f0
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_4.data
@@ -0,0 +1,2 @@
+line1
+line2
line3
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_mixed_5.data b/src/test/regress/data/newlines_mixed_5.data
new file mode 100644
index 0000000000..658b3593ea
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_5.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..310d254bda 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,99 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), (E'\n'), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ | f
+ | f
+ test | f
+(8 rows)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ | f
+ | f
+ test | f
+(8 rows)
+
+\set filename :abs_srcdir '/data/newlines_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 626a437d40..34bf06390b 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -88,8 +88,12 @@ LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
+COPY x to stdout (format RAW, delimiter ',');
+ERROR: cannot specify DELIMITER in RAW mode
COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdout (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
@@ -100,6 +104,10 @@ COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x to stdout (format TEXT, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format CSV, force_quote *);
@@ -108,6 +116,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +128,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +874,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +947,40 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+-- Test inconsistent newline style
+\set filename :abs_srcdir '/data/newlines_mixed_1.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_2.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_3.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_4.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_5.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..80ff618c74 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,46 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), (E'\n'), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 3458d287f2..56367234bf 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -71,19 +71,27 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format RAW, delimiter ',');
COPY x to stdout (format BINARY, null 'x');
+COPY x to stdout (format RAW, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +644,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +716,35 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+
+-- Test inconsistent newline style
+\set filename :abs_srcdir '/data/newlines_mixed_1.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_2.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_3.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_4.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_5.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
--
2.45.1
On Tue, Oct 15, 2024, at 09:54, Joel Jacobson wrote:
On Tue, Oct 15, 2024, at 03:35, Joel Jacobson wrote:
On Mon, Oct 14, 2024, at 21:59, Joel Jacobson wrote:
On Mon, Oct 14, 2024, at 10:51, Joel Jacobson wrote:
On Mon, Oct 14, 2024, at 10:07, Joel Jacobson wrote:
Attached is a first draft implementation of the new proposed COPY "raw" format.
The first two patches are just the bug fix in HEAD, reported separately:
https://commitfest.postgresql.org/50/5297/
...
Btw, does anyone know if it's possible to download the "regression.diffs"
file from the CI task?
Thanks @Matthias for the help,
found it at https://api.cirrus-ci.com/v1/artifact/task/5938219148115968/testrun/build/testrun/regress/regress/regression.diffs
The Windows failure was caused by a test that inserted "\n" as the value of
a text column, to check that it is parsed as an extra newline on import.
I've now removed that part of the test, since the same case is covered by
the other tests that use the hard-coded data files.
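
To illustrate why a "\n" value can trip the reader on Windows, here is a minimal standalone sketch. It is not code from the patch, and all names in it are hypothetical. It assumes that COPY TO a file ends rows with "\r\n" on Windows (the platform-dependent branch in CopySendEndOfRow), so a value containing a bare "\n" leaves two terminator styles in one file, which a reader that enforces a single style would reject as "inconsistent newline style".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not taken from the patch. */
typedef enum
{
	EOL_STYLE_UNKNOWN,
	EOL_STYLE_LF,
	EOL_STYLE_CR,
	EOL_STYLE_CRLF
} EolStyle;

/*
 * Return true if every line terminator in buf[0..len) has the same style.
 * The first terminator seen fixes the style; a later terminator of a
 * different style makes the input inconsistent.
 */
static bool
newlines_consistent(const char *buf, size_t len)
{
	EolStyle	seen = EOL_STYLE_UNKNOWN;
	size_t		i = 0;

	assert(buf != NULL);

	while (i < len)
	{
		EolStyle	cur;

		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
		{
			cur = EOL_STYLE_CRLF;
			i += 2;
		}
		else if (buf[i] == '\r')
		{
			cur = EOL_STYLE_CR;
			i++;
		}
		else if (buf[i] == '\n')
		{
			cur = EOL_STYLE_LF;
			i++;
		}
		else
		{
			/* ordinary data byte */
			i++;
			continue;
		}

		if (seen == EOL_STYLE_UNKNOWN)
			seen = cur;
		else if (seen != cur)
			return false;
	}
	return true;
}
```

Under that assumption, a row whose value is "\n" followed by a row "test", both terminated with "\r\n", gives the bytes "\n\r\ntest\r\n", and the function above reports them as inconsistent. The same rows written with "\n" terminators pass.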
/Joel
Attachments:
v9-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch
From 435ccbd298dd1e6e14e272ff40004569df00e1c5 Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:23:55 +0200
Subject: [PATCH 01/16] Fix thinko in tests for COPY options force_not_null and
force_null.
Use COPY FROM for the negative tests that check that FORMAT text
cannot be used with these options. With COPY TO, which is itself
invalid for these two options, each test exercised two invalid
options at the same time. That doesn't seem intentional, since
the other tests seem to check invalid options one by one.
In passing, consistently use "stdin" for COPY FROM and "stdout" for COPY TO.
This has no effect on the tests per se, but being consistent
avoids confusion.
---
src/test/regress/expected/copy2.out | 20 ++++++++++----------
src/test/regress/sql/copy2.sql | 16 ++++++++--------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index ab449fa7b8..3f420db0bc 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -86,9 +86,9 @@ ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
^
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
@@ -96,22 +96,22 @@ COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
-COPY x to stdout (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
-COPY x to stdin (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
-COPY x to stdout (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
-COPY x to stdin (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
-LINE 1: COPY x to stdin (format BINARY, on_error unsupported);
- ^
+LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
+ ^
COPY x to stdout (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x to stdout (log_verbosity unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 1aa0e41b68..5790057e1c 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,17 +70,17 @@ COPY x from stdin (on_error ignore, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
-COPY x to stdout (format TEXT, force_not_null(a));
-COPY x to stdin (format CSV, force_not_null(a));
-COPY x to stdout (format TEXT, force_null(a));
-COPY x to stdin (format CSV, force_null(a));
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x from stdin (format TEXT, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
+COPY x from stdin (format TEXT, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
--
2.45.1
v9-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch
From 209cb051b7c4b902c8f12f00c5e2ff03f1ab07dd Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:35:28 +0200
Subject: [PATCH 02/16] Fix validation of FORCE_NOT_NULL/FORCE_NULL for
all-columns case.
Add missing checks for FORCE_NOT_NULL and FORCE_NULL when applied to
all columns via "*". These options now correctly require CSV mode and
are disallowed in COPY TO as appropriate. Adjusted regression
tests to verify correct behavior for the all-columns case.
---
src/backend/commands/copy.c | 11 +++++++----
src/test/regress/expected/copy2.out | 12 ++++++++++++
src/test/regress/sql/copy2.sql | 6 ++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0b093dbb2a..e93ea3d627 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -805,12 +805,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && opts_out->force_notnull != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if (opts_out->force_notnull != NIL && !is_from)
+ if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
+ !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
@@ -819,13 +821,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && opts_out->force_null != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if (opts_out->force_null != NIL && !is_from)
+ if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 3f420db0bc..626a437d40 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -98,16 +98,28 @@ LINE 1: COPY x from stdin (on_error unsupported);
^
COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
+COPY x from stdin (format CSV, force_quote *);
+ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_null *);
+ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 5790057e1c..3458d287f2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -75,11 +75,17 @@ COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
+COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
--
2.45.1
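
The validation rule that the 0002 patch above extends to the "*" case can be sketched in isolation as follows. This is a simplified illustration, not the code in copy.c: the struct, enum, and function names are hypothetical stand-ins, and only the two FORCE_NOT_NULL checks are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the relevant COPY option fields. */
typedef struct
{
	bool		csv_mode;			/* FORMAT csv? */
	const void *force_notnull;		/* explicit column list, NULL if none */
	bool		force_notnull_all;	/* FORCE_NOT_NULL * given? */
} SketchCopyOptions;

/* Hypothetical result codes standing in for the two ereport() calls. */
typedef enum
{
	SKETCH_OK,
	SKETCH_REQUIRES_CSV,
	SKETCH_NOT_ALLOWED_IN_COPY_TO
} SketchResult;

/*
 * Treat "FORCE_NOT_NULL (cols)" and "FORCE_NOT_NULL *" the same way:
 * either form requires CSV mode and is only valid for COPY FROM.
 */
static SketchResult
check_force_not_null(const SketchCopyOptions *opts, bool is_from)
{
	bool		specified;

	assert(opts != NULL);

	specified = opts->force_notnull != NULL || opts->force_notnull_all;

	if (!opts->csv_mode && specified)
		return SKETCH_REQUIRES_CSV;
	if (specified && !is_from)
		return SKETCH_NOT_ALLOWED_IN_COPY_TO;
	return SKETCH_OK;
}
```

With this shape, "COPY x from stdin (format TEXT, force_not_null *)" maps to SKETCH_REQUIRES_CSV and "COPY x to stdout (format CSV, force_not_null *)" maps to SKETCH_NOT_ALLOWED_IN_COPY_TO, matching the two error messages added to the expected output. The FORCE_NULL checks follow the same pattern.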
v9-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch
From 953cf07da9365ba7f8f8c5db57d6cfb425a02633 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:02:49 +0200
Subject: [PATCH 03/16] Replace binary flags `binary` and `csv_mode` with
`format` enum.
---
src/backend/commands/copy.c | 48 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 10 +++---
src/backend/commands/copyfromparse.c | 34 ++++++++++----------
src/backend/commands/copyto.c | 20 ++++++------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 69 insertions(+), 57 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e93ea3d627..effe337229 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,7 +822,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -845,7 +846,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -881,7 +882,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,7 +899,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..e700fd01b5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57de1acff3..59433d120e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
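For readers following the series, the effect of replacing the two booleans with one enum can be sketched in isolation. This is only an illustration, not part of the patch: the enum mirrors the one added to copy.h above, while `demo_format_from_flags`, `demo_wire_format_code` and `demo_needs_line_terminator` are hypothetical helper names invented here to show how the old `binary`/`csv_mode` tests map onto `format` comparisons.

```c
#include <stdbool.h>

/* Mirrors the enum added to copy.h by the patch above. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: map the old pair of flags onto the enum.
 * The old code never had both flags set at once, so binary wins here
 * only as an arbitrary tie-break for the illustration.
 */
CopyFormat
demo_format_from_flags(bool binary, bool csv_mode)
{
    if (binary)
        return COPY_FORMAT_BINARY;
    return csv_mode ? COPY_FORMAT_CSV : COPY_FORMAT_TEXT;
}

/*
 * Hypothetical helper: the overall format code as computed in
 * SendCopyBegin, 1 for binary and 0 for the textual formats.
 */
int
demo_wire_format_code(CopyFormat format)
{
    return (format == COPY_FORMAT_BINARY) ? 1 : 0;
}

/*
 * Hypothetical helper: the old "!binary" test becomes
 * "format != COPY_FORMAT_BINARY", so both TEXT and CSV get a line
 * terminator in CopySendEndOfRow.
 */
bool
demo_needs_line_terminator(CopyFormat format)
{
    return format != COPY_FORMAT_BINARY;
}
```

One consequence worth noting: a check written as `format != COPY_FORMAT_BINARY` will also be true for any format added to the enum later, so each such site presumably needs a second look when a new format is introduced.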
Attachment: v9-0004-Set-default-format-if-not-specified.patch (application/octet-stream)
From 04a0d2dc78a715ebe54345287a5b335e9f7a52b0 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 01:32:56 +0200
Subject: [PATCH 04/16] Set default format if not specified.
---
src/backend/commands/copy.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index effe337229..c068c61bcc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -671,6 +671,14 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
+ /*
+ * Set default format if not specified.
+ * This isn't strictly necessary since COPY_FORMAT_TEXT is 0 and
+ * opts_out is palloc0'd, but do it for clarity.
+ */
+ if (!format_specified)
+ opts_out->format = COPY_FORMAT_TEXT;
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
--
2.45.1
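The comment in the patch above says the explicit assignment is redundant because COPY_FORMAT_TEXT is 0 and the options struct is zero-filled. A small standalone sketch of that reasoning follows. `DemoCopyOptions` and `demo_resolve_format` are hypothetical names used only for this illustration, and `memset` stands in for the zeroing that `palloc0` performs in the backend.

```c
#include <stdbool.h>
#include <string.h>

/* Mirrors the enum from copy.h; TEXT is first and therefore 0. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical cut-down stand-in for CopyFormatOptions. */
typedef struct DemoCopyOptions
{
    CopyFormat  format;
} DemoCopyOptions;

/*
 * Hypothetical helper: start from a zero-filled struct, optionally
 * apply a user-specified format, then apply the explicit default the
 * way the patch does.  Returns the resulting format.
 */
CopyFormat
demo_resolve_format(bool format_specified, CopyFormat requested)
{
    DemoCopyOptions opts;

    /* Stand-in for palloc0: every field starts as zero. */
    memset(&opts, 0, sizeof(opts));

    if (format_specified)
        opts.format = requested;

    /* Redundant given the zero fill, but states the intent. */
    if (!format_specified)
        opts.format = COPY_FORMAT_TEXT;

    return opts.format;
}
```

The explicit assignment would start to matter if the enum were ever reordered so that TEXT is no longer the first member.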
Attachment: v9-0005-Separate-DELIMITER-and-NULL-option-validation-into-t.patch (application/octet-stream)
From 837d35ae112ba3bb02e008d660e9d7aba2bc03b4 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:00:31 +0200
Subject: [PATCH 05/16] Separate DELIMITER and NULL option validation into
their own sections.
* Move binary format validations under respective option checks
* Introduce specific validations for CSV and TEXT formats
---
src/backend/commands/copy.c | 179 +++++++++++++++++++++---------------
1 file changed, 105 insertions(+), 74 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c068c61bcc..6b2d6e7a57 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -679,34 +679,84 @@ ProcessCopyOptions(ParseState *pstate,
if (!format_specified)
opts_out->format = COPY_FORMAT_TEXT;
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ }
+
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
+ }
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
-
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
-
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
-
if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
@@ -715,25 +765,6 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->escape = opts_out->quote;
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
-
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
-
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
-
if (opts_out->default_print)
{
opts_out->default_print_len = strlen(opts_out->default_print);
@@ -745,23 +776,6 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY default representation cannot use newline or carriage return")));
}
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
/* Check header */
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
@@ -781,11 +795,6 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
-
/* Check escape */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
@@ -845,22 +854,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
"COPY TO")));
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
/* Check freeze */
if (opts_out->freeze && !is_from)
--
2.45.1
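The DELIMITER section introduced by the patch above follows a "validate if given, otherwise apply the default" shape. The sketch below restates that shape outside the backend so the order of the checks is easier to see. It is not the patch's code: `demo_check_delimiter` and `demo_default_delimiter` are hypothetical functions, errors are returned as strings instead of being raised with `ereport`, and the last message is simplified.

```c
#include <stddef.h>
#include <string.h>

typedef enum CopyFormat
{
    COPY_FORMAT_TEXT,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: return NULL if a user-supplied delimiter is
 * acceptable for the given format, else a message describing the
 * first failed check.  The checks run in the same order as in the
 * patch.
 */
const char *
demo_check_delimiter(CopyFormat format, const char *delim)
{
    if (format == COPY_FORMAT_BINARY)
        return "cannot specify DELIMITER in BINARY mode";

    /* Only single-byte delimiter strings are supported. */
    if (strlen(delim) != 1)
        return "COPY delimiter must be a single one-byte character";

    /* Disallow end-of-line characters. */
    if (delim[0] == '\r' || delim[0] == '\n')
        return "COPY delimiter cannot be newline or carriage return";

    /*
     * Outside CSV mode, backslash, dot, lower-case ASCII letters and
     * digits are rejected.  delim[0] is known to be non-NUL here, so
     * strchr cannot match the list's terminator.
     */
    if (format != COPY_FORMAT_CSV &&
        strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789", delim[0]) != NULL)
        return "COPY delimiter is not allowed in this mode";

    return NULL;
}

/*
 * Hypothetical helper: default delimiter when none was given.
 * Binary format has no delimiter, so NULL is returned for it.
 */
const char *
demo_default_delimiter(CopyFormat format)
{
    if (format == COPY_FORMAT_BINARY)
        return NULL;
    return (format == COPY_FORMAT_CSV) ? "," : "\t";
}
```

Because the default is now applied only for non-binary formats, `delim` can stay NULL in binary mode, which is why the later TEXT/CSV-specific block in the patch asserts that it has been set before dereferencing it.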
Attachment: v9-0006-Separate-QUOTE-option-validation-into-its-own-sectio.patch (application/octet-stream)
From e79bdce67c0eb500142e41b4d829826f1956ad9a Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:12:20 +0200
Subject: [PATCH 06/16] Separate QUOTE option validation into its own section.
---
src/backend/commands/copy.c | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b2d6e7a57..873e149c00 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -747,6 +747,26 @@ ProcessCopyOptions(ParseState *pstate,
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -759,8 +779,6 @@ ProcessCopyOptions(ParseState *pstate,
/* Set defaults for omitted options */
if (opts_out->format == COPY_FORMAT_CSV)
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
if (!opts_out->escape)
opts_out->escape = opts_out->quote;
}
@@ -783,18 +801,6 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
-
/* Check escape */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
--
2.45.1
Attachment: v9-0007-Separate-ESCAPE-option-validation-into-its-own-secti.patch (application/octet-stream)
From 03c122c31ea2faee1bb608371f1863d0cd9b5f50 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:17:21 +0200
Subject: [PATCH 07/16] Separate ESCAPE option validation into its own section.
---
src/backend/commands/copy.c | 39 +++++++++++++++++++------------------
1 file changed, 20 insertions(+), 19 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 873e149c00..ad897e98f3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -767,6 +767,26 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->quote = "\"";
}
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -776,13 +796,6 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
- /* Set defaults for omitted options */
- if (opts_out->format == COPY_FORMAT_CSV)
- {
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
- }
-
if (opts_out->default_print)
{
opts_out->default_print_len = strlen(opts_out->default_print);
@@ -801,18 +814,6 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
-
/* Check force_quote */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
opts_out->force_quote_all))
--
2.45.1
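With the QUOTE and ESCAPE sections split out, the ordering between them becomes significant: the ESCAPE default is the quote character, so the QUOTE section has to run first. A minimal sketch of that dependency, assuming CSV format throughout, might look like this. `DemoCsvOptions`, `demo_apply_csv_defaults` and `demo_effective_escape` are hypothetical names, not backend symbols.

```c
#include <stddef.h>

/* Hypothetical cut-down stand-in for the CSV-related option fields. */
typedef struct DemoCsvOptions
{
    const char *quote;      /* NULL if QUOTE was not specified */
    const char *escape;     /* NULL if ESCAPE was not specified */
} DemoCsvOptions;

/*
 * Hypothetical helper: apply the CSV defaults in the order used by
 * the patches.  The quote default must be filled in first, since the
 * escape default is a copy of whatever the quote ended up being.
 */
void
demo_apply_csv_defaults(DemoCsvOptions *opts)
{
    if (opts->quote == NULL)
        opts->quote = "\"";
    if (opts->escape == NULL)
        opts->escape = opts->quote;
}

/*
 * Hypothetical helper: return the escape character that results from
 * the given user input (NULL meaning "not specified").
 */
char
demo_effective_escape(const char *quote, const char *escape)
{
    DemoCsvOptions opts;

    opts.quote = quote;
    opts.escape = escape;
    demo_apply_csv_defaults(&opts);
    return opts.escape[0];
}
```

If the two sections were swapped, an unspecified ESCAPE combined with an unspecified QUOTE would leave `escape` as NULL, which is the kind of ordering hazard the per-option layout has to keep in mind.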
Attachment: v9-0008-Separate-DEFAULT-option-validation-into-its-own-sect.patch (application/octet-stream)
From 35f7b3047287e94c3ee02b5583da93d376ed9ebf Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:22:09 +0200
Subject: [PATCH 08/16] Separate DEFAULT option validation into its own
section.
---
src/backend/commands/copy.c | 96 ++++++++++++++++++++-----------------
1 file changed, 52 insertions(+), 44 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index ad897e98f3..cd80548324 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -787,17 +787,18 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->escape = opts_out->quote;
}
- /*
- * Check for incompatible options (must do these three before inserting
- * defaults)
- */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
opts_out->default_print_len = strlen(opts_out->default_print);
if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -805,8 +806,50 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY default representation cannot use newline or carriage return")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "DEFAULT",
+ "COPY TO")));
+
+ /* Don't allow the delimiter to appear in the default string. */
+ if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "DEFAULT")));
+
+ /* Don't allow the CSV quote char to appear in the default string. */
+ if (opts_out->format == COPY_FORMAT_CSV &&
+ strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "DEFAULT")));
+
+ /* Don't allow the NULL and DEFAULT string to be the same */
+ if (opts_out->null_print_len == opts_out->default_print_len &&
+ strncmp(opts_out->null_print, opts_out->default_print,
+ opts_out->null_print_len) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("NULL specification and DEFAULT specification cannot be the same")));
+ }
+ else
+ {
+ /* No default for default_print; remains NULL */
}
+ /*
+ * Check for incompatible options (must do these three before inserting
+ * defaults)
+ */
+
/* Check header */
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
@@ -909,41 +952,6 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
- if (opts_out->default_print)
- {
- if (!is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "DEFAULT",
- "COPY TO")));
-
- /* Don't allow the delimiter to appear in the default string. */
- if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "DEFAULT")));
-
- /* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "DEFAULT")));
-
- /* Don't allow the NULL and DEFAULT string to be the same */
- if (opts_out->null_print_len == opts_out->default_print_len &&
- strncmp(opts_out->null_print, opts_out->default_print,
- opts_out->null_print_len) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("NULL specification and DEFAULT specification cannot be the same")));
- }
/* Check on_error */
if (opts_out->format == COPY_FORMAT_BINARY &&
opts_out->on_error != COPY_ON_ERROR_STOP)
--
2.45.1
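The DEFAULT section moved by the patch above depends on both the delimiter and the NULL marker already being resolved. The following sketch isolates two of its checks under the assumption that both strings are non-NULL. `demo_marker_contains_delim` and `demo_same_marker` are hypothetical helpers, and lengths are computed with `strlen` here rather than read from the cached `null_print_len` and `default_print_len` fields.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if the one-byte delimiter occurs anywhere
 * in the marker string, as in the "delimiter character must not
 * appear in the DEFAULT specification" check.  Assumes delim is a
 * non-NUL byte, which the earlier DELIMITER validation guarantees.
 */
bool
demo_marker_contains_delim(const char *marker, char delim)
{
    return strchr(marker, delim) != NULL;
}

/*
 * Hypothetical helper: true if the NULL and DEFAULT markers are the
 * same string, using the length-then-strncmp comparison from the
 * patch.
 */
bool
demo_same_marker(const char *null_print, const char *default_print)
{
    size_t      null_len = strlen(null_print);
    size_t      default_len = strlen(default_print);

    return null_len == default_len &&
        strncmp(null_print, default_print, null_len) == 0;
}
```

Since the section now runs before the later TEXT/CSV block, the asserts on `delim` and `null_print` in the patch are what document that the DELIMITER and NULL sections must stay ahead of it.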
Attachment: v9-0009-Separate-HEADER-option-validation-into-its-own-secti.patch (application/octet-stream)
From 2c05d4a955058526a130a5306e792d6e24ad32e3 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:29:11 +0200
Subject: [PATCH 09/16] Separate HEADER option validation into its own section.
---
src/backend/commands/copy.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cd80548324..025a4da15d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -845,18 +845,25 @@ ProcessCopyOptions(ParseState *pstate,
/* No default for default_print; remains NULL */
}
+ /* --- HEADER option --- */
+ if (header_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is no header; no action needed */
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
/* Check force_quote */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
opts_out->force_quote_all))
--
2.45.1
Attachment: v9-0010-Separate-FORCE_QUOTE-option-validation-into-its-own-.patch (application/octet-stream)
From 9efb0eeb20cc7e628d7fa3fb722e4ed856016dc1 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:33:35 +0200
Subject: [PATCH 10/16] Separate FORCE_QUOTE option validation into its own
section.
---
src/backend/commands/copy.c | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 025a4da15d..90c5cb6b0f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -859,26 +859,29 @@ ProcessCopyOptions(ParseState *pstate,
/* Default is no header; no action needed */
}
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
-
/* Check force_notnull */
if (opts_out->format != COPY_FORMAT_CSV &&
(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
--
2.45.1
Attachment: v9-0011-Separate-FORCE_NOT_NULL-option-validation-into-its-o.patch (application/octet-stream)
From f5004243ca15d098e03015fd90f983b79e6c3bc0 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:34:51 +0200
Subject: [PATCH 11/16] Separate FORCE_NOT_NULL option validation into its own
section.
---
src/backend/commands/copy.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 90c5cb6b0f..57a1c6046a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -877,27 +877,29 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
}
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV &&
- (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
-
/* Check force_null */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
--
2.45.1
Attachment: v9-0012-Separate-FORCE_NULL-option-validation-into-its-own-s.patch (application/octet-stream)
From 0169081702e99fcaf66600d692f47fd082f50dd2 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:36:43 +0200
Subject: [PATCH 12/16] Separate FORCE_NULL option validation into its own
section.
---
src/backend/commands/copy.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 57a1c6046a..b5e224ee6b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -895,27 +895,29 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
}
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
- if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
-
/* Checks specific to the CSV and TEXT formats */
if (opts_out->format == COPY_FORMAT_TEXT ||
opts_out->format == COPY_FORMAT_CSV)
--
2.45.1
Attachment: v9-0013-Separate-FREEZE-option-validation-into-its-own-secti.patch (application/octet-stream)
From f729ba9d1aae99b83caa1e27522aa5d634d464b1 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:38:39 +0200
Subject: [PATCH 13/16] Separate FREEZE option validation into its own section.
---
src/backend/commands/copy.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b5e224ee6b..484da6fd85 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -913,6 +913,18 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
}
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -957,15 +969,6 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
}
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
-
/* Check on_error */
if (opts_out->format == COPY_FORMAT_BINARY &&
opts_out->on_error != COPY_ON_ERROR_STOP)
--
2.45.1
v9-0014-Separate-ON_ERROR-option-validation-into-its-own-sec.patch
From a1924e7feea3bb63026e918c5e002c71df47b3ed Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:40:27 +0200
Subject: [PATCH 14/16] Separate ON_ERROR option validation into its own
section.
---
src/backend/commands/copy.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 484da6fd85..e631a70577 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -925,6 +925,15 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
}
+ /* --- ON_ERROR option --- */
+ if (opts_out->on_error != COPY_ON_ERROR_STOP)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ }
+
/*
* Check for incompatible options (must do these three before inserting
* defaults)
@@ -969,13 +978,6 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
}
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
if (opts_out->reject_limit && !opts_out->on_error)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
--
2.45.1
v9-0015-Separate-REJECT_LIMIT-option-validation-into-its-own.patch
From dfe4c27444ccbba9b05b853c04981c3971ba86c4 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 02:44:22 +0200
Subject: [PATCH 15/16] Separate REJECT_LIMIT option validation into its own
section.
For clarity, explicitly check `on_error != COPY_ON_ERROR_IGNORE`
instead of `!on_error`.
Also update the comment on the final section of the function, which is
now dedicated to additional checks for interdependent options.
---
src/backend/commands/copy.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
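Note on the `!on_error` change described above: the old test only works
because the STOP choice is the zero member of the enum. Below is a minimal
standalone sketch of why the explicit comparison is more robust. It is not
code from the patch: the enum is a stand-in for the real one, the function
names are made up for illustration, and ON_ERROR_HYPOTHETICAL is an invented
third value that does not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the ON_ERROR choices; the third member is hypothetical. */
typedef enum
{
	ON_ERROR_STOP = 0,
	ON_ERROR_IGNORE,
	ON_ERROR_HYPOTHETICAL		/* invented for illustration only */
} OnErrorChoice;

/* Old form: complains only when on_error is the zero value. */
bool
reject_limit_error_old(long reject_limit, OnErrorChoice on_error)
{
	return reject_limit && !on_error;
}

/* New form: complains unless on_error is exactly IGNORE. */
bool
reject_limit_error_new(long reject_limit, OnErrorChoice on_error)
{
	return reject_limit && on_error != ON_ERROR_IGNORE;
}

/* Compare both forms over every enum member. */
void
compare_forms(void)
{
	/* Both forms agree for the two values that exist today. */
	assert(reject_limit_error_old(10, ON_ERROR_STOP));
	assert(reject_limit_error_new(10, ON_ERROR_STOP));
	assert(!reject_limit_error_old(10, ON_ERROR_IGNORE));
	assert(!reject_limit_error_new(10, ON_ERROR_IGNORE));

	/* They diverge as soon as a third, non-zero value appears. */
	assert(!reject_limit_error_old(10, ON_ERROR_HYPOTHETICAL));
	assert(reject_limit_error_new(10, ON_ERROR_HYPOTHETICAL));
}
```

With only STOP and IGNORE the two forms behave identically, so the change is
intended as a readability fix rather than a behavior change.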
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e631a70577..cde46bbe2b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -934,9 +934,20 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
}
+ /* --- REJECT_LIMIT option --- */
+ if (opts_out->reject_limit)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+
/*
- * Check for incompatible options (must do these three before inserting
- * defaults)
+ * Additional checks for interdependent options
*/
/* Checks specific to the CSV and TEXT formats */
@@ -977,14 +988,6 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("CSV quote character must not appear in the %s specification",
"NULL")));
}
-
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
}
/*
--
2.45.1
v9-0016-Add-raw-COPY-format-support-for-unstructured-text-da.patch
From 2e8e49bc9cd3bd346358ad97bef5bb8cd5bb4a26 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 03:03:09 +0200
Subject: [PATCH 16/16] Add "raw" COPY format support for unstructured text
data.
This commit introduces a new format option to the COPY command, enabling
the import and export of unstructured text data where each line is treated as a
single field without any delimiters.
---
doc/src/sgml/ref/copy.sgml | 98 ++++++++-
src/backend/commands/copy.c | 45 ++--
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 204 +++++++++++++++++-
src/backend/commands/copyto.c | 70 +++++-
src/backend/parser/gram.y | 8 +-
src/include/commands/copy.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/regress/data/newlines_cr.data | 1 +
src/test/regress/data/newlines_cr_lr.data | 2 +
.../regress/data/newlines_cr_lr_nolast.data | 2 +
src/test/regress/data/newlines_cr_nolast.data | 1 +
src/test/regress/data/newlines_lr.data | 2 +
src/test/regress/data/newlines_lr_nolast.data | 2 +
src/test/regress/data/newlines_mixed_1.data | 1 +
src/test/regress/data/newlines_mixed_2.data | 2 +
src/test/regress/data/newlines_mixed_3.data | 2 +
src/test/regress/data/newlines_mixed_4.data | 2 +
src/test/regress/data/newlines_mixed_5.data | 2 +
src/test/regress/expected/copy.out | 92 ++++++++
src/test/regress/expected/copy2.out | 57 ++++-
src/test/regress/sql/copy.sql | 43 ++++
src/test/regress/sql/copy2.sql | 43 +++-
23 files changed, 656 insertions(+), 32 deletions(-)
create mode 100644 src/test/regress/data/newlines_cr.data
create mode 100644 src/test/regress/data/newlines_cr_lr.data
create mode 100644 src/test/regress/data/newlines_cr_lr_nolast.data
create mode 100644 src/test/regress/data/newlines_cr_nolast.data
create mode 100644 src/test/regress/data/newlines_lr.data
create mode 100644 src/test/regress/data/newlines_lr_nolast.data
create mode 100644 src/test/regress/data/newlines_mixed_1.data
create mode 100644 src/test/regress/data/newlines_mixed_2.data
create mode 100644 src/test/regress/data/newlines_mixed_3.data
create mode 100644 src/test/regress/data/newlines_mixed_4.data
create mode 100644 src/test/regress/data/newlines_mixed_5.data
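For readers who want the line-splitting rule in isolation, here is a rough
standalone sketch of what the raw reader in this patch does: only \r and \n
are special, the newline style is fixed by the first terminator seen, and a
later terminator of a different style is an error. This is a simplification
under stated assumptions, not the patch's code: it works on a complete
in-memory buffer instead of a refillable input buffer, and the names
next_raw_line and count_raw_lines are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EOL_UNKNOWN, EOL_NL, EOL_CR, EOL_CRNL } EolType;

/*
 * Locate the next line in buf[pos..len).
 *
 * On success returns 0, stores the number of data bytes in *line_len and
 * the offset just past the terminator (or len at end of input) in *next.
 * Returns -1 if the terminator disagrees with the style already in *eol.
 */
int
next_raw_line(const char *buf, size_t len, size_t pos,
			  EolType *eol, size_t *line_len, size_t *next)
{
	size_t		i;

	assert(pos <= len);

	for (i = pos; i < len; i++)
	{
		char		c = buf[i];

		if (c == '\r')
		{
			if (*eol == EOL_NL)
				return -1;		/* \r in a file that uses \n */
			if (*eol == EOL_UNKNOWN || *eol == EOL_CRNL)
			{
				/* Peek for \n to tell \r\n from a bare \r. */
				if (i + 1 < len && buf[i + 1] == '\n')
				{
					*eol = EOL_CRNL;
					*line_len = i - pos;
					*next = i + 2;
					return 0;
				}
				if (*eol == EOL_CRNL)
					return -1;	/* bare \r in a \r\n file */
				*eol = EOL_CR;
			}
			*line_len = i - pos;
			*next = i + 1;
			return 0;
		}
		if (c == '\n')
		{
			if (*eol == EOL_CR || *eol == EOL_CRNL)
				return -1;		/* bare \n in a \r or \r\n file */
			*eol = EOL_NL;
			*line_len = i - pos;
			*next = i + 1;
			return 0;
		}
		/* Every other byte, including backslash, is plain data. */
	}

	/* No terminator: the rest of the buffer is the final line. */
	*line_len = len - pos;
	*next = len;
	return 0;
}

/* Count the lines in buf, or return -1 on inconsistent newline style. */
int
count_raw_lines(const char *buf, size_t len)
{
	EolType		eol = EOL_UNKNOWN;
	size_t		pos = 0;
	size_t		line_len;
	size_t		next;
	int			nlines = 0;

	while (pos < len)
	{
		if (next_raw_line(buf, len, pos, &eol, &line_len, &next) != 0)
			return -1;
		nlines++;
		pos = next;
	}
	return nlines;
}
```

In this sketch a final line without a terminator still counts as a line, and
an empty line yields a zero-length value rather than anything NULL-like,
which matches the "empty lines are empty strings" rule in the description.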
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..06ca632ee3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -257,7 +258,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -271,7 +273,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +297,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -400,7 +403,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +897,88 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cde46bbe2b..74d6ebb78d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -688,6 +690,12 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "DELIMITER")));
+
/* Only single-byte delimiter strings are supported. */
if (strlen(opts_out->delim) != 1)
ereport(ERROR,
@@ -718,11 +726,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default delimiter */
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
- }
+ /* Set default delimiter */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
/* --- NULL option --- */
if (opts_out->null_print)
@@ -732,6 +740,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
/* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
@@ -739,11 +752,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default null_print */
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- }
+ /* Set default null_print */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
@@ -795,6 +809,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
/* Assert options have been set (defaults applied if not specified) */
Assert(opts_out->delim);
Assert(opts_out->null_print);
@@ -941,8 +960,8 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
}
@@ -985,7 +1004,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
+ errmsg("CSV quote character must not appear in the %s specification",
"NULL")));
}
}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 50bb4b7750..2528c6f111 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -143,8 +143,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -732,7 +734,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -748,7 +750,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -768,8 +770,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -823,8 +832,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1096,7 +1112,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1462,6 +1481,138 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+
+ /*
+ * The objective of this loop is to transfer the entire next input line
+ * into line_buf. We only care for detecting newlines (\r and/or \n).
+ * All other characters are treated as regular data.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * For a little extra speed within the loop, we copy input_buf and
+ * input_buf_len into local variables.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+ char c;
+
+ /*
+ * Load more data if needed.
+ */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* update our local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /*
+ * If we are completely out of data, break out of the loop,
+ * reporting EOF.
+ */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* OK to fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+ c = copy_input_buf[input_buf_ptr++];
+
+ /* Process \r */
+ if (c == '\r')
+ {
+ /* Check for \r\n on first line, _and_ handle \r\n. */
+ if (cstate->eol_type == EOL_UNKNOWN ||
+ cstate->eol_type == EOL_CRNL)
+ {
+ /*
+ * If need more data, go back to loop top to load it.
+ *
+ * Note that if we are at EOF, c will wind up as '\0' because
+ * of the guaranteed pad of input_buf.
+ */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+ /* get next char */
+ c = copy_input_buf[input_buf_ptr];
+
+ if (c == '\n')
+ {
+ input_buf_ptr++; /* eat newline */
+ cstate->eol_type = EOL_CRNL; /* in case not set yet */
+ }
+ else
+ {
+ if (cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /*
+ * if we got here, it is the first line and we didn't find
+ * \n, so don't consume the peeked character
+ */
+ cstate->eol_type = EOL_CR;
+ }
+ }
+ else if (cstate->eol_type == EOL_NL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* Process \n */
+ if (c == '\n')
+ {
+ if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ cstate->eol_type = EOL_NL; /* in case not set yet */
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* All other characters are treated as regular data */
+ } /* end of outer loop */
+
+ /*
+ * Transfer any still-uncopied data to line_buf.
+ */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1938,6 +2089,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' requires exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 78531ae846..99fd68a483 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -570,6 +571,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -835,8 +843,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -917,7 +927,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -945,7 +956,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -965,6 +976,37 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Ensure only one column is being copied */
+ if (list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1219,6 +1261,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4aa8646af7..0d0a3ad7ff 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -768,7 +768,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
QUOTE QUOTES
- RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
+ RANGE RAW READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
@@ -3513,6 +3513,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | RAW
+ {
+ $$ = makeDefElem("format", (Node *) makeString("raw"), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17771,6 +17775,7 @@ unreserved_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REASSIGN
| RECURSIVE
@@ -18398,6 +18403,7 @@ bare_label_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REAL
| REASSIGN
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e700fd01b5..04f7548ef4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 899d64ad55..02cd28c750 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -360,6 +360,7 @@ PG_KEYWORD("publication", PUBLICATION, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quotes", QUOTES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("raw", RAW, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("read", READ, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("real", REAL, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("reassign", REASSIGN, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/data/newlines_cr.data b/src/test/regress/data/newlines_cr.data
new file mode 100644
index 0000000000..5397a14fca
--- /dev/null
+++ b/src/test/regress/data/newlines_cr.data
@@ -0,0 +1 @@
+line1
line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_cr_lr.data b/src/test/regress/data/newlines_cr_lr.data
new file mode 100644
index 0000000000..8561d5d6dc
--- /dev/null
+++ b/src/test/regress/data/newlines_cr_lr.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/data/newlines_cr_lr_nolast.data b/src/test/regress/data/newlines_cr_lr_nolast.data
new file mode 100644
index 0000000000..3a1bd7a527
--- /dev/null
+++ b/src/test/regress/data/newlines_cr_lr_nolast.data
@@ -0,0 +1,2 @@
+line1
+line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_cr_nolast.data b/src/test/regress/data/newlines_cr_nolast.data
new file mode 100644
index 0000000000..d9dce6c5ea
--- /dev/null
+++ b/src/test/regress/data/newlines_cr_nolast.data
@@ -0,0 +1 @@
+line1
line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_lr.data b/src/test/regress/data/newlines_lr.data
new file mode 100644
index 0000000000..c0d0fb45c3
--- /dev/null
+++ b/src/test/regress/data/newlines_lr.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/data/newlines_lr_nolast.data b/src/test/regress/data/newlines_lr_nolast.data
new file mode 100644
index 0000000000..f8be7bb828
--- /dev/null
+++ b/src/test/regress/data/newlines_lr_nolast.data
@@ -0,0 +1,2 @@
+line1
+line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_mixed_1.data b/src/test/regress/data/newlines_mixed_1.data
new file mode 100644
index 0000000000..d20e511549
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_1.data
@@ -0,0 +1 @@
+line1
line2
diff --git a/src/test/regress/data/newlines_mixed_2.data b/src/test/regress/data/newlines_mixed_2.data
new file mode 100644
index 0000000000..fe03b64cc3
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_2.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/data/newlines_mixed_3.data b/src/test/regress/data/newlines_mixed_3.data
new file mode 100644
index 0000000000..d2772944d6
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_3.data
@@ -0,0 +1,2 @@
+line1
+line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_mixed_4.data b/src/test/regress/data/newlines_mixed_4.data
new file mode 100644
index 0000000000..7afb2406f0
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_4.data
@@ -0,0 +1,2 @@
+line1
+line2
line3
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_mixed_5.data b/src/test/regress/data/newlines_mixed_5.data
new file mode 100644
index 0000000000..658b3593ea
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_5.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..d7ec9dd736 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,95 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ test | f
+(6 rows)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ test | f
+(6 rows)
+
+\set filename :abs_srcdir '/data/newlines_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 626a437d40..34bf06390b 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -88,8 +88,12 @@ LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
+COPY x to stdout (format RAW, delimiter ',');
+ERROR: cannot specify DELIMITER in RAW mode
COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdout (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
@@ -100,6 +104,10 @@ COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x to stdout (format TEXT, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format CSV, force_quote *);
@@ -108,6 +116,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +128,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +874,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +947,40 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+-- Test inconsistent newline style
+\set filename :abs_srcdir '/data/newlines_mixed_1.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_2.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_3.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_4.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_5.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..c106bd74ec 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,46 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 3458d287f2..56367234bf 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -71,19 +71,27 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format RAW, delimiter ',');
COPY x to stdout (format BINARY, null 'x');
+COPY x to stdout (format RAW, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +644,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +716,35 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+
+-- Test inconsistent newline style
+\set filename :abs_srcdir '/data/newlines_mixed_1.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_2.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_3.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_4.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_5.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
--
2.45.1
Hi,
Idle thoughts from a design perspective -- feel free to ignore, since
I'm not the target audience for the feature:
- If the column data stored in Postgres contains newlines, it seems
like COPY TO won't work "correctly". Is that acceptable?
- RAW seems like an okay-ish label, but for something that's doing as
much magic end-of-line detection as this patch is, I'd personally
prefer SINGLE (as in, "single column").
- Speaking of magic end-of-line detection, can there be a way to turn
that off? Say, via DELIMITER?
- Generic DELIMITER support, for any single-byte separator at all,
might make a "single-column" format more generally applicable. But I
might be over-architecting. And it would make the COPY TO issue even
worse...
Thanks,
--Jacob
On Tue, Oct 15, 2024, at 19:30, Jacob Champion wrote:
Hi,
Idle thoughts from a design perspective -- feel free to ignore, since
I'm not the target audience for the feature:
Many thanks for looking at this!
- If the column data stored in Postgres contains newlines, it seems
like COPY TO won't work "correctly". Is that acceptable?
That's an interesting edge-case to think about.
Rejecting such column data, to ensure all data that can be COPY TO'd,
can be loaded back using COPY FROM, preserving the data intact,
seems attractive because it protects users against unintentional mistakes,
and all the other COPY formats are able to preserve the data unchanged,
when doing a COPY FROM of a file created with COPY TO.
OTOH, if we think of COPY TO in this case as a way of `cat`-ing
text values, it might be more pragmatic to allow it.
By `cat`-ing, I mean that if you do this in Unix...
cat file1.txt file2.txt > file3.txt
...there is no way to reverse that operation,
that is, to reconstruct file1.txt and file2.txt from file3.txt.
However, I think rejecting such column data seems like the
better alternative, to ensure data exported with COPY TO
can always be imported back using COPY FROM,
for the same format. If text column data contains newlines,
users probably ought to be using the text or csv format instead.
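To make the "reject on COPY TO" alternative concrete, here is a minimal sketch of the check it would imply. The helper name raw_value_is_single_line is hypothetical and not taken from the patch; it only illustrates that a value survives a COPY TO / COPY FROM round trip in a line-per-value format if it contains neither a line feed nor a carriage return.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): returns true when a value can
 * be written as exactly one line and read back unchanged, i.e. when it
 * contains neither a line feed nor a carriage return.
 */
static bool
raw_value_is_single_line(const char *value, size_t len)
{
	return memchr(value, '\n', len) == NULL &&
		memchr(value, '\r', len) == NULL;
}
```

A COPY TO implementation taking this route would presumably raise an error for any value where the check returns false, rather than writing output that cannot be loaded back intact.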
- RAW seems like an okay-ish label, but for something that's doing as
much magic end-of-line detection as this patch is, I'd personally
prefer SINGLE (as in, "single column").
It's actually the same end-of-line detection as the text format
in copyfromparse.c's CopyReadLineText(), except the code
is simpler thanks to not having to deal with quotes or escapes.
It basically just learns the newline sequence based on the first
occurrence, and then requires it to be the same throughout the file.
The same data files can be tested with the text format,
since they don't contain any escape, quote or delimiter characters.
The text format also detects the different newline styles automatically:
COPY t FROM '/home/joel/postgresql/src/test/regress/data/newlines_lr.data' (FORMAT text);
COPY 2
COPY t FROM '/home/joel/postgresql/src/test/regress/data/newlines_cr.data' (FORMAT text);
COPY 2
COPY t FROM '/home/joel/postgresql/src/test/regress/data/newlines_cr_lr.data' (FORMAT text);
COPY 2
Mixed newline styles also cause errors with the text format:
COPY t FROM '/home/joel/postgresql/src/test/regress/data/newlines_mixed_1.data' (FORMAT text);
ERROR: literal newline found in data
HINT: Use "\n" to represent newline.
CONTEXT: COPY t, line 2
COPY t FROM '/home/joel/postgresql/src/test/regress/data/newlines_mixed_2.data' (FORMAT text);
ERROR: literal newline found in data
HINT: Use "\n" to represent newline.
CONTEXT: COPY t, line 2
COPY t FROM '/home/joel/postgresql/src/test/regress/data/newlines_mixed_3.data' (FORMAT text);
ERROR: literal carriage return found in data
HINT: Use "\r" to represent carriage return.
CONTEXT: COPY t, line 2
COPY t FROM '/home/joel/postgresql/src/test/regress/data/newlines_mixed_4.data' (FORMAT text);
ERROR: literal carriage return found in data
HINT: Use "\r" to represent carriage return.
CONTEXT: COPY t, line 2
COPY t FROM '/home/joel/postgresql/src/test/regress/data/newlines_mixed_5.data' (FORMAT text);
ERROR: literal carriage return found in data
HINT: Use "\r" to represent carriage return.
CONTEXT: COPY t, line 2
I must confess, I didn't know about this newline detection before reading the
copyfromparse.c source code.
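For readers who have not looked at that code either, the "learn the newline style from the first line ending, then require it everywhere" rule can be sketched roughly as below. This is a simplified standalone illustration, not the code in copyfromparse.c: the names EolStyle and count_lines_checked are made up for this example, it works on a complete in-memory buffer, and it ignores the buffer-refill and encoding details the real function has to handle.

```c
#include <stddef.h>

/* Newline styles, in the spirit of the EOL detection described above. */
typedef enum
{
	EOL_UNKNOWN,
	EOL_NL,						/* \n */
	EOL_CR,						/* \r */
	EOL_CRNL					/* \r\n */
} EolStyle;

/*
 * Hypothetical sketch: count the lines in buf[0..len), learning the newline
 * style from the first line terminator and requiring every later terminator
 * to match it. Returns the number of lines, or -1 if the newline style is
 * inconsistent. A final line without a terminator still counts as a line.
 */
static int
count_lines_checked(const char *buf, size_t len)
{
	EolStyle	eol = EOL_UNKNOWN;
	int			lines = 0;
	size_t		i = 0;
	size_t		line_start = 0;

	while (i < len)
	{
		EolStyle	seen;

		if (buf[i] == '\r')
		{
			if (i + 1 < len && buf[i + 1] == '\n')
			{
				seen = EOL_CRNL;
				i += 2;
			}
			else
			{
				seen = EOL_CR;
				i += 1;
			}
		}
		else if (buf[i] == '\n')
		{
			seen = EOL_NL;
			i += 1;
		}
		else
		{
			/* ordinary data byte: taken literally, no escapes or quotes */
			i += 1;
			continue;
		}

		if (eol == EOL_UNKNOWN)
			eol = seen;			/* first terminator decides the style */
		else if (eol != seen)
			return -1;			/* inconsistent newline style */

		lines++;
		line_start = i;
	}

	if (line_start < len)
		lines++;				/* last line has no trailing newline */

	return lines;
}
```

With this sketch, a buffer using only \n, only \r, or only \r\n yields 2 for the two-line test files (with or without a trailing terminator), while a buffer that switches style after the first line yields -1, matching the "inconsistent newline style" error case.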
- Speaking of magic end-of-line detection, can there be a way to turn
that off? Say, via DELIMITER?
- Generic DELIMITER support, for any single-byte separator at all,
might make a "single-column" format more generally applicable. But I
might be over-architecting. And it would make the COPY TO issue even
worse...
That's an interesting idea that would provide more flexibility,
though, at the cost of complicating things by overloading the meaning
of DELIMITER.
If aiming to make this more generally applicable,
then at least DELIMITER would need to be multi-byte,
since otherwise the Windows case \r\n couldn't be specified.
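As an illustration of why a single byte is not enough, here is a small sketch of searching for a separator of arbitrary length. It is purely hypothetical (find_separator is an invented name, and nothing like it exists in the patch); it only shows that a two-byte separator such as \r\n has to be matched as a unit, so that a lone \r inside the data is not mistaken for a line ending.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: return the offset of the first occurrence of the
 * sep_len-byte separator in buf[0..len), or len if it does not occur.
 */
static size_t
find_separator(const char *buf, size_t len,
			   const char *sep, size_t sep_len)
{
	size_t		i;

	if (sep_len == 0 || sep_len > len)
		return len;

	for (i = 0; i + sep_len <= len; i++)
	{
		if (memcmp(buf + i, sep, sep_len) == 0)
			return i;
	}
	return len;
}
```

For the input "a\rb\r\nc" and the separator "\r\n", this returns 3, skipping the lone \r at offset 1.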
But I feel COPY already has quite a lot of options, and I fear it's
quite complicated for users as it is.
What I found appealing with the idea of a new COPY format,
was that instead of overloading the existing options
with more complexity, a new format wouldn't need to affect
the existing options, and the new format could be explained
separately, without making things worse for users not
using this format.
/Joel
On Tue, Oct 15, 2024 at 8:50 PM Joel Jacobson <joel@compiler.org> wrote:
Hi.
I only checked 0001, 0002, 0003.
the raw format patch is v9-0016.
0003-0016 is a lot of small patches; maybe you can consolidate them to
make the review easier.
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
0001 makes sense to me; I think we generally use "to stdout" and "from stdin".
v9-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch
looks good.
typedef enum CopyLogVerbosityChoice
{
COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is
* the default, assign 0 */
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
/*
* Represents the format of the COPY operation.
*/
typedef enum CopyFormat
{
COPY_FORMAT_TEXT,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
} CopyFormat;
BeginCopyTo
cstate = (CopyToStateData *) palloc0(sizeof(CopyToStateData));
ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
palloc0(sizeof(CopyToStateData)) zero-fills the struct, so the default format
is COPY_FORMAT_TEXT only because that enumerator has the value 0.
I think you may need COPY_FORMAT_TEXT = 0, even though based on [1],
it seems not required.
[1]: https://stackoverflow.com/questions/6434105/are-default-enum-values-in-c-the-same-for-all-compilers
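A small standalone sketch of the point above, using calloc() in place of palloc0() and a made-up DemoCopyOptions struct (both are stand-ins for illustration, not the actual PostgreSQL definitions). A zero-filled struct ends up with format == COPY_FORMAT_TEXT because that enumerator is 0; C already guarantees that the first enumerator without an initializer is 0, so the explicit "= 0" documents the dependency rather than changing behavior.

```c
#include <stdlib.h>

/* Same enumerators as in the patch; "= 0" makes the zero default explicit. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical stand-in for a state struct that embeds the format. */
typedef struct DemoCopyOptions
{
	CopyFormat	format;
	int			header_line;
} DemoCopyOptions;

/* Allocate zero-filled options, the way palloc0() would. */
static DemoCopyOptions *
demo_alloc_options(void)
{
	return (DemoCopyOptions *) calloc(1, sizeof(DemoCopyOptions));
}
```

If the text format were ever moved away from the first position without an explicit value, every zero-initialized state struct would silently default to a different format, which is presumably the risk the explicit assignment guards against.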
On Wed, Oct 16, 2024, at 05:31, jian he wrote:
Hi.
I only checked 0001, 0002, 0003.
the raw format patch is v9-0016.
0003-0016 is a lot of small patches; maybe you can consolidate them to
make the review easier.
Thanks for reviewing.
OK, I've consolidated the v9 0003-0016 into a single patch.
(I submitted them as separate smaller patches since it might be difficult
otherwise to verify the correctness of the refactoring of ProcessCopyOptions().
So if needed, the smaller patches can be viewed in the previous email.
I've only squashed them in the attached patch set, except the setting
of COPY_FORMAT_TEXT = 0, see below.)
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
0001 makes sense to me; I think we generally use "to stdout" and "from stdin".
v9-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch
looks good.
OK, cool.
typedef enum CopyFormat
{
COPY_FORMAT_TEXT,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
} CopyFormat;
...
I think you may need COPY_FORMAT_TEXT = 0, even though based on [1],
it seems not required.
OK, changed, and I agree, since this seems to be the style in copy.h,
even if not setting = 0 seems to be more popular in the codebase in general.
/Joel
Attachments:
v10-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch (application/octet-stream)
From 437a6aad8d0e84d9f706e0460ef31080128f391f Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:23:55 +0200
Subject: [PATCH 1/3] Fix thinko in tests for COPY options force_not_null and
force_null.
Use COPY FROM for the negative tests that check that FORMAT text
cannot be used for these options, since if testing COPY TO,
which is invalid for these two options, we're testing two
invalid options at the same time, which doesn't seem intentional,
since the other tests seem to be testing invalid options one by one.
In passing, consistently use "stdin" for COPY FROM and "stdout" for COPY TO,
even though it has no effect on the tests per se, it seems
better to be consistent, to avoid confusion.
---
src/test/regress/expected/copy2.out | 20 ++++++++++----------
src/test/regress/sql/copy2.sql | 16 ++++++++--------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index ab449fa7b8..3f420db0bc 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -86,9 +86,9 @@ ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
^
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
@@ -96,22 +96,22 @@ COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
-COPY x to stdout (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
-COPY x to stdin (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
-COPY x to stdout (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
-COPY x to stdin (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
-LINE 1: COPY x to stdin (format BINARY, on_error unsupported);
- ^
+LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
+ ^
COPY x to stdout (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x to stdout (log_verbosity unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 1aa0e41b68..5790057e1c 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,17 +70,17 @@ COPY x from stdin (on_error ignore, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
-COPY x to stdin (format BINARY, delimiter ',');
-COPY x to stdin (format BINARY, null 'x');
+COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x to stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
-COPY x to stdout (format TEXT, force_not_null(a));
-COPY x to stdin (format CSV, force_not_null(a));
-COPY x to stdout (format TEXT, force_null(a));
-COPY x to stdin (format CSV, force_null(a));
-COPY x to stdin (format BINARY, on_error unsupported);
+COPY x from stdin (format TEXT, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null(a));
+COPY x from stdin (format TEXT, force_null(a));
+COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
--
2.45.1
v10-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch (application/octet-stream)
From f49e6d7187a2602409365f44485baf31dd66908d Mon Sep 17 00:00:00 2001
From: Joel Jakobsson <github@compiler.org>
Date: Sat, 12 Oct 2024 01:35:28 +0200
Subject: [PATCH 2/3] Fix validation of FORCE_NOT_NULL/FORCE_NULL for
all-columns case.
Add missing checks for FORCE_NOT_NULL and FORCE_NULL when applied to
all columns via "*". These options now correctly require CSV mode and
are disallowed in COPY TO as appropriate. Adjusted regression
tests to verify correct behavior for the all-columns case.
---
src/backend/commands/copy.c | 11 +++++++----
src/test/regress/expected/copy2.out | 12 ++++++++++++
src/test/regress/sql/copy2.sql | 6 ++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0b093dbb2a..e93ea3d627 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -805,12 +805,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && opts_out->force_notnull != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if (opts_out->force_notnull != NIL && !is_from)
+ if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
+ !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
@@ -819,13 +821,14 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && opts_out->force_null != NIL)
+ if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
- if (opts_out->force_null != NIL && !is_from)
+ if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 3f420db0bc..626a437d40 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -98,16 +98,28 @@ LINE 1: COPY x from stdin (on_error unsupported);
^
COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
+COPY x from stdin (format CSV, force_quote *);
+ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format TEXT, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdout (format CSV, force_null *);
+ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 5790057e1c..3458d287f2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -75,11 +75,17 @@ COPY x to stdout (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
+COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
+COPY x from stdin (format TEXT, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
+COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
+COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
+COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
--
2.45.1
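(Not part of either patch.) For readers skimming the 0002 hunks above, the rule they add can be read as: a FORCE_* option counts as "specified" when either a column list or `*` was given, and both spellings then hit the same checks. Below is a minimal standalone sketch of that rule for FORCE_NULL. The `ForceOpt` struct and `check_force_null()` are hypothetical names for illustration only, with a NULL pointer standing in for `NIL`; the message strings are copied from the expected regression output (without the `ERROR:` prefix).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the force_null / force_null_all option fields. */
typedef struct ForceOpt
{
	const char *const *cols;	/* explicit column list; NULL stands in for NIL */
	bool		all;			/* true when the option was written as "*" */
} ForceOpt;

/*
 * Return NULL when the option is acceptable, otherwise the message text
 * that copy2.out expects for FORCE_NULL.
 */
const char *
check_force_null(const ForceOpt *opt, bool csv_mode, bool is_from)
{
	bool		specified;

	assert(opt != NULL);

	/* A column list and "*" are treated alike from here on. */
	specified = (opt->cols != NULL || opt->all);
	if (!specified)
		return NULL;

	if (!csv_mode)
		return "COPY FORCE_NULL requires CSV mode";
	if (!is_from)
		return "COPY FORCE_NULL cannot be used with COPY TO";
	return NULL;
}
```

Under this sketch, `force_null *` in text format and `force_null *` with COPY TO in CSV format produce the same two messages as the `force_null(a)` cases in the tests.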
Attachment: v10-0003-Add-raw-COPY-format-support-for-unstructured-text-da.patch
From bc9c06da1b84b535193f5f56c81c4a33b4254d87 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Sat, 12 Oct 2024 08:02:49 +0200
Subject: [PATCH 3/3] Add "raw" COPY format support for unstructured text data.
This commit introduces a new format option to the COPY command,
enabling the import and export of unstructured text data where
each line is treated as a single field without any delimiters.
A new typedef enum CopyFormat is introduced, and the new
CopyFormatOptions field "format" replaces the "binary"
and "csv_mode" fields.
This commit also refactors ProcessCopyOptions(), giving each
option its own validation section.
The binary format validations move under their respective
option checks.
For clarity, explicitly check `on_error != COPY_ON_ERROR_IGNORE`
instead of `!on_error`.
Also update the comment for the section of code at the end
of ProcessCopyOptions(), which is now dedicated to additional
checks for interdependent options.
---
doc/src/sgml/ref/copy.sgml | 98 +++-
src/backend/commands/copy.c | 485 +++++++++++-------
src/backend/commands/copyfrom.c | 17 +-
src/backend/commands/copyfromparse.c | 238 ++++++++-
src/backend/commands/copyto.c | 88 +++-
src/backend/parser/gram.y | 8 +-
src/include/commands/copy.h | 14 +-
src/include/parser/kwlist.h | 1 +
src/test/regress/data/newlines_cr.data | 1 +
src/test/regress/data/newlines_cr_lr.data | 2 +
.../regress/data/newlines_cr_lr_nolast.data | 2 +
src/test/regress/data/newlines_cr_nolast.data | 1 +
src/test/regress/data/newlines_lr.data | 2 +
src/test/regress/data/newlines_lr_nolast.data | 2 +
src/test/regress/data/newlines_mixed_1.data | 1 +
src/test/regress/data/newlines_mixed_2.data | 2 +
src/test/regress/data/newlines_mixed_3.data | 2 +
src/test/regress/data/newlines_mixed_4.data | 2 +
src/test/regress/data/newlines_mixed_5.data | 2 +
src/test/regress/expected/copy.out | 92 ++++
src/test/regress/expected/copy2.out | 57 +-
src/test/regress/sql/copy.sql | 43 ++
src/test/regress/sql/copy2.sql | 43 +-
src/tools/pgindent/typedefs.list | 1 +
24 files changed, 958 insertions(+), 246 deletions(-)
create mode 100644 src/test/regress/data/newlines_cr.data
create mode 100644 src/test/regress/data/newlines_cr_lr.data
create mode 100644 src/test/regress/data/newlines_cr_lr_nolast.data
create mode 100644 src/test/regress/data/newlines_cr_nolast.data
create mode 100644 src/test/regress/data/newlines_lr.data
create mode 100644 src/test/regress/data/newlines_lr_nolast.data
create mode 100644 src/test/regress/data/newlines_mixed_1.data
create mode 100644 src/test/regress/data/newlines_mixed_2.data
create mode 100644 src/test/regress/data/newlines_mixed_3.data
create mode 100644 src/test/regress/data/newlines_mixed_4.data
create mode 100644 src/test/regress/data/newlines_mixed_5.data
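(Note for reviewers, not part of the patch.) The commit message describes replacing the "binary" and "csv_mode" booleans with a single format enum and validating each option in its own section. A reduced, standalone model of that shape may make the diff below easier to follow. The `COPY_FORMAT_*` names come from the patch; `MiniCopyOptions` and `mini_process_options()` are hypothetical, only DELIMITER and NULL are modeled, and errors are returned as strings rather than raised with ereport(). The declaration order of the enum members after TEXT is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* TEXT is 0, so a zero-initialized options struct defaults to text format. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_CSV,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_RAW
} CopyFormat;

/* Hypothetical, reduced stand-in for CopyFormatOptions. */
typedef struct MiniCopyOptions
{
	CopyFormat	format;
	const char *delim;			/* NULL when DELIMITER was not given */
	const char *null_print;		/* NULL when NULL was not given */
} MiniCopyOptions;

/* Returns NULL on success, else a message modeled on the patch's errors. */
const char *
mini_process_options(MiniCopyOptions *opts)
{
	/* --- DELIMITER option: validate if given, else apply format default --- */
	if (opts->delim)
	{
		if (opts->format == COPY_FORMAT_BINARY)
			return "cannot specify DELIMITER in BINARY mode";
		if (opts->format == COPY_FORMAT_RAW)
			return "cannot specify DELIMITER in RAW mode";
		if (strlen(opts->delim) != 1)
			return "COPY delimiter must be a single one-byte character";
	}
	else if (opts->format == COPY_FORMAT_CSV)
		opts->delim = ",";
	else if (opts->format == COPY_FORMAT_TEXT)
		opts->delim = "\t";

	/* --- NULL option: same pattern; binary and raw get no default --- */
	if (opts->null_print)
	{
		if (opts->format == COPY_FORMAT_BINARY)
			return "cannot specify NULL in BINARY mode";
		if (opts->format == COPY_FORMAT_RAW)
			return "cannot specify NULL in RAW mode";
	}
	else if (opts->format == COPY_FORMAT_CSV)
		opts->null_print = "";
	else if (opts->format == COPY_FORMAT_TEXT)
		opts->null_print = "\\N";

	/* --- Interdependent checks, only where both values exist --- */
	if (opts->format == COPY_FORMAT_TEXT || opts->format == COPY_FORMAT_CSV)
	{
		assert(opts->delim);
		assert(opts->null_print);

		if (strchr(opts->null_print, opts->delim[0]) != NULL)
			return "COPY delimiter character must not appear in the NULL specification";
	}
	return NULL;
}
```

The point of the layout is that raw (like binary) leaves `delim` and `null_print` unset, so any later code that dereferences them has to be guarded by a format check, which is what the Assert() calls in the real patch document.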
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..06ca632ee3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -257,7 +258,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -271,7 +273,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +297,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -400,7 +403,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +897,88 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e93ea3d627..74d6ebb78d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,13 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -672,62 +674,150 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Check for incompatible options (must do these three before inserting
- * defaults)
+ * Set default format if not specified.
+ * This isn't strictly necessary since COPY_FORMAT_TEXT is 0 and
+ * opts_out is palloc0'd, but do it for clarity.
*/
- if (opts_out->binary && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->binary && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
-
- if (opts_out->binary && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
- /* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
-
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
-
- if (opts_out->csv_mode)
+ if (!format_specified)
+ opts_out->format = COPY_FORMAT_TEXT;
+
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "DELIMITER")));
+
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ /* Set default delimiter */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
+
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ /* Set default null_print */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
+
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
+ /* Set default quote */
+ opts_out->quote = "\"";
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
-
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
-
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
opts_out->default_print_len = strlen(opts_out->default_print);
if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -735,135 +825,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY default representation cannot use newline or carriage return")));
- }
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (!opts_out->csv_mode &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
- /* Check header */
- if (opts_out->binary && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
- /* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
-
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
-
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
-
- /* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
-
- /* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
-
- /* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
-
- /* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
- if ((opts_out->force_null != NIL || opts_out->force_null_all) && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
-
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
-
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
-
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
-
- if (opts_out->default_print)
- {
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -881,7 +843,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -897,19 +859,154 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
- /* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
+
+ /* --- HEADER option --- */
+ if (header_specified)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is no header; no action needed */
+ }
+
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+
+ /* --- ON_ERROR option --- */
+ if (opts_out->on_error != COPY_ON_ERROR_STOP)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ }
+
+ /* --- REJECT_LIMIT option --- */
+ if (opts_out->reject_limit)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
@@ -1583,7 +1590,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1641,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1782,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..2528c6f111 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -143,8 +143,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -163,7 +165,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -732,7 +734,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -748,8 +750,8 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ /* only available for text, csv, or raw input */
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,10 +768,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -821,10 +830,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -865,7 +881,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +922,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1096,7 +1112,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1179,7 +1198,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1275,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1314,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1342,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1359,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1370,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1390,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
@@ -1462,6 +1481,138 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+
+ /*
+ * The objective of this loop is to transfer the entire next input line
+ * into line_buf. We only care about detecting newlines (\r and/or \n).
+ * All other characters are treated as regular data.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * For a little extra speed within the loop, we copy input_buf and
+ * input_buf_len into local variables.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+ char c;
+
+ /*
+ * Load more data if needed.
+ */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* update our local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /*
+ * If we are completely out of data, break out of the loop,
+ * reporting EOF.
+ */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* OK to fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+ c = copy_input_buf[input_buf_ptr++];
+
+ /* Process \r */
+ if (c == '\r')
+ {
+ /* Check for \r\n on first line, _and_ handle \r\n. */
+ if (cstate->eol_type == EOL_UNKNOWN ||
+ cstate->eol_type == EOL_CRNL)
+ {
+ /*
+ * If need more data, go back to loop top to load it.
+ *
+ * Note that if we are at EOF, c will wind up as '\0' because
+ * of the guaranteed pad of input_buf.
+ */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+ /* get next char */
+ c = copy_input_buf[input_buf_ptr];
+
+ if (c == '\n')
+ {
+ input_buf_ptr++; /* eat newline */
+ cstate->eol_type = EOL_CRNL; /* in case not set yet */
+ }
+ else
+ {
+ if (cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /*
+ * if we got here, it is the first line and we didn't find
+ * \n, so don't consume the peeked character
+ */
+ cstate->eol_type = EOL_CR;
+ }
+ }
+ else if (cstate->eol_type == EOL_NL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* Process \n */
+ if (c == '\n')
+ {
+ if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("inconsistent newline style")));
+ cstate->eol_type = EOL_NL; /* in case not set yet */
+ /* If reach here, we have found the line terminator */
+ break;
+ }
+
+ /* All other characters are treated as regular data */
+ } /* end of outer loop */
+
+ /*
+ * Transfer any still-uncopied data to line_buf.
+ */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1938,6 +2089,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' requires exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
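For readers who want to see the newline handling of CopyReadLineRawText() without the buffer refill machinery, here is a rough standalone sketch. It is not part of the patch: next_raw_line() and EolStyle are hypothetical names, the whole input is assumed to already be in memory, and errors are reported through a return code instead of ereport(). The first terminator seen fixes the newline style, and any later terminator of a different style is rejected, which is what the "inconsistent newline style" tests exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the eol_type state kept in CopyFromState. */
typedef enum
{
    EOL_UNKNOWN,
    EOL_NL,
    EOL_CR,
    EOL_CRNL
} EolStyle;

/*
 * Locate the next line in buf[*pos .. len).
 *
 * Returns 1 and sets *line_len (terminator excluded) when a line was found,
 * 0 when the input is exhausted, and -1 on an inconsistent newline style.
 * *pos is advanced past the line terminator.
 */
static int
next_raw_line(const char *buf, size_t len, size_t *pos,
              EolStyle *style, size_t *line_len)
{
    size_t      start = *pos;
    size_t      i;

    assert(buf != NULL && start <= len);

    if (start >= len)
        return 0;               /* completely out of data */

    for (i = start; i < len; i++)
    {
        char        c = buf[i];

        if (c == '\r')
        {
            /* peek at the following byte, if there is one */
            int         has_nl = (i + 1 < len && buf[i + 1] == '\n');

            if (*style == EOL_UNKNOWN)
                *style = has_nl ? EOL_CRNL : EOL_CR;
            else if (*style == EOL_NL ||
                     (*style == EOL_CRNL && !has_nl))
                return -1;      /* inconsistent newline style */

            *line_len = i - start;
            /* only the \r\n style eats the peeked \n */
            *pos = i + ((*style == EOL_CRNL) ? 2 : 1);
            return 1;
        }

        if (c == '\n')
        {
            if (*style == EOL_CR || *style == EOL_CRNL)
                return -1;      /* inconsistent newline style */
            *style = EOL_NL;
            *line_len = i - start;
            *pos = i + 1;
            return 1;
        }

        /* all other characters are regular data */
    }

    /* last line without a trailing terminator */
    *line_len = len - start;
    *pos = len;
    return 1;
}
```

A caller would start with *pos = 0 and *style = EOL_UNKNOWN and call the function until it returns 0 or -1. Unlike the patch, the sketch never copies data; it only reports offsets.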
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..99fd68a483 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -134,7 +135,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +192,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +237,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -570,6 +571,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -771,7 +779,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +800,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,10 +841,12 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -880,7 +890,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +918,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +927,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -937,7 +948,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
@@ -945,7 +956,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -965,6 +976,37 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Ensure only one column is being copied */
+ if (list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1219,6 +1261,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
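To illustrate why NULL and the empty string cannot be told apart after a COPY TO / COPY FROM round trip in raw format, here is a small standalone sketch of the per-row output rule used in CopyOneRowTo(). It is not part of the patch: emit_raw_row() is a hypothetical helper, the value is assumed to already be in its text form, encoding conversion is left out, and the row is always terminated with '\n' (as on the FE/BE protocol path), whereas the file path in CopySendEndOfRow() uses the platform's line ending.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write one raw-format row into 'out'.
 *
 * 'value' is the column's text representation, or NULL for an SQL NULL.
 * Nothing is quoted or escaped.  Returns the number of bytes written
 * (not counting the trailing '\0'), or -1 if 'out' is too small.
 */
static int
emit_raw_row(const char *value, char *out, size_t outsz)
{
    /* a NULL value contributes no bytes at all, just like '' */
    size_t      n = (value != NULL) ? strlen(value) : 0;

    assert(out != NULL);

    if (n + 2 > outsz)
        return -1;

    if (n > 0)
        memcpy(out, value, n);
    out[n] = '\n';              /* end-of-row marker */
    out[n + 1] = '\0';

    return (int) (n + 1);
}
```

Both emit_raw_row(NULL, ...) and emit_raw_row("", ...) produce a single newline, which matches the regression test above where the NULL row comes back as an empty string.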
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4aa8646af7..0d0a3ad7ff 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -768,7 +768,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
QUOTE QUOTES
- RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
+ RANGE RAW READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
@@ -3513,6 +3513,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | RAW
+ {
+ $$ = makeDefElem("format", (Node *) makeString("raw"), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17771,6 +17775,7 @@ unreserved_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REASSIGN
| RECURSIVE
@@ -18398,6 +18403,7 @@ bare_label_keyword:
| QUOTE
| QUOTES
| RANGE
+ | RAW
| READ
| REAL
| REASSIGN
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..37c44fa1bc 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,17 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +72,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
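As a quick illustration of what replacing the two booleans (binary, csv_mode) with a single enum buys, here is a standalone sketch. The enum is copied from the hunk above; parse_copy_format() and copy_wire_format_code() are hypothetical helpers written for this example only, and are not how the patch itself parses the FORMAT option. The second helper mirrors the expression used in ReceiveCopyBegin() and SendCopyBegin(), where only the binary format is reported as 1 on the wire.

```c
#include <assert.h>
#include <string.h>

/* Same members as the enum added to copy.h by the patch. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
    COPY_FORMAT_RAW
} CopyFormat;

/*
 * Hypothetical helper: map a format name to the enum.
 * Returns 1 and sets *out on success, 0 for an unknown name.
 */
static int
parse_copy_format(const char *name, CopyFormat *out)
{
    assert(name != NULL && out != NULL);

    if (strcmp(name, "text") == 0)
        *out = COPY_FORMAT_TEXT;
    else if (strcmp(name, "binary") == 0)
        *out = COPY_FORMAT_BINARY;
    else if (strcmp(name, "csv") == 0)
        *out = COPY_FORMAT_CSV;
    else if (strcmp(name, "raw") == 0)
        *out = COPY_FORMAT_RAW;
    else
        return 0;

    return 1;
}

/*
 * Overall format code sent in CopyInResponse/CopyOutResponse:
 * 1 for binary, 0 for every textual format (text, csv, raw).
 */
static int
copy_wire_format_code(CopyFormat format)
{
    return (format == COPY_FORMAT_BINARY) ? 1 : 0;
}
```

With a single enum, an invalid combination such as "binary and CSV at the same time" can no longer be represented, and every former !binary test becomes an explicit comparison against COPY_FORMAT_BINARY.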
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 899d64ad55..02cd28c750 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -360,6 +360,7 @@ PG_KEYWORD("publication", PUBLICATION, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("quotes", QUOTES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("raw", RAW, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("read", READ, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("real", REAL, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("reassign", REASSIGN, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/data/newlines_cr.data b/src/test/regress/data/newlines_cr.data
new file mode 100644
index 0000000000..5397a14fca
--- /dev/null
+++ b/src/test/regress/data/newlines_cr.data
@@ -0,0 +1 @@
+line1
line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_cr_lr.data b/src/test/regress/data/newlines_cr_lr.data
new file mode 100644
index 0000000000..8561d5d6dc
--- /dev/null
+++ b/src/test/regress/data/newlines_cr_lr.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/data/newlines_cr_lr_nolast.data b/src/test/regress/data/newlines_cr_lr_nolast.data
new file mode 100644
index 0000000000..3a1bd7a527
--- /dev/null
+++ b/src/test/regress/data/newlines_cr_lr_nolast.data
@@ -0,0 +1,2 @@
+line1
+line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_cr_nolast.data b/src/test/regress/data/newlines_cr_nolast.data
new file mode 100644
index 0000000000..d9dce6c5ea
--- /dev/null
+++ b/src/test/regress/data/newlines_cr_nolast.data
@@ -0,0 +1 @@
+line1
line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_lr.data b/src/test/regress/data/newlines_lr.data
new file mode 100644
index 0000000000..c0d0fb45c3
--- /dev/null
+++ b/src/test/regress/data/newlines_lr.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/data/newlines_lr_nolast.data b/src/test/regress/data/newlines_lr_nolast.data
new file mode 100644
index 0000000000..f8be7bb828
--- /dev/null
+++ b/src/test/regress/data/newlines_lr_nolast.data
@@ -0,0 +1,2 @@
+line1
+line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_mixed_1.data b/src/test/regress/data/newlines_mixed_1.data
new file mode 100644
index 0000000000..d20e511549
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_1.data
@@ -0,0 +1 @@
+line1
line2
diff --git a/src/test/regress/data/newlines_mixed_2.data b/src/test/regress/data/newlines_mixed_2.data
new file mode 100644
index 0000000000..fe03b64cc3
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_2.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/data/newlines_mixed_3.data b/src/test/regress/data/newlines_mixed_3.data
new file mode 100644
index 0000000000..d2772944d6
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_3.data
@@ -0,0 +1,2 @@
+line1
+line2
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_mixed_4.data b/src/test/regress/data/newlines_mixed_4.data
new file mode 100644
index 0000000000..7afb2406f0
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_4.data
@@ -0,0 +1,2 @@
+line1
+line2
line3
\ No newline at end of file
diff --git a/src/test/regress/data/newlines_mixed_5.data b/src/test/regress/data/newlines_mixed_5.data
new file mode 100644
index 0000000000..658b3593ea
--- /dev/null
+++ b/src/test/regress/data/newlines_mixed_5.data
@@ -0,0 +1,2 @@
+line1
+line2
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..d7ec9dd736 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,95 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ test | f
+(6 rows)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+------+----------
+ ",\ | f
+ \. | f
+ | f
+ | f
+ | f
+ test | f
+(6 rows)
+
+\set filename :abs_srcdir '/data/newlines_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
+\set filename :abs_srcdir '/data/newlines_cr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+ col | ?column?
+-------+----------
+ line1 | f
+ line2 | f
+(2 rows)
+
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 626a437d40..34bf06390b 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -88,8 +88,12 @@ LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
+COPY x to stdout (format RAW, delimiter ',');
+ERROR: cannot specify DELIMITER in RAW mode
COPY x to stdout (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdout (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
@@ -100,6 +104,10 @@ COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x to stdout (format TEXT, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
COPY x from stdin (format CSV, force_quote *);
@@ -108,6 +116,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +128,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +874,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +947,40 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+-- Test inconsistent newline style
+\set filename :abs_srcdir '/data/newlines_mixed_1.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_2.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_3.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_4.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
+\set filename :abs_srcdir '/data/newlines_mixed_5.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+ERROR: inconsistent newline style
+CONTEXT: COPY copy_raw_test_errors, line 2
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..c106bd74ec 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,46 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_builddir '/results/copy_raw_test.data'
+CREATE TABLE copy_raw_test (id SERIAL PRIMARY KEY, col text);
+INSERT INTO copy_raw_test (col) VALUES
+(E'",\\'), (E'\\.'), (NULL), (''), (' '), ('test');
+COPY copy_raw_test (col) TO :'filename' (FORMAT raw);
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' RAW;
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr_lr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr_lr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
+
+\set filename :abs_srcdir '/data/newlines_cr_nolast.data'
+TRUNCATE copy_raw_test;
+COPY copy_raw_test (col) FROM :'filename' (FORMAT raw);
+SELECT col, col IS NULL FROM copy_raw_test ORDER BY id;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 3458d287f2..56367234bf 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -71,19 +71,27 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x to stdout (format BINARY, delimiter ',');
+COPY x to stdout (format RAW, delimiter ',');
COPY x to stdout (format BINARY, null 'x');
+COPY x to stdout (format RAW, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
COPY x to stdout (format TEXT, force_quote(a));
COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +644,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +716,35 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+
+-- Test inconsistent newline style
+\set filename :abs_srcdir '/data/newlines_mixed_1.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_2.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_3.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_4.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
+
+\set filename :abs_srcdir '/data/newlines_mixed_5.data'
+COPY copy_raw_test_errors (col1) FROM :'filename' (FORMAT raw);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57de1acff3..59433d120e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
On Tue, Oct 15, 2024 at 1:38 PM Joel Jacobson <joel@compiler.org> wrote:
However, I think rejecting such column data seems like the
better alternative, to ensure data exported with COPY TO
can always be imported back using COPY FROM,
for the same format. If text column data contains newlines,
users probably ought to be using the text or csv format instead.
Yeah. I think _someone's_ going to have strong opinions one way or the
other, but that person is not me. And I assume a contents check during
COPY TO is going to have a noticeable performance impact...
- RAW seems like an okay-ish label, but for something that's doing as
much magic end-of-line detection as this patch is, I'd personally
prefer SINGLE (as in, "single column").
It's actually the same end-of-line detection as the text format
in copyfromparse.c's CopyReadLineText(), except the code
is simpler thanks to not having to deal with quotes or escapes.
Right, sorry, I hadn't meant to imply that you made it up. :D Just
that a "raw" format that is actually automagically detecting things
doesn't seem very "raw" to me, so I prefer the other name.
It basically just learns the newline sequence based on the first
occurrence, and then requires it to be the same throughout the file.
A hypothetical type whose text representation can contain '\r' but not
'\n' still can't be unambiguously round-tripped under this scheme:
COPY FROM will see the "mixed" line endings and complain, even though
there's no ambiguity.
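To make the limitation concrete, here is a minimal C sketch of the "learn the newline style from the first occurrence" rule. It is not the patch's code and not CopyReadLineText(); the names EolStyle and detect_newline_style are hypothetical, made up for illustration only.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only; not taken from the patch. */
typedef enum { EOL_UNKNOWN, EOL_NL, EOL_CR, EOL_CRNL } EolStyle;

/*
 * Scan buf[0..len) and return the newline style learned from the first
 * line ending found. *consistent is set to 0 if any later line ending
 * differs from the learned one, and to 1 otherwise.
 */
EolStyle
detect_newline_style(const char *buf, size_t len, int *consistent)
{
    EolStyle learned = EOL_UNKNOWN;
    size_t i = 0;

    *consistent = 1;
    while (i < len)
    {
        EolStyle seen;

        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
        {
            seen = EOL_CRNL;    /* "\r\n" counts as one line ending */
            i += 2;
        }
        else if (buf[i] == '\r')
        {
            seen = EOL_CR;
            i += 1;
        }
        else if (buf[i] == '\n')
        {
            seen = EOL_NL;
            i += 1;
        }
        else
        {
            i += 1;             /* ordinary data byte */
            continue;
        }

        if (learned == EOL_UNKNOWN)
            learned = seen;     /* first line ending defines the style */
        else if (seen != learned)
            *consistent = 0;    /* "inconsistent newline style" case */
    }
    return learned;
}
```

With the input "a\rb\nc\n", the first line ending seen is the bare '\r' inside the value, so the later '\n' is reported as inconsistent, even though a reader who knew the delimiter was '\n' would see no ambiguity.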
Maybe no one will run into that problem in practice? But if they did,
I think that'd be a pretty frustrating limitation. It'd be nice to
override the behavior, to change it from "do what you think I mean" to
"do what I say".
- Speaking of magic end-of-line detection, can there be a way to turn
that off? Say, via DELIMITER?
- Generic DELIMITER support, for any single-byte separator at all,
might make a "single-column" format more generally applicable. But I
might be over-architecting. And it would make the COPY TO issue even
worse...
That's an interesting idea that would provide more flexibility,
though, at the cost of complicating things by overloading the meaning
of DELIMITER.
I think that'd be a docs issue rather than a conceptual one, though...
it's still a delimiter. I wouldn't really expect end-user confusion.
If aiming to make this more generally applicable,
then at least DELIMITER would need to be multi-byte,
since otherwise the Windows case \r\n couldn't be specified.
True.
What I found appealing with the idea of a new COPY format,
was that instead of overloading the existing options
with more complexity, a new format wouldn't need to affect
the existing options, and the new format could be explained
separately, without making things worse for users not
using this format.
I agree that we should not touch the existing formats. If
RAW/SINGLE/whatever needed a multibyte line delimiter, I'm not
proposing that the other formats should change.
--Jacob
Joel Jacobson wrote:
However, I think rejecting such column data seems like the
better alternative, to ensure data exported with COPY TO
can always be imported back using COPY FROM,
for the same format.
On the other hand, that might prevent cases where we
want to export, for instance, a valid json array:
copy (select json_agg(col) from table ) to 'file' RAW
This is a variant of the discussion in [1] where the OP does:
copy (select json_agg(row_to_json(t)) from <query> t) TO 'file'
and he complains that both text and csv "break the JSON".
That discussion morphed into a proposed patch adding JSON
format to COPY, but RAW would work directly as the OP
expected.
That is, unless <query> happens to include JSON fields with LF/CRLF
in them, and the RAW format says this is an error condition.
In that case it's quite annoying to make it an error, rather than
simply let it pass.
[1]: /messages/by-id/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=kcg@mail.gmail.com
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
On Wed, Oct 16, 2024, at 18:04, Jacob Champion wrote:
A hypothetical type whose text representation can contain '\r' but not
'\n' still can't be unambiguously round-tripped under this scheme:
COPY FROM will see the "mixed" line endings and complain, even though
there's no ambiguity.
Yeah, that's quite an ugly limitation.
Maybe no one will run into that problem in practice? But if they did,
I think that'd be a pretty frustrating limitation. It'd be nice to
override the behavior, to change it from "do what you think I mean" to
"do what I say".
That would be nice.
That's an interesting idea that would provide more flexibility,
though, at the cost of complicating things by overloading the meaning
of DELIMITER.
I think that'd be a docs issue rather than a conceptual one, though...
it's still a delimiter. I wouldn't really expect end-user confusion.
Yeah, I meant the docs, but that's probably fine,
we could just add <note> to DELIMITER.
What I found appealing with the idea of a new COPY format,
was that instead of overloading the existing options
with more complexity, a new format wouldn't need to affect
the existing options, and the new format could be explained
separately, without making things worse for users not
using this format.
I agree that we should not touch the existing formats. If
RAW/SINGLE/whatever needed a multibyte line delimiter, I'm not
proposing that the other formats should change.
Right, I didn't think you did either, I meant overloading the existing
options, from a docs perspective.
But I agree it's probably fine if we just overload DELIMITER in the docs,
that should be possible to explain in a pedagogic way,
without causing confusion.
/Joel
On Wed, Oct 16, 2024, at 18:34, Daniel Verite wrote:
Joel Jacobson wrote:
However, I think rejecting such column data seems like the
better alternative, to ensure data exported with COPY TO
can always be imported back using COPY FROM,
for the same format.
On the other hand, that might prevent cases where we
want to export, for instance, a valid json array:
copy (select json_agg(col) from table ) to 'file' RAW
This is a variant of the discussion in [1] where the OP does:
copy (select json_agg(row_to_json(t)) from <query> t) TO 'file'
and he complains that both text and csv "break the JSON".
That discussion morphed into a proposed patch adding JSON
format to COPY, but RAW would work directly as the OP
expected.
That is, unless <query> happens to include JSON fields with LF/CRLF
in them, and the RAW format says this is an error condition.
In that case it's quite annoying to make it an error, rather than
simply let it pass.
[1]
/messages/by-id/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=kcg@mail.gmail.com
Thanks for finding this related thread.
Very good example, that would be solved with RAW.
I've used a different hack myself many times to export text "as is",
which is to use COPY TO ... BINARY, and then to manually edit the
file using vim, stripping away the header and footer from the file. :D
But I don't see how JSON fields that come from PostgreSQL
could contain LF/CRLF though, could they?
Since LF/CR must be escaped inside a JSON field,
and when casting JSONB to text, there are no newlines
injected anywhere in between fields.
I can only see how a value of the legacy JSON type could
have newlines in it.
That doesn't matter though, this is still a very good example
of the need to export text "as is" from PostgreSQL to a file.
I like Jacob's idea of letting the user specify a DELIMITER
also for the RAW format, to override the automagical
newline detection. This would, for example, allow importing values
containing \r when the newline delimiter is \n,
which would otherwise be rejected with an
"inconsistent newline style" error.
I think it would also be useful to allow specifying
DELIMITER NONE, which would allow importing an
entire text file, "as is", into a single column and single row.
To export a single column single row, the delimiter
wouldn't matter, since there wouldn't be any.
But DELIMITER NONE would be useful also for COPY TO,
if wanting to concatenate text "as is" without adding
newlines in between, to a file, similar to the
UNIX "cat" command.
Regarding the name for the format, I thought SINGLE
was nice before I read this message, but now that
more rawness seems desirable,
I think I like RAW the most again, since if the
automagical newline detection can be overridden,
then it's really raw for real.
A final thought is to maybe consider just skipping
the automagical newline detection for RAW?
Instead of the automagical detection,
the default newline delimiter could be the OS default,
similar to how COPY TO works.
That way, it would almost always just work for most users,
as long as processing files within their OS,
and when not, they would just need to specify the DELIMITER.
/Joel
On Wed, Oct 16, 2024, at 20:30, Joel Jacobson wrote:
A final thought is to maybe consider just skipping
the automagical newline detection for RAW?
Instead of the automagical detection,
the default newline delimiter could be the OS default,
similar to how COPY TO works.
That way, it would almost always just work for most users,
as long as processing files within their OS,
and when not, they would just need to specify the DELIMITER.
I would guess that nowadays, dealing with unstructured text files
is probably less common than dealing with structured text files,
such as JSON, YAML, TOML, XML, etc.
Therefore, maybe DELIMITER NONE would be a better default
for RAW? Especially since it's then also more honest in being "raw".
If needing to import an unstructured text file that is just newline
delimited, and not wanting the entire file as a single value,
the newline style would then just need to be specified
using the DELIMITER option.
/Joel
On Wed, Oct 16, 2024, at 21:13, Joel Jacobson wrote:
Therefore, maybe DELIMITER NONE would be a better default
for RAW? Especially since it's then also more honest in being "raw".
If needing to import an unstructured text file that is just newline
delimited, and not wanting the entire file as a single value,
the newline style would then just need to be specified
using the DELIMITER option.
I realize the DELIMITER NONE syntax is unnecessary,
since if that's the default for RAW, we would just not specify any delimiter.
/Joel
On Wed, Oct 16, 2024 at 2:37 PM Joel Jacobson <joel@compiler.org> wrote:
On Wed, Oct 16, 2024, at 05:31, jian he wrote:
Hi.
I only checked 0001, 0002, 0003.
the raw format patch is v9-0016.
0003-0016 is a lot of small patches; maybe you can consolidate them to
make the review easier.
Thanks for reviewing.
OK, I've consolidated the v9 0003-0016 into a single patch.
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
+ <refsect2>
+ <title>Raw Format</title>
+
+ <para>
+ This format option is used for importing and exporting files containing
+ unstructured text, where each line is treated as a single field. It is
+ ideal for data that does not conform to a structured, tabular format and
+ lacks delimiters.
+ </para>
+
+ <para>
+ In the <literal>raw</literal> format, each line of the input or output is
+ considered a complete value without any field separation. There are no
+ field delimiters, and all characters are taken literally. There is no
+ special handling for quotes, backslashes, or escape sequences. All
+ characters, including whitespace and special characters, are preserved
+ exactly as they appear in the file. However, it's important to note that
+ the text is still interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding for input, and encoded using the
+ specified <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <para>
+ When using this format, the <command>COPY</command> command must specify
+ exactly one column. Specifying multiple columns will result in an error.
+ If the table has multiple columns and no column list is provided, an error
+ will occur.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not distinguish a <literal>NULL</literal>
+ value from an empty string. Empty lines are imported as empty strings, not
+ as <literal>NULL</literal> values.
+ </para>
+
+ <para>
+ Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+ </para>
+
+ </refsect2>
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
<refsect2> <title>Raw Format</title> is duplicated
<title>Raw Format</title> didn't mention the special handling of
end-of-data marker.
+COPY copy_raw_test (col) FROM :'filename' RAW;
we may need to support this.
since we do not allow
COPY x from stdin text;
COPY x to stdout text;
so I think adding the RAW keyword in gram.y may not be necessary.
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny,
"WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
src/bin/psql/tab-complete.in.c, we can also add "raw".
/* --- ESCAPE option --- */
if (opts_out->escape)
{
if (opts_out->format != COPY_FORMAT_CSV)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
}
The ESCAPE option has no regression test.
/* --- QUOTE option --- */
if (opts_out->quote)
{
if (opts_out->format != COPY_FORMAT_CSV)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
}
The QUOTE option has no regression test.
CopyOneRowTo
else if (cstate->opts.format == COPY_FORMAT_RAW)
{
int attnum;
Datum value;
bool isnull;
/* Ensure only one column is being copied */
if (list_length(cstate->attnumlist) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY with format 'raw' must specify exactly one column")));
attnum = linitial_int(cstate->attnumlist);
value = slot->tts_values[attnum - 1];
isnull = slot->tts_isnull[attnum - 1];
if (!isnull)
{
char *string = OutputFunctionCall(&out_functions[attnum - 1],
value);
CopyAttributeOutRaw(cstate, string);
}
/* For RAW format, we don't send anything for NULL values */
}
We already check the column count in BeginCopyTo, so is the
"if (list_length(cstate->attnumlist) != 1)" error check in
CopyOneRowTo still needed?
On Fri, Oct 18, 2024, at 15:52, jian he wrote:
<refsect2> <title>Raw Format</title> is duplicated
<title>Raw Format</title> didn't mention the special handling of
end-of-data marker.
Thanks for reviewing, above fixed.
Here is a summary of the changes since v10, thanks to the feedback:
Handling JSON and other structured text files that can contain newlines
seamlessly seems important, so the default for the raw format is now
no delimiter: the entire input is read as one data value for COPY FROM,
and all column data is concatenated without a delimiter for COPY TO.
When specifying a delimiter for the raw format, it separates *rows*, and can be
a multi-byte string, such as E'\r\n' to handle Windows text files.
This has been documented under the DELIMITER option, as well as under the
Raw Format section.
This also means that HEADER cannot be supported for RAW, since when there is
no delimiter, there would be no way to tell when the header line ends.
For flexibility when exporting data, there is no restriction on what characters
the column data can contain, which has been documented in this way:
When using COPY TO with raw format and a specified DELIMITER, there is no check
to prevent data values from containing the delimiter string. This could be
problematic if the data later needs to be imported unchanged using COPY FROM,
since a data value containing the delimiter would then be split into multiple values.
If this is a concern, a different format should be used instead.
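As a rough illustration of the row-splitting semantics just described, here is a small C sketch. It is not code from the patch: count_rows is a hypothetical helper, and treating trailing data without a final delimiter as one more row is an assumption made for the sketch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the rows that buf would be split into when delim separates rows.
 * A NULL or empty delim means "no delimiter": the whole input is one value.
 * Hypothetical helper for illustration only; not the patch's code.
 */
size_t
count_rows(const char *buf, const char *delim)
{
    size_t dlen = delim ? strlen(delim) : 0;
    size_t rows = 0;
    const char *p = buf;
    const char *hit;

    if (dlen == 0)
        return 1;               /* entire input is a single data value */

    /* The delimiter may be multi-byte, e.g. "\r\n" for Windows files. */
    while ((hit = strstr(p, delim)) != NULL)
    {
        rows++;
        p = hit + dlen;
    }

    /* Assumption: trailing data without a final delimiter is still a row. */
    if (*p != '\0')
        rows++;
    return rows;
}
```

For example, a single value "x\ny" written by COPY TO with DELIMITER E'\n' produces the bytes "x\ny\n", which this sketch counts as two rows on the way back in. That is the round-trip hazard the documentation note warns about.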
The refactoring is now in a separate first single commit, which seems
necessary, to separate the new functionality, from the refactoring.
Here are two examples of usage:
1. Example of importing/exporting entire JSON file
% cat test.json
[
{
"id" : 1,
"t_test" : "here's a \"string\""
}
]
% psql
CREATE TABLE t (c jsonb);
\COPY t FROM test.json (FORMAT raw);
SELECT * FROM t;
c
----------------------------------------------
[{"id": 1, "t_test": "here's a \"string\""}]
(1 row)
\COPY t TO test.json (FORMAT raw);
% cat test.json
[{"id": 1, "t_test": "here's a \"string\""}]%
Note: the ending "%" is just the terminal indicating there is no newline at the end,
which is intended, since there is no delimiter specified.
2. Example of importing/exporting JSONL (newline-delimited JSON)
% cat log.jsonl
{"timestamp": "2024-10-17T09:15:30Z", "level": "INFO"}
{"timestamp": "2024-10-17T09:16:10Z", "level": "ERROR"}
{"timestamp": "2024-10-17T09:17:45Z", "level": "WARNING"}
% psql
\COPY t FROM log.jsonl (FORMAT raw, DELIMITER E'\n');
SELECT * FROM t;
c
-----------------------------------------------------------------------------------------------------
{"level": "INFO", "timestamp": "2024-10-17T09:15:30Z"}
{"level": "ERROR", "timestamp": "2024-10-17T09:16:10Z"}
{"level": "WARNING", "timestamp": "2024-10-17T09:17:45Z"}
(3 rows)
\COPY t TO log.jsonl (FORMAT raw, DELIMITER E'\n');
% cat log.jsonl
{"level": "INFO", "timestamp": "2024-10-17T09:15:30Z"}
{"level": "ERROR", "timestamp": "2024-10-17T09:16:10Z"}
{"level": "WARNING", "timestamp": "2024-10-17T09:17:45Z"}
+COPY copy_raw_test (col) FROM :'filename' RAW;
we may need to support this.
since we do not allow
COPY x from stdin text;
COPY x to stdout text;
so I think adding the RAW keyword in gram.y may not be necessary.
Nice, I didn't know text was not supported either,
so then it seems fine to only support the new syntax.
RAW keyword and the grammar removed.
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny,
"WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
src/bin/psql/tab-complete.in.c, we can also add "raw".
Nice, added.
The ESCAPE option has no regression test.
Regress tests added for both ESCAPE and QUOTE, checking that they cannot be used
with TEXT, RAW, or BINARY.
We already check the column count in BeginCopyTo, so is the
"if (list_length(cstate->attnumlist) != 1)" error check in
CopyOneRowTo still needed?
Hmm, not sure really, since DoCopy() calls both BeginCopyTo()
and DoCopyTo() which in turn calls CopyOneRowTo(),
but CopyOneRowTo() is also being called from copy_dest_receive().
/Joel
Attachments:
v11-0001-Refactor-ProcessCopyOptions-introduce-CopyFormat-enu.patch
From efc5098742954cfe7a39721a08c845638171d7db Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 17 Oct 2024 09:00:22 +0200
Subject: [PATCH 1/2] Refactor ProcessCopyOptions: introduce CopyFormat enum;
reorganize validation.
Replace the 'binary' and 'csv_mode' boolean fields in CopyFormatOptions with a
new enum CopyFormat, which explicitly represents the COPY format as
COPY_FORMAT_TEXT, COPY_FORMAT_BINARY, or COPY_FORMAT_CSV. This clarifies the
code by directly representing the format, simplifying checks and making the
logic more transparent.
Reorganize the option validation in ProcessCopyOptions by separating validation
checks into their own sections based on the options being validated. This
enhances readability by grouping related validations together, making the code
easier to follow and maintain.
Minor style edits:
* Using `!= NIL` to check if `opts_out->force_quote` is empty,
to match the existing checks for `opts_out->force_notnull`
and `opts_out->force_null`.
* Use `opts_out->on_error != COPY_ON_ERROR_STOP`,
instead of `!opts_out->on_error`,
to improve readability by conveying that on_error is an enum,
and that the branch is taken if ON_ERROR is not "stop".
No behavioral changes are intended; this is a pure refactoring to improve code
clarity and maintainability.
---
src/backend/commands/copy.c | 447 ++++++++++++++++-----------
src/backend/commands/copyfrom.c | 10 +-
src/backend/commands/copyfromparse.c | 34 +-
src/backend/commands/copyto.c | 20 +-
src/include/commands/copy.h | 13 +-
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 302 insertions(+), 223 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..a5cde15724 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -671,63 +671,126 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
- /*
- * Check for incompatible options (must do these three before inserting
- * defaults)
- */
- if (opts_out->binary && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
- if (opts_out->binary && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
- /* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ }
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->csv_mode)
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
}
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
opts_out->default_print_len = strlen(opts_out->default_print);
if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -735,136 +798,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY default representation cannot use newline or carriage return")));
- }
-
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (!opts_out->csv_mode &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
- /* Check header */
- if (opts_out->binary && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
- /* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
-
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
-
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
-
- /* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
-
- /* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
-
- /* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
- /* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
- if ((opts_out->force_null != NIL || opts_out->force_null_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
-
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
-
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
-
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
-
- if (opts_out->default_print)
- {
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -882,7 +816,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,19 +832,154 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
- /* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ /* --- HEADER option --- */
+ if (opts_out->header_line != COPY_HEADER_FALSE)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is no header; no action needed */
+ }
+
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote != NIL || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+
+ /* --- ON_ERROR option --- */
+ if (opts_out->on_error != COPY_ON_ERROR_STOP)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ }
+
+ /* --- REJECT_LIMIT option --- */
+ if (opts_out->reject_limit)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..c3d1df267f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57de1acff3..59433d120e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
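
For readers skimming the refactoring patch above: its core change is that the two booleans `binary` and `csv_mode` in CopyFormatOptions become a single CopyFormat enum, and the per-option defaults are then chosen by comparing against that enum. The following is a minimal standalone sketch of that default selection, not code from the patch. The helper names default_delim() and default_null_print() are hypothetical; in the patch the same logic lives inline in ProcessCopyOptions().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the CopyFormat enum that the 0001 patch adds to copy.h. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: default DELIMITER for a format.
 * Returns NULL for binary, which has no delimiter.
 */
const char *
default_delim(CopyFormat format)
{
	if (format == COPY_FORMAT_BINARY)
		return NULL;
	return format == COPY_FORMAT_CSV ? "," : "\t";
}

/*
 * Hypothetical helper: default NULL marker for a format.
 * Returns NULL for binary, which has no textual NULL marker.
 */
const char *
default_null_print(CopyFormat format)
{
	if (format == COPY_FORMAT_BINARY)
		return NULL;
	return format == COPY_FORMAT_CSV ? "" : "\\N";
}
```

With an enum, adding a fourth format only requires a new enumerator plus explicit decisions at each comparison site, rather than a third boolean whose combinations with the other two would have to be ruled out.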
Attachment: v11-0002-Add-raw-format-to-COPY-command.patch (application/octet-stream)
From 160af36a258a6125a4213e3a267ce5d239a93c29 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 03:03:09 +0200
Subject: [PATCH 2/2] Add raw format to COPY command.
This commit introduces a new raw format to the COPY command, enabling
efficient bulk data transfer of a single text column without any parsing,
quoting, or escaping. In raw format, data is copied exactly as it appears
in the file or table, adhering to the specified ENCODING option or the
current client encoding.
The raw format enforces a single column requirement, ensuring that exactly
one column is specified in the column list. Attempts to specify multiple
columns or omit the column list when the table has multiple columns will
result in an error. Additionally, the DELIMITER option in raw format accepts
any string, including multi-byte characters, providing greater flexibility
in defining data separators. If no DELIMITER is specified, the entire input
or output is treated as a single data value.
Furthermore, the raw format does not support format-specific options such as
NULL, HEADER, QUOTE, ESCAPE, FORCE_QUOTE, FORCE_NOT_NULL, and FORCE_NULL.
Using these options with the raw format will trigger errors, ensuring that
data remains unaltered during the transfer process.
This enhancement is particularly useful when handling text blobs, JSON files,
or other text-based formats where preserving the data "as is" is crucial.
---
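
The commit message above describes raw-format input as a sequence of values separated by an arbitrary, possibly multi-byte DELIMITER string, with the whole input forming one value when no delimiter is given. A rough standalone sketch of that splitting rule follows. It is not the patch's implementation: raw_next_value() is a hypothetical helper, and two behaviors are assumptions made for illustration only, namely that a trailing delimiter does not produce an extra empty value and that empty input yields no values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return the next raw-format
 * value in buf[0..len), starting at offset *pos.
 *
 * delim == NULL means "no delimiter": the rest of the input is one value.
 * On success, returns a pointer to the value, stores its length in
 * *value_len and advances *pos past the value and its delimiter.
 * Returns NULL once the input is exhausted.
 */
const char *
raw_next_value(const char *buf, size_t len, size_t *pos,
			   const char *delim, size_t *value_len)
{
	const char *start;
	size_t		dlen = (delim != NULL) ? strlen(delim) : 0;
	size_t		i;

	if (*pos >= len)
		return NULL;			/* input exhausted */

	start = buf + *pos;

	if (dlen > 0)
	{
		/* Scan for the first occurrence of the complete delimiter string. */
		for (i = *pos; i + dlen <= len; i++)
		{
			if (memcmp(buf + i, delim, dlen) == 0)
			{
				*value_len = i - *pos;
				*pos = i + dlen;	/* skip the delimiter itself */
				return start;
			}
		}
	}

	/* No delimiter given, or none left: the remainder is one value. */
	*value_len = len - *pos;
	*pos = len;
	return start;
}
```

Called repeatedly on the 9-byte input "a\r\n\r\nbc\r\n" with the delimiter "\r\n", this sketch yields "a", an empty string, and "bc". The empty value in the middle is kept as an empty string rather than a NULL, matching the documented rule that raw format cannot represent NULL.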
doc/src/sgml/ref/copy.sgml | 134 ++++++++++++++--
src/backend/commands/copy.c | 105 ++++++++-----
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 188 ++++++++++++++++++++++-
src/backend/commands/copyto.c | 92 ++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 1 +
src/test/regress/expected/copy.out | 52 +++++++
src/test/regress/expected/copy2.out | 52 ++++++-
src/test/regress/sql/copy.sql | 24 +++
src/test/regress/sql/copy2.sql | 37 ++++-
12 files changed, 625 insertions(+), 72 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..f17d606537 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -253,11 +254,27 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>DELIMITER</literal></term>
<listitem>
<para>
- Specifies the character that separates columns within each row
- (line) of the file. The default is a tab character in text format,
- a comma in <literal>CSV</literal> format.
- This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ Specifies the delimiter used in the file. Its usage depends on the
+ <literal>FORMAT</literal> specified:
+ <simplelist>
+ <member>
+ In <literal>text</literal> and <literal>CSV</literal> formats,
+ the delimiter separates <emphasis>columns</emphasis> within each row
+ (line) of the file.
+ The default is a tab character in <literal>text</literal> format and
+ a comma in <literal>CSV</literal> format. This must be a single
+ one-byte character.
+ </member>
+ <member>
+ In <literal>raw</literal> format, the delimiter separates
+ <emphasis>rows</emphasis> in the file. The default is no delimiter,
+ which means that for <command>COPY FROM</command>, the entire input is
+ read as a single field, and for <command>COPY TO</command>, the output
+ is concatenated without any delimiter. If a delimiter is specified,
+ it can be a multi-byte string; for example, <literal>E'\r\n'</literal>
+ can be used when dealing with text files on Windows platforms.
+ </member>
+ </simplelist>
</para>
</listitem>
</varlistentry>
@@ -271,7 +288,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +312,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -310,7 +328,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
If this option is set to <literal>MATCH</literal>, the number and names
of the columns in the header line must match the actual column names of
the table, in order; otherwise an error is raised.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
The <literal>MATCH</literal> option is only valid for <command>COPY
FROM</command> commands.
</para>
@@ -400,7 +419,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +913,98 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2 id="sql-copy-raw-format" xreflabel="Raw Format">
+ <title>Raw Format</title>
+
+ <para>
+ The <literal>raw</literal> format is designed for efficient bulk data
+ transfer of a single text column without any parsing, quoting, or
+ escaping. In this format, data is copied exactly as it appears in the file
+ or table, interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding.
+ </para>
+
+ <para>
+ When using the <literal>raw</literal> format, each data value corresponds
+ to a single field with no additional formatting or processing. The
+ <literal>DELIMITER</literal> option specifies the string that separates
+ data values. Unlike in other formats, the delimiter in
+ <literal>raw</literal> format can be any string, including multi-byte
+ characters. If no <literal>DELIMITER</literal> is specified, the entire
+ input or output is treated as a single data value.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format requires that exactly one column be
+ specified in the column list. An error is raised if more than one column
+ is specified or if no column list is specified when the table has multiple
+ columns.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not support any of the
+ format-specific options of other formats, such as <literal>NULL</literal>,
+ <literal>HEADER</literal>, <literal>QUOTE</literal>,
+ <literal>ESCAPE</literal>, <literal>FORCE_QUOTE</literal>,
+ <literal>FORCE_NOT_NULL</literal>, and <literal>FORCE_NULL</literal>.
+ Attempting to use these options with <literal>raw</literal> format will
+ result in an error.
+ </para>
+
+ <para>
+ Since the <literal>raw</literal> format deals with text, the data is
+ interpreted according to the specified <literal>ENCODING</literal> option
+ or the current client encoding for input, and encoded using the specified
+ <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <note>
+ <para>
+ Empty lines in the input are treated as empty strings, not as
+ <literal>NULL</literal> values. There is no way to represent a
+ <literal>NULL</literal> value in <literal>raw</literal> format.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format is particularly useful when you need to
+ import or export data exactly as it appears. This can be
+ helpful when dealing with large text blobs, JSON files, or other
+ text-based formats.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format can only be used when copying exactly
+ one column. If the table has multiple columns, you must specify the
+ column list containing only one column.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ Unlike other formats, the delimiter in <literal>raw</literal> format can
+ be any string, and there are no restrictions on the characters used in
+ the delimiter, including newline or carriage return characters.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ When using <literal>COPY TO</literal> with <literal>raw</literal> format
+ and a specified <literal>DELIMITER</literal>, there is no check to prevent
+ data values from containing the delimiter string. This is a problem if
+ the output must later be re-imported unchanged using
+ <literal>COPY FROM</literal>, since a data value containing the delimiter
+ would then be split into two or more values. If this is a concern, a
+ different format should be used instead.
+ </para>
+ </note>
+ </refsect2>
+
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a5cde15724..6bff50127c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -680,41 +682,47 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+ }
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
- }
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default delimiter */
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ if (opts_out->format == COPY_FORMAT_TEXT)
+ {
+ /*
+ * Disallow unsafe delimiter characters in text mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
}
+ /* Set default delimiter */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
/* --- NULL option --- */
if (opts_out->null_print)
@@ -724,6 +732,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
/* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
@@ -731,11 +744,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default null_print */
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- }
+ /* Set default null_print */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
@@ -787,6 +801,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
/* Assert options have been set (defaults applied if not specified) */
Assert(opts_out->delim);
Assert(opts_out->null_print);
@@ -845,6 +864,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "HEADER")));
}
else
{
@@ -933,8 +958,8 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
}
@@ -977,7 +1002,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
+ errmsg("CSV quote character must not appear in the %s specification",
"NULL")));
}
}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 50bb4b7750..d898fce2c2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -143,8 +143,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -732,7 +734,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -748,7 +750,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -768,8 +770,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -823,8 +830,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1096,7 +1110,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1147,6 +1164,21 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf.len -= 2;
cstate->line_buf.data[cstate->line_buf.len] = '\0';
break;
+ case EOL_CUSTOM:
+ {
+ int delim_len;
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+ Assert(cstate->opts.delim);
+ delim_len = strlen(cstate->opts.delim);
+ Assert(delim_len > 0);
+ Assert(cstate->line_buf.len >= delim_len);
+ Assert(memcmp(cstate->line_buf.data + cstate->line_buf.len - delim_len,
+ cstate->opts.delim,
+ delim_len) == 0);
+ cstate->line_buf.len -= delim_len;
+ cstate->line_buf.data[cstate->line_buf.len] = '\0';
+ }
+ break;
case EOL_UNKNOWN:
/* shouldn't get here */
Assert(false);
@@ -1462,6 +1494,109 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+ bool read_entire_file = (cstate->opts.delim == NULL);
+ int delim_len = cstate->opts.delim ? strlen(cstate->opts.delim) : 0;
+
+ /*
+ * The objective of this loop is to transfer data into line_buf until we
+ * find the specified delimiter or reach EOF. In raw format, we treat the
+ * input data as-is, without any parsing, quoting, or escaping. We are
+ * only interested in locating the delimiter to determine the boundaries
+ * of each data value.
+ *
+ * If a delimiter is specified, we read data until we encounter the
+ * delimiter string. If no delimiter is specified, we read the entire
+ * input as a single data value. Unlike text or CSV modes, we do not need
+ * to handle line endings, escape sequences, or special characters.
+ *
+ * The input has already been converted to the database encoding. All
+ * supported server encodings have the property that all bytes in a
+ * multi-byte sequence have the high bit set, so a multibyte character
+ * cannot contain any newline or escape characters embedded in the
+ * multibyte sequence. Therefore, we can process the input byte-by-byte,
+ * regardless of the encoding.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * We handle both single-byte and multi-byte delimiters. For multi-byte
+ * delimiters, we ensure that we have enough data in the buffer to compare
+ * the delimiter string.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+
+ /* Load more data if needed */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* Update local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /* If no more data, break out of the loop */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* Fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+
+ if (read_entire_file)
+ {
+ /* Continue until EOF if reading entire file */
+ input_buf_ptr++;
+ continue;
+ }
+ else
+ {
+ /* Check for delimiter, possibly multi-byte */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(delim_len - 1);
+ if (strncmp(&copy_input_buf[input_buf_ptr], cstate->opts.delim,
+ delim_len) == 0)
+ {
+ cstate->eol_type = EOL_CUSTOM;
+ input_buf_ptr += delim_len;
+ break;
+ }
+ input_buf_ptr++;
+ }
+ }
+
+ /* Transfer data to line_buf, including the delimiter if found */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1938,6 +2073,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 78531ae846..ea277b66b1 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -191,7 +192,14 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -235,9 +243,18 @@ CopySendEndOfRow(CopyToState cstate)
}
break;
case COPY_FRONTEND:
- /* The FE/BE protocol uses \n as newline for all platforms */
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
+ {
+ /* The FE/BE protocol uses \n as newline for all platforms */
CopySendChar(cstate, '\n');
+ }
/* Dump the accumulated row as one CopyData message */
(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
@@ -570,6 +587,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -835,8 +859,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -917,7 +943,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -945,7 +972,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -965,6 +992,37 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Ensure only one column is being copied */
+ if (list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("Unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1219,6 +1277,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1be0056af7..7f8d6f4f94 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "raw");
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f..8996bc89e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
@@ -79,7 +80,7 @@ typedef struct CopyFormatOptions
char *null_print_client; /* same converted to file encoding */
char *default_print; /* DEFAULT marker string */
int default_print_len; /* length of same */
- char *delim; /* column delimiter (must be 1 byte) */
+ char *delim; /* delimiter (1 byte, except for raw format) */
char *quote; /* CSV quote char (must be 1 byte) */
char *escape; /* CSV escape char (must be 1 byte) */
List *force_quote; /* list of column names */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..b8693ae59e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -38,6 +38,7 @@ typedef enum EolType
EOL_NL,
EOL_CR,
EOL_CRNL,
+ EOL_CUSTOM,
} EolType;
/*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..2825d833ea 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,55 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+ col
+----------------------------------------
+ sharon 25 (15,12) 1000 sam +
+ sam 30 (10,5) 2000 bill +
+ bill 20 (11,10) 1000 sharon+
+
+(1 row)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+----------------------------------------
+ bill 20 (11,10) 1000 sharon
+ sam 30 (10,5) 2000 bill
+ sharon 25 (15,12) 1000 sam
+(3 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+sharon 25 (15,12) 1000 sam
+***
+sam 30 (10,5) 2000 bill
+***
+bill 20 (11,10) 1000 sharon
+***
+\qecho
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+--------
+
+ "def",
+ abc\.
+ ghi
+(4 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..f31bd6a322 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -90,15 +90,35 @@ COPY x from stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x from stdin (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
+COPY x from stdin (format TEXT, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format BINARY, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format RAW, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format TEXT, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format BINARY, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, header);
+ERROR: cannot specify HEADER in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x from stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
@@ -108,6 +128,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +140,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +886,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +959,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..93595037dc 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,27 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+\qecho
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
+\.
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..7aee4ca8ea 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -72,18 +72,32 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
+COPY x from stdin (format RAW, null 'x');
+COPY x from stdin (format TEXT, escape 'x');
+COPY x from stdin (format BINARY, escape 'x');
+COPY x from stdin (format RAW, escape 'x');
+COPY x from stdin (format TEXT, quote 'x');
+COPY x from stdin (format BINARY, quote 'x');
+COPY x from stdin (format RAW, quote 'x');
+COPY x from stdin (format RAW, header);
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x from stdin (format TEXT, force_quote(a));
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +650,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +722,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
--
2.45.1
On Fri, Oct 18, 2024, at 19:24, Joel Jacobson wrote:
Attachments:
* v11-0001-Refactor-ProcessCopyOptions-introduce-CopyFormat-enu.patch
* v11-0002-Add-raw-format-to-COPY-command.patch
Here is a demo of importing a decently sized real text file,
that can't currently be imported without the CSV hack:
$ head /var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64
.package-cache-mutate devel/cargo
bin admin/base-files
bin/archdetect admin/ubiquity
bin/autopartition admin/ubiquity
bin/autopartition-crypto admin/ubiquity
bin/autopartition-loop admin/ubiquity
bin/autopartition-lvm admin/ubiquity
bin/block-attr admin/ubiquity
bin/blockdev-keygen admin/ubiquity
bin/blockdev-wipe admin/ubiquity
This file uses a combination of tabs and spaces, in between the two columns,
so none of the existing formats are suitable to deal with this file.
$ ls -lah /var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64
-rw-r--r-- 1 root root 791M Apr 24 02:07 /var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64
To import using the CSV hack, we first have to find two bytes that don't exist anywhere,
which can be done using e.g. ripgrep. The below command verifies \x01 and \x02
don't exist anywhere:
$ rg -uuu --multiline '(?-u)[\x01|\x02]' /var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64
$
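The same check can be sketched without ripgrep. This is only an illustration: `unused_bytes` is a hypothetical helper, and the sample file is a stand-in for the real Contents file. One thing to watch in the rg pattern above is that `|` inside a character class is a literal pipe, so that command also reports lines containing `|`; this makes it stricter than needed, not wrong.

```python
import os
import tempfile


def unused_bytes(path, candidates, chunk_size=1 << 20):
    """Return the candidate byte values that never occur in the file."""
    remaining = set(candidates)
    with open(path, "rb") as f:
        while remaining:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Iterating over bytes yields ints, so set(chunk) is a set of byte values.
            remaining -= set(chunk)
    return remaining


# Demo on a small temporary file (a stand-in for the real 791M file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"bin\tadmin/base-files\n")
    sample_path = tmp.name

free = unused_bytes(sample_path, {0x01, 0x02, 0x09})
os.remove(sample_path)
print(sorted(free))  # 0x09 (tab) occurs in the sample, 0x01 and 0x02 do not
```

Any byte value that survives in the returned set is safe to use as the DELIMITER or QUOTE character for the CSV hack.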
Knowing these bytes don't exist anywhere,
we can then safely use these as delimiter and quote characters,
as a hack to disable these features:
CREATE TABLE package_contents (raw_line text);
COPY package_contents FROM '/var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64' (FORMAT CSV, DELIMITER E'\x01', QUOTE E'\x02');
COPY 8443588
Time: 3882.100 ms (00:03.882)
Time: 3552.991 ms (00:03.553)
Time: 3748.038 ms (00:03.748)
Time: 3775.947 ms (00:03.776)
Time: 3729.020 ms (00:03.729)
I tested writing a Rust program that would read the file line-by-line and INSERT each line instead.
This is of course a lot slower, since it has to execute each insert separately:
$ cargo run --release
Compiling insert_package_contents v0.1.0 (/home/joel/insert_package_contents)
Finished `release` profile [optimized] target(s) in 0.70s
Running `target/release/insert_package_contents`
Connecting to the PostgreSQL database...
Successfully connected to the database.
Starting to insert lines from the file...
Successfully inserted 8443588 lines into package_contents in 134.65s.
New approach using the RAW format:
COPY package_contents FROM '/var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64' (FORMAT RAW, DELIMITER E'\n');
COPY 8443588
Time: 2918.489 ms (00:02.918)
Time: 3020.372 ms (00:03.020)
Time: 3336.589 ms (00:03.337)
Time: 3067.268 ms (00:03.067)
Time: 3343.694 ms (00:03.344)
Apart from the convenience improvement,
it seems to be somewhat faster already.
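To put a rough number on "somewhat faster", the five timings of each run above can be averaged. This is just arithmetic on the figures already posted, from a single machine, so it should not be read as a proper benchmark.

```python
from statistics import mean

# Timings in milliseconds, copied from the two COPY runs above.
csv_hack_ms = [3882.100, 3552.991, 3748.038, 3775.947, 3729.020]
raw_ms = [2918.489, 3020.372, 3336.589, 3067.268, 3343.694]

csv_mean = mean(csv_hack_ms)
raw_mean = mean(raw_ms)
reduction_pct = (1 - raw_mean / csv_mean) * 100

print(f"CSV hack mean: {csv_mean:.1f} ms")
print(f"RAW mean:      {raw_mean:.1f} ms")
print(f"Reduction:     {reduction_pct:.0f}%")
```

That works out to roughly 3.7 s versus 3.1 s on average, i.e. about 16% less time for the RAW format on this file.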
/Joel
On Sat, Oct 19, 2024 at 1:24 AM Joel Jacobson <joel@compiler.org> wrote:
Handling of e.g. JSON and other structured text files that could contain
newlines, in a seamless way seems important, so therefore the default is
no delimiter for the raw format, so that the entire input is read as one data
value for COPY FROM, and all column data is concatenated without delimiter
for COPY TO.
When specifying a delimiter for the raw format, it separates *rows*, and can be
a multi-byte string, such as E'\r\n' to handle Windows text files.
This has been documented under the DELIMITER option, as well as under the
Raw Format section.
We already require that RAW can only have one column.
if RAW has no default delimiter, then COPY FROM a text file will
become one datum value;
which makes it look like importing a Large Object.
(https://www.postgresql.org/docs/17/lo-funcs.html)
i think, most of the time, you have more than one row/value to import
and export?
The refactoring is now in a separate first single commit, which seems
necessary, to separate the new functionality, from the refactoring.
I agree.
ProcessCopyOptions
/* Extract options from the statement node tree */
foreach(option, options)
{
}
/* --- DELIMITER option --- */
/* --- NULL option --- */
/* --- QUOTE option --- */
Currently the regress test passed, i think that means your refactor is fine.
in ProcessCopyOptions, maybe we can rearrange the code after the
foreach loop (foreach(option, options)
based on the parameters order in
https://www.postgresql.org/docs/devel/sql-copy.html Parameters section.
so we can review it by comparing the refactoring with the
sql-copy.html Parameters section's description.
We already did column length checking at BeginCopyTo.
no need to "if (list_length(cstate->attnumlist) != 1)" error check in
CopyOneRowTo?
Hmm, not sure really, since DoCopy() calls both BeginCopyTo()
and DoCopyTo() which in turn calls CopyOneRowTo(),
but CopyOneRowTo() is also being called from copy_dest_receive().
BeginCopyTo do the preparation work.
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
After CopyGetAttnums, the number of attributes for COPY TO cannot be changed.
right after CopyGetAttnums call then check the length of cstate->attnumlist
seems fine for me.
I think in CopyOneRowTo, we can actually
Assert(list_length(cstate->attnumlist) == 1).
for raw format.
src10=# drop table if exists x;
create table x(a int);
COPY x from stdin (FORMAT raw);
DROP TABLE
CREATE TABLE
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
11
12
\.
ERROR: invalid input syntax for type integer: "11
12
"
CONTEXT: COPY x, line 1, column a: "11
12
"
The above case means COPY FROM STDIN (FORMAT RAW) can only import one
single value (when successful).
user need to specify like:
COPY x from stdin (FORMAT raw, delimiter E'\n');
seems raw format default no delimiter is not user friendly.
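The difference between the two behaviours can be modelled in a few lines. This is a toy model, not the patch's parser: `split_raw` is a hypothetical name, and dropping the empty piece after a trailing delimiter is an assumption inferred from the regression test earlier in the thread, where input ending in the delimiter does not appear to add an extra empty row.

```python
def split_raw(data, delimiter=None):
    """Toy model of how the proposed raw format maps input text to rows."""
    if delimiter is None:
        # No delimiter: the entire input is a single value.
        return [data]
    if delimiter == "":
        raise ValueError("delimiter must not be empty")
    rows = data.split(delimiter)
    # Assumption: a delimiter at the very end terminates the last row
    # instead of starting an additional empty one.
    if rows and rows[-1] == "":
        rows.pop()
    return rows


stdin_data = "11\n12\n"
print(split_raw(stdin_data))            # one value, newlines included
print(split_raw(stdin_data, "\n"))      # two values
print(split_raw("a\r\nb\r\n", "\r\n"))  # multi-byte delimiter
```

With no delimiter the int column receives the whole text "11\n12\n", which is the error shown above; with E'\n' each line becomes its own row.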
On Sat, Oct 19, 2024, at 12:13, jian he wrote:
We already require that RAW can only have one column.
if RAW has no default delimiter, then COPY FROM a text file will
become one datum value;
which makes it look like importing a Large Object.
(https://www.postgresql.org/docs/17/lo-funcs.html)
The single datum value might not come from a physical column; it could be
an aggregated JSON value, as in the example Daniel mentioned:
On Wed, Oct 16, 2024, at 18:34, Daniel Verite wrote:
copy (select json_agg(col) from table ) to 'file' RAW
This is a variant of the discussion in [1] where the OP does:
copy (select json_agg(row_to_json(t)) from <query> t) TO 'file'
and he complains that both text and csv "break the JSON".
That discussion morphed into a proposed patch adding JSON
format to COPY, but RAW would work directly as the OP
expected.
That is, unless <query> happens to include JSON fields with LF/CRLF
in them, and the RAW format says this is an error condition.
In that case it's quite annoying to make it an error, rather than
simply let it pass.
[1]: /messages/by-id/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=kcg@mail.gmail.com
In such cases, a user could perform the following:
CREATE TABLE customers (id int, name text, email text);
INSERT INTO customers (id, name, email) VALUES
(1, 'John Doe', 'john.doe@example.com'),
(2, 'Jane Smith', 'jane.smith@example.com'),
(3, 'Alice Johnson', 'alice.johnson@example.com');
COPY (SELECT json_agg(row_to_json(t)) FROM customers t) TO '/tmp/file' (FORMAT raw);
% cat /tmp/file
[{"id":1,"name":"John Doe","email":"john.doe@example.com"}, {"id":2,"name":"Jane Smith","email":"jane.smith@example.com"}, {"id":3,"name":"Alice Johnson","email":"alice.johnson@example.com"}]%
i think, most of the time, you have more than one row/value to import
and export?
Yes, probably, but it might not be a physical row. It could be an aggregated
one, like in the example above. When importing, it might be a large JSON array
of objects that is imported into a temporary table and then deserialized into
a proper schema.
The need to load entire files is already fulfilled by pg_read_file(text) -> text,
but there is no pg_write_file(), likely for security reasons.
So COPY TO ... (FORMAT RAW) with no delimiter seems necessary,
and then COPY FROM also needs to work accordingly.
The refactoring is now in a separate first single commit, which seems
necessary, to separate the new functionality, from the refactoring.
I agree.
ProcessCopyOptions
/* Extract options from the statement node tree */
foreach(option, options)
{
}
/* --- DELIMITER option --- */
/* --- NULL option --- */
/* --- QUOTE option --- */
Currently the regress test passed, i think that means your refactor is fine.
I believe that a passing test indicates it might be okay,
but a failing test definitely means it's not. :D
I've meticulously refactored one option at a time, checking which code in
ProcessCopyOptions depends on each option field to ensure the semantics
are preserved.
I think the changes are easy to follow, and it's clear that each change is
correct when looking at them individually, though it might be more challenging
when viewing the total change.
I've tried to minimize code movement, preserving as much of the original
code placement as possible.
in ProcessCopyOptions, maybe we can rearrange the code after the
foreach loop (foreach(option, options)
based on the parameters order in
https://www.postgresql.org/docs/devel/sql-copy.html Parameters section.
so we can review it by comparing the refactoring with the
sql-copy.html Parameters section's description.
That would be nice, but unfortunately, it's not possible because the order of
the option code blocks matters due to the setting of defaults in else/else
if branches when an option is not specified.
For example, in the documentation, DEFAULT precedes QUOTE,
but in ProcessCopyOptions, the QUOTE code block must come before
the DEFAULT code block due to the check:
/* Don't allow the CSV quote char to appear in the default string. */
I also believe there's value in minimizing code movement.
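The ordering constraint can be shown with a simplified model. This is not the C code from the patch: `process_options` is a hypothetical function, only two options are modelled, and the error texts are paraphrased.

```python
def process_options(fmt, quote=None, default=None):
    """Simplified model of per-option blocks that validate or apply a default."""
    opts = {"format": fmt, "quote": quote, "default": default}

    # --- QUOTE option --- validate if specified, otherwise apply the CSV default.
    if opts["quote"] is not None:
        if fmt != "csv":
            raise ValueError("QUOTE requires CSV mode")
        if len(opts["quote"]) != 1:
            raise ValueError("quote must be a single character")
    elif fmt == "csv":
        opts["quote"] = '"'

    # --- DEFAULT option --- reads opts["quote"], so it must run after the
    # QUOTE block above has filled in the default quote character.
    if opts["default"] is not None:
        if fmt == "csv" and opts["quote"] in opts["default"]:
            raise ValueError("quote character must not appear in DEFAULT")

    return opts


print(process_options("csv", default="\\D"))
```

If the two blocks were swapped to follow the documentation order, the DEFAULT check would run while the quote is still unset whenever QUOTE is not given explicitly, which is the dependency described above.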
We already did column length checking at BeginCopyTo.
no need to "if (list_length(cstate->attnumlist) != 1)" error check in
CopyOneRowTo?
Hmm, not sure really, since DoCopy() calls both BeginCopyTo()
and DoCopyTo() which in turn calls CopyOneRowTo(),
but CopyOneRowTo() is also being called from copy_dest_receive().
BeginCopyTo do the preparation work.
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
After CopyGetAttnums, the number of attributes for COPY TO cannot be changed.
right after CopyGetAttnums call then check the length of cstate->attnumlist
seems fine for me.
I think in CopyOneRowTo, we can actually
Assert(list_length(cstate->attnumlist) == 1).
for raw format.
Right, I've changed it to an Assert instead.
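The pattern being agreed on here, validate once during setup and only assert in the per-row path, looks roughly like this. The names mirror the C functions but this is a hypothetical Python model, and the error text is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CopyToState:
    fmt: str
    attnumlist: list


def begin_copy_to(fmt, attnumlist):
    """Setup step: the user-facing check happens once, here."""
    if fmt == "raw" and len(attnumlist) != 1:
        raise ValueError("raw format requires exactly one column")
    return CopyToState(fmt, list(attnumlist))


def copy_one_row_to(state, row):
    """Per-row step: the column count cannot have changed since setup."""
    if state.fmt == "raw":
        assert len(state.attnumlist) == 1
        return str(row[state.attnumlist[0] - 1])  # attribute numbers are 1-based
    return "\t".join(str(row[n - 1]) for n in state.attnumlist)


state = begin_copy_to("raw", [1])
print(copy_one_row_to(state, ("hello",)))
```

The assert documents the invariant for readers and catches programming errors in debug builds, while users only ever see the error raised by the setup step.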
src10=# drop table if exists x;
create table x(a int);
COPY x from stdin (FORMAT raw);
DROP TABLE
CREATE TABLE
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
11
12
\.
ERROR: invalid input syntax for type integer: "11
12
"
CONTEXT: COPY x, line 1, column a: "11
12
"
The above case means COPY FROM STDIN (FORMAT RAW) can only import one
single value (when successful).
user need to specify like:
COPY x from stdin (FORMAT raw, delimiter E'\n');
seems raw format default no delimiter is not user friendly.
I have no idea if dealing with .json files that would contain newlines
in between fields, and would therefore need to be imported "as is",
is more common than dealing with e.g. .jsonl files where it's guaranteed
each json value is on a single line.
I think Jacob raised some valid concerns about automagically detecting
newlines, which is how text/csv works, so I don't think we want that.
Maybe the OS default EOL would be an OK default,
if we want it to be the default delimiter, that is.
I have no strong opinion here, except automagical newline detection seems
like a bad idea.
I'm fine with OS default EOL as the default for the delimiter,
or no delimiter as the default.
New patch attached.
Changes:
* Change run-time check to assert in CopyOneRowTo, since checked by caller already.
/Joel
Attachments:
* v12-0001-Refactor-ProcessCopyOptions-introduce-CopyFormat-enu.patch
From efc5098742954cfe7a39721a08c845638171d7db Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 17 Oct 2024 09:00:22 +0200
Subject: [PATCH 1/2] Refactor ProcessCopyOptions: introduce CopyFormat enum;
reorganize validation.
Replace the 'binary' and 'csv_mode' boolean fields in CopyFormatOptions with a
new enum CopyFormat, which explicitly represents the COPY format as
COPY_FORMAT_TEXT, COPY_FORMAT_BINARY, or COPY_FORMAT_CSV. This clarifies the
code by directly representing the format, simplifying checks and making the
logic more transparent.
Reorganize the option validation in ProcessCopyOptions by separating validation
checks into their own sections based on the options being validated. This
enhances readability by grouping related validations together, making the code
easier to follow and maintain.
Minor style edits:
* Using `!= NIL` to check if `opts_out->force_quote` is empty,
to match the existing checks for `opts_out->force_notnull`
and `opts_out->force_null`.
* Use `opts_out->on_error != COPY_ON_ERROR_STOP`,
instead of `!opts_out->on_error`,
to improve readability by conveying that on_error is an enum,
and that the branch is taken if ON_ERROR is not "stop".
No behavioral changes are intended; this is a pure refactoring to improve code
clarity and maintainability.
---
src/backend/commands/copy.c | 447 ++++++++++++++++-----------
src/backend/commands/copyfrom.c | 10 +-
src/backend/commands/copyfromparse.c | 34 +-
src/backend/commands/copyto.c | 20 +-
src/include/commands/copy.h | 13 +-
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 302 insertions(+), 223 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..a5cde15724 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -671,63 +671,126 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
- /*
- * Check for incompatible options (must do these three before inserting
- * defaults)
- */
- if (opts_out->binary && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ /* --- DELIMITER option --- */
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
- if (opts_out->binary && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
- /* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ /*
+ * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
+ {
+ /* Set default delimiter */
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ }
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ /* --- NULL option --- */
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->csv_mode)
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+ }
+ else if (opts_out->format != COPY_FORMAT_BINARY)
{
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
+ /* Set default null_print */
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
}
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
opts_out->default_print_len = strlen(opts_out->default_print);
if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -735,136 +798,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY default representation cannot use newline or carriage return")));
- }
-
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (!opts_out->csv_mode &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
- /* Check header */
- if (opts_out->binary && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
- /* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
-
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
-
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
-
- /* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
-
- /* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
-
- /* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
- /* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
- if ((opts_out->force_null != NIL || opts_out->force_null_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
-
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
-
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
-
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
-
- if (opts_out->default_print)
- {
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -882,7 +816,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -898,19 +832,154 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
- /* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ /* --- HEADER option --- */
+ if (opts_out->header_line != COPY_HEADER_FALSE)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is no header; no action needed */
+ }
+
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote != NIL || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+
+ /* --- ON_ERROR option --- */
+ if (opts_out->on_error != COPY_ON_ERROR_STOP)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ }
+
+ /* --- REJECT_LIMIT option --- */
+ if (opts_out->reject_limit)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 463083e645..78531ae846 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -771,7 +771,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -792,7 +792,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -833,7 +833,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -880,7 +880,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -908,7 +908,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -917,7 +917,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -937,7 +937,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..c3d1df267f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57de1acff3..59433d120e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
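
For anyone skimming the diff above: the refactoring replaces the two booleans (binary, csv_mode) with a single enum, so each call site tests the format directly. Below is a minimal standalone sketch of that idea, not code from the patch. The enum mirrors the one added to copy.h; wire_format_code() and default_delimiter() are hypothetical helpers made up for illustration, modelled on the checks in SendCopyBegin() and ProcessCopyOptions().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the enum the patch adds to src/include/commands/copy.h. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: the overall format code sent in the
 * CopyInResponse/CopyOutResponse message, 1 for binary and 0 otherwise.
 */
static int
wire_format_code(CopyFormat format)
{
	return format == COPY_FORMAT_BINARY ? 1 : 0;
}

/*
 * Hypothetical helper: default DELIMITER per format, or NULL when the
 * format has no delimiter at all.
 */
static const char *
default_delimiter(CopyFormat format)
{
	switch (format)
	{
		case COPY_FORMAT_CSV:
			return ",";
		case COPY_FORMAT_TEXT:
			return "\t";
		case COPY_FORMAT_BINARY:
			return NULL;
	}
	return NULL;				/* keep compilers quiet */
}
```

With a switch over the enum, a compiler can warn when a newly added format value is not handled, which two independent booleans could not offer.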
v12-0002-Add-raw-format-to-COPY-command.patch
From 8dff0a0916df71e696a0955a30ff680308b1d47a Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Tue, 15 Oct 2024 03:03:09 +0200
Subject: [PATCH 2/2] Add raw format to COPY command.
This commit introduces a new raw format to the COPY command, enabling
efficient bulk data transfer of a single text column without any parsing,
quoting, or escaping. In raw format, data is copied exactly as it appears
in the file or table, adhering to the specified ENCODING option or the
current client encoding.
The raw format enforces a single column requirement, ensuring that exactly
one column is specified in the column list. Attempts to specify multiple
columns or omit the column list when the table has multiple columns will
result in an error. Additionally, the DELIMITER option in raw format accepts
any string, including multi-byte characters, providing greater flexibility
in defining data separators. If no DELIMITER is specified, the entire input
or output is treated as a single data value.
Furthermore, the raw format does not support format-specific options such as
NULL, HEADER, QUOTE, ESCAPE, FORCE_QUOTE, FORCE_NOT_NULL, and FORCE_NULL.
Using these options with the raw format will trigger errors, ensuring that
data remains unaltered during the transfer process.
This enhancement is particularly useful when handling text blobs, JSON files,
or other text-based formats where preserving the data "as is" is crucial.
---
doc/src/sgml/ref/copy.sgml | 134 ++++++++++++++--
src/backend/commands/copy.c | 105 ++++++++-----
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 188 ++++++++++++++++++++++-
src/backend/commands/copyto.c | 89 ++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 1 +
src/test/regress/expected/copy.out | 52 +++++++
src/test/regress/expected/copy2.out | 52 ++++++-
src/test/regress/sql/copy.sql | 24 +++
src/test/regress/sql/copy2.sql | 37 ++++-
12 files changed, 622 insertions(+), 72 deletions(-)
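
To make the DELIMITER semantics described in the commit message concrete, here is a small standalone sketch of how input would be cut into values. This is only an illustration under stated assumptions, not the patch's CopyReadLineRawText(): raw_next_value() is a hypothetical helper, it works on one in-memory buffer instead of the raw_buf/input_buf pipeline, and it assumes that empty input yields no value and that a trailing delimiter does not produce an extra empty value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: locate the next raw-format
 * value in buf[*pos .. len).  On success, *val and *val_len describe the
 * value and *pos is advanced past the value and its delimiter.  Returns 0
 * once the input is exhausted.
 *
 * delim == NULL models "no DELIMITER specified": the whole remaining input
 * is returned as a single value.  The delimiter may be several bytes long.
 */
static int
raw_next_value(const char *buf, size_t len, const char *delim,
			   size_t *pos, const char **val, size_t *val_len)
{
	size_t		dlen = delim ? strlen(delim) : 0;
	size_t		i;

	assert(buf != NULL && pos != NULL && val != NULL && val_len != NULL);

	if (*pos >= len)
		return 0;

	*val = buf + *pos;

	if (dlen > 0)
	{
		/* scan for the next occurrence of the delimiter string */
		for (i = *pos; i + dlen <= len; i++)
		{
			if (memcmp(buf + i, delim, dlen) == 0)
			{
				*val_len = i - *pos;
				*pos = i + dlen;
				return 1;
			}
		}
	}

	/* no delimiter configured or none found: the rest is one value */
	*val_len = len - *pos;
	*pos = len;
	return 1;
}
```

In this sketch, with DELIMITER E'\r\n' the input "a\r\n\r\nbc" yields three values: "a", an empty string, and "bc". With no delimiter the same seven bytes come back as one value.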
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..f17d606537 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -253,11 +254,27 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>DELIMITER</literal></term>
<listitem>
<para>
- Specifies the character that separates columns within each row
- (line) of the file. The default is a tab character in text format,
- a comma in <literal>CSV</literal> format.
- This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ Specifies the delimiter used in the file. Its usage depends on the
+ <literal>FORMAT</literal> specified:
+ <simplelist>
+ <member>
+ In <literal>text</literal> and <literal>CSV</literal> formats,
+ the delimiter separates <emphasis>columns</emphasis> within each row
+ (line) of the file.
+ The default is a tab character in <literal>text</literal> format and
+ a comma in <literal>CSV</literal> format. This must be a single
+ one-byte character.
+ </member>
+ <member>
+ In <literal>raw</literal> format, the delimiter separates
+ <emphasis>rows</emphasis> in the file. The default is no delimiter,
+ which means that for <command>COPY FROM</command>, the entire input is
+ read as a single field, and for <command>COPY TO</command>, the output
+ is concatenated without any delimiter. If a delimiter is specified,
+ it can be a multi-byte string; for example, <literal>E'\r\n'</literal>
+ can be used when dealing with text files on Windows platforms.
+ </member>
+ </simplelist>
</para>
</listitem>
</varlistentry>
@@ -271,7 +288,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +312,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -310,7 +328,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
If this option is set to <literal>MATCH</literal>, the number and names
of the columns in the header line must match the actual column names of
the table, in order; otherwise an error is raised.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
The <literal>MATCH</literal> option is only valid for <command>COPY
FROM</command> commands.
</para>
@@ -400,7 +419,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +913,98 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2 id="sql-copy-raw-format" xreflabel="Raw Format">
+ <title>Raw Format</title>
+
+ <para>
+ The <literal>raw</literal> format is designed for efficient bulk data
+ transfer of a single text column without any parsing, quoting, or
+ escaping. In this format, data is copied exactly as it appears in the file
+ or table, interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding.
+ </para>
+
+ <para>
+ When using the <literal>raw</literal> format, each data value corresponds
+ to a single field with no additional formatting or processing. The
+ <literal>DELIMITER</literal> option specifies the string that separates
+ data values. Unlike in other formats, the delimiter in
+ <literal>raw</literal> format can be any string, including multi-byte
+ characters. If no <literal>DELIMITER</literal> is specified, the entire
+ input or output is treated as a single data value.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format requires that exactly one column be
+ specified in the column list. An error is raised if more than one column
+ is specified or if no column list is specified when the table has multiple
+ columns.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not support any of the
+ format-specific options of other formats, such as <literal>NULL</literal>,
+ <literal>HEADER</literal>, <literal>QUOTE</literal>,
+ <literal>ESCAPE</literal>, <literal>FORCE_QUOTE</literal>,
+ <literal>FORCE_NOT_NULL</literal>, and <literal>FORCE_NULL</literal>.
+ Attempting to use these options with <literal>raw</literal> format will
+ result in an error.
+ </para>
+
+ <para>
+ Since the <literal>raw</literal> format deals with text, the data is
+ interpreted according to the specified <literal>ENCODING</literal> option
+ or the current client encoding for input, and encoded using the specified
+ <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <note>
+ <para>
+ Empty lines in the input are treated as empty strings, not as
+ <literal>NULL</literal> values. There is no way to represent a
+ <literal>NULL</literal> value in <literal>raw</literal> format.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format is particularly useful when you need to
+ import or export data exactly as it appears. This can be
+ helpful when dealing with large text blobs, JSON files, or other
+ text-based formats.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format can only be used when copying exactly
+ one column. If the table has multiple columns, you must specify a
+ column list containing exactly one column.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ Unlike other formats, the delimiter in <literal>raw</literal> format can
+ be any string, and there are no restrictions on the characters used in
+ the delimiter, including newline or carriage return characters.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ When using <literal>COPY TO</literal> with <literal>raw</literal> format
+ and a specified <literal>DELIMITER</literal>, there is no check to prevent
+ data values from containing the delimiter string. This can be a problem
+ if the exported data later needs to be re-imported with
+ <literal>COPY FROM</literal>, because a data value containing the delimiter
+ would then be split into multiple values. If this is a concern, a different
+ format should be used instead.
+ </para>
+ </note>
+ </refsect2>
+
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a5cde15724..6bff50127c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -680,41 +682,47 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+ }
- /*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format != COPY_FORMAT_CSV &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
- }
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default delimiter */
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ if (opts_out->format == COPY_FORMAT_TEXT)
+ {
+ /*
+ * Disallow unsafe delimiter characters in text mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
}
+ /* Set default delimiter */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
/* --- NULL option --- */
if (opts_out->null_print)
@@ -724,6 +732,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
/* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
@@ -731,11 +744,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
}
- else if (opts_out->format != COPY_FORMAT_BINARY)
- {
- /* Set default null_print */
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- }
+ /* Set default null_print */
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
@@ -787,6 +801,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
/* Assert options have been set (defaults applied if not specified) */
Assert(opts_out->delim);
Assert(opts_out->null_print);
@@ -845,6 +864,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "HEADER")));
}
else
{
@@ -933,8 +958,8 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
}
@@ -977,7 +1002,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
+ errmsg("CSV quote character must not appear in the %s specification",
"NULL")));
}
}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 50bb4b7750..d898fce2c2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -143,8 +143,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -732,7 +734,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -748,7 +750,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -768,8 +770,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -823,8 +830,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1096,7 +1110,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1147,6 +1164,21 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf.len -= 2;
cstate->line_buf.data[cstate->line_buf.len] = '\0';
break;
+ case EOL_CUSTOM:
+ {
+ int delim_len;
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+ Assert(cstate->opts.delim);
+ delim_len = strlen(cstate->opts.delim);
+ Assert(delim_len > 0);
+ Assert(cstate->line_buf.len >= delim_len);
+ Assert(memcmp(cstate->line_buf.data + cstate->line_buf.len - delim_len,
+ cstate->opts.delim,
+ delim_len) == 0);
+ cstate->line_buf.len -= delim_len;
+ cstate->line_buf.data[cstate->line_buf.len] = '\0';
+ }
+ break;
case EOL_UNKNOWN:
/* shouldn't get here */
Assert(false);
@@ -1462,6 +1494,109 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+ bool read_entire_file = (cstate->opts.delim == NULL);
+ int delim_len = cstate->opts.delim ? strlen(cstate->opts.delim) : 0;
+
+ /*
+ * The objective of this loop is to transfer data into line_buf until we
+ * find the specified delimiter or reach EOF. In raw format, we treat the
+ * input data as-is, without any parsing, quoting, or escaping. We are
+ * only interested in locating the delimiter to determine the boundaries
+ * of each data value.
+ *
+ * If a delimiter is specified, we read data until we encounter the
+ * delimiter string. If no delimiter is specified, we read the entire
+ * input as a single data value. Unlike text or CSV modes, we do not need
+ * to handle line endings, escape sequences, or special characters.
+ *
+ * The input has already been converted to the database encoding. All
+ * supported server encodings have the property that all bytes in a
+ * multi-byte sequence have the high bit set, so a multibyte character
+ * cannot contain any newline or escape characters embedded in the
+ * multibyte sequence. Therefore, we can process the input byte-by-byte,
+ * regardless of the encoding.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * We handle both single-byte and multi-byte delimiters. For multi-byte
+ * delimiters, we ensure that we have enough data in the buffer to compare
+ * the delimiter string.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+
+ /* Load more data if needed */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* Update local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /* If no more data, break out of the loop */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* Fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+
+ if (read_entire_file)
+ {
+ /* Continue until EOF if reading entire file */
+ input_buf_ptr++;
+ continue;
+ }
+ else
+ {
+ /* Check for delimiter, possibly multi-byte */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(delim_len - 1);
+ if (strncmp(&copy_input_buf[input_buf_ptr], cstate->opts.delim,
+ delim_len) == 0)
+ {
+ cstate->eol_type = EOL_CUSTOM;
+ input_buf_ptr += delim_len;
+ break;
+ }
+ input_buf_ptr++;
+ }
+ }
+
+ /* Transfer data to line_buf, including the delimiter if found */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1938,6 +2073,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 78531ae846..0e15809656 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -191,7 +192,14 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -235,9 +243,18 @@ CopySendEndOfRow(CopyToState cstate)
}
break;
case COPY_FRONTEND:
- /* The FE/BE protocol uses \n as newline for all platforms */
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
+ {
+ /* The FE/BE protocol uses \n as newline for all platforms */
CopySendChar(cstate, '\n');
+ }
/* Dump the accumulated row as one CopyData message */
(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
@@ -570,6 +587,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -835,8 +859,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -917,7 +943,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -945,7 +972,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -965,6 +992,34 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Assert only one column is being copied */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("Unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1219,6 +1274,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1be0056af7..7f8d6f4f94 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "raw");
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f..8996bc89e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
@@ -79,7 +80,7 @@ typedef struct CopyFormatOptions
char *null_print_client; /* same converted to file encoding */
char *default_print; /* DEFAULT marker string */
int default_print_len; /* length of same */
- char *delim; /* column delimiter (must be 1 byte) */
+ char *delim; /* delimiter (1 byte, except for raw format) */
char *quote; /* CSV quote char (must be 1 byte) */
char *escape; /* CSV escape char (must be 1 byte) */
List *force_quote; /* list of column names */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..b8693ae59e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -38,6 +38,7 @@ typedef enum EolType
EOL_NL,
EOL_CR,
EOL_CRNL,
+ EOL_CUSTOM,
} EolType;
/*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..2825d833ea 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,55 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+ col
+----------------------------------------
+ sharon 25 (15,12) 1000 sam +
+ sam 30 (10,5) 2000 bill +
+ bill 20 (11,10) 1000 sharon+
+
+(1 row)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+----------------------------------------
+ bill 20 (11,10) 1000 sharon
+ sam 30 (10,5) 2000 bill
+ sharon 25 (15,12) 1000 sam
+(3 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+sharon 25 (15,12) 1000 sam
+***
+sam 30 (10,5) 2000 bill
+***
+bill 20 (11,10) 1000 sharon
+***
+\qecho
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+--------
+
+ "def",
+ abc\.
+ ghi
+(4 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..f31bd6a322 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -90,15 +90,35 @@ COPY x from stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x from stdin (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
+COPY x from stdin (format TEXT, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format BINARY, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format RAW, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format TEXT, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format BINARY, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, header);
+ERROR: cannot specify HEADER in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x from stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
@@ -108,6 +128,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +140,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +886,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +959,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..93595037dc 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,27 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+\qecho
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
+\.
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..7aee4ca8ea 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -72,18 +72,32 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
+COPY x from stdin (format RAW, null 'x');
+COPY x from stdin (format TEXT, escape 'x');
+COPY x from stdin (format BINARY, escape 'x');
+COPY x from stdin (format RAW, escape 'x');
+COPY x from stdin (format TEXT, quote 'x');
+COPY x from stdin (format BINARY, quote 'x');
+COPY x from stdin (format RAW, quote 'x');
+COPY x from stdin (format RAW, header);
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x from stdin (format TEXT, force_quote(a));
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +650,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +722,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
--
2.45.1
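As a reading aid for the patch above, the delimiter handling in CopyReadLineRawText and the EOL_CUSTOM case can be sketched in standalone C. This is only a minimal sketch under stated assumptions, not the patch's code: raw_next_record is a hypothetical helper, it scans one in-memory buffer instead of the refilled input_buf, and it uses a bounds-checked memcmp where the patch uses strncmp. It shows the three behaviours the patch describes: a multi-byte delimiter ends a value and is stripped, an empty value stays an empty string, and with no delimiter the whole input becomes one value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): locate the next raw-format
 * value in buf[*pos .. len).  On success, returns 1, stores the value's
 * offset and length (delimiter excluded) and advances *pos past the
 * delimiter.  Returns 0 when the input is exhausted.
 */
static int
raw_next_record(const char *buf, size_t len, size_t *pos,
                const char *delim, size_t *rec_start, size_t *rec_len)
{
    size_t      delim_len = delim ? strlen(delim) : 0;
    size_t      i;

    assert(buf != NULL && pos != NULL);

    if (*pos >= len)
        return 0;               /* no more data */

    *rec_start = *pos;

    if (delim_len == 0)
    {
        /* No delimiter: the rest of the input is a single value */
        *rec_len = len - *pos;
        *pos = len;
        return 1;
    }

    /* Only compare where a full delimiter still fits in the buffer */
    for (i = *pos; i + delim_len <= len; i++)
    {
        if (memcmp(buf + i, delim, delim_len) == 0)
        {
            *rec_len = i - *pos;    /* strip the delimiter */
            *pos = i + delim_len;
            return 1;
        }
    }

    /* End of input without a trailing delimiter: return the remainder */
    *rec_len = len - *pos;
    *pos = len;
    return 1;
}
```

Calling the helper in a loop over the stdin data from the regression test, with the delimiter E'\n***\n', would yield the four values abc\., "def", an empty string, and ghi.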
On Sat, Oct 19, 2024 at 11:33 PM Joel Jacobson <joel@compiler.org> wrote:
ProcessCopyOptions
/* Extract options from the statement node tree */
foreach(option, options)
{
}
/* --- DELIMITER option --- */
/* --- NULL option --- */
/* --- QUOTE option --- */
Currently the regress test passed, i think that means your refactor is fine.
I believe that a passing test indicates it might be okay,
but a failing test definitely means it's not. :D
I've meticulously refactored one option at a time, checking which code in
ProcessCopyOptions depends on each option field to ensure the semantics
are preserved.
I think the changes are easy to follow, and it's clear that each change is
correct when looking at them individually, though it might be more challenging
when viewing the total change.
I've tried to minimize code movement, preserving as much of the original
code placement as possible.
in ProcessCopyOptions, maybe we can rearrange the code after the
foreach loop (foreach(option, options)
based on the parameters order in
https://www.postgresql.org/docs/devel/sql-copy.html Parameters section.
so we can review it by comparing the refactoring with the
sql-copy.html Parameters section's description.
That would be nice, but unfortunately, it's not possible because the order of
the option code blocks matters due to the setting of defaults in else/else
if branches when an option is not specified.
For example, in the documentation, DEFAULT precedes QUOTE,
but in ProcessCopyOptions, the QUOTE code block must come before
the DEFAULT code block due to the check:
/* Don't allow the CSV quote char to appear in the default string. */
I also believe there's value in minimizing code movement.
but v12-0001 was already hugely refactored.
Make ProcessCopyOptions proceed in the following order:
1. Extract options from the statement node tree.
2. Check each option; if it is not specified, set its default value.
3. Check interdependent options.
I still think
making step 2 aligned with the doc parameter section order will make it
more readable.
Based on your patch
(v12-0001-Refactor-ProcessCopyOptions-introduce-CopyFormat-enu.patch),
I moved some checks to step 3 and made the step 2 checking order align with the doc.
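The ordering dependency under discussion can be illustrated with a reduced sketch. Everything here is a hypothetical stand-in, not code from either patch: SketchOptions and sketch_process_options model only QUOTE and DEFAULT, and errors are returned as strings instead of being raised with ereport. The point it tries to show is that a check reading opts->quote[0] is only safe after the QUOTE default has been assigned, so it either has to follow the QUOTE block or be moved into the interdependent-check step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for CopyFormat / CopyFormatOptions */
typedef enum
{
    FMT_TEXT,
    FMT_CSV
} SketchFormat;

typedef struct
{
    SketchFormat format;
    const char *quote;          /* NULL if not specified */
    const char *default_print;  /* NULL if not specified */
} SketchOptions;

/* Returns NULL on success, or a static error message on failure. */
static const char *
sketch_process_options(SketchOptions *opts)
{
    assert(opts != NULL);

    /* Step 1 (extracting options from the parse tree) is assumed done. */

    /* Step 2: per-option checks and defaults, in documentation order. */

    /* DEFAULT: only checks that depend on no other option */
    if (opts->default_print &&
        strpbrk(opts->default_print, "\r\n") != NULL)
        return "COPY default representation cannot use newline or carriage return";

    /* QUOTE: validate if given, otherwise set the CSV default */
    if (opts->quote)
    {
        if (opts->format != FMT_CSV)
            return "COPY QUOTE requires CSV mode";
        if (strlen(opts->quote) != 1)
            return "COPY quote must be a single one-byte character";
    }
    else if (opts->format == FMT_CSV)
        opts->quote = "\"";

    /* Step 3: interdependent checks; all defaults are in place by now. */
    if (opts->format == FMT_CSV && opts->default_print &&
        strchr(opts->default_print, opts->quote[0]) != NULL)
        return "CSV quote character must not appear in the DEFAULT specification";

    return NULL;
}
```

With the quote-in-DEFAULT check placed in step 3, the step 2 blocks no longer depend on one another in this sketch, which is what allows them to follow the documentation order.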
Attachments:
v12-0001-make-the-ProcessCopyOptions-option-aligned-wi.no-cfbot (application/octet-stream)
From 9de54cfffca701f6842349fb9b2885af361eb377 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 21 Oct 2024 22:17:32 +0800
Subject: [PATCH v12 1/1] make the ProcessCopyOptions option aligned with doc
entry
---
src/backend/commands/copy.c | 122 ++++++++++++++++++------------------
1 file changed, 61 insertions(+), 61 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a5cde15724..bb8e265011 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -671,6 +671,18 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+
/* --- DELIMITER option --- */
if (opts_out->delim)
{
@@ -739,46 +751,6 @@ ProcessCopyOptions(ParseState *pstate,
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
- /* --- QUOTE option --- */
- if (opts_out->quote)
- {
- if (opts_out->format != COPY_FORMAT_CSV)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
-
- if (strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
- }
- else if (opts_out->format == COPY_FORMAT_CSV)
- {
- /* Set default quote */
- opts_out->quote = "\"";
- }
-
- /* --- ESCAPE option --- */
- if (opts_out->escape)
- {
- if (opts_out->format != COPY_FORMAT_CSV)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
- if (strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
- }
- else if (opts_out->format == COPY_FORMAT_CSV)
- {
- /* Set default escape to quote character */
- opts_out->escape = opts_out->quote;
- }
-
/* --- DEFAULT option --- */
if (opts_out->default_print)
{
@@ -815,15 +787,6 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter character must not appear in the %s specification",
"DEFAULT")));
- /* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "DEFAULT")));
-
/* Don't allow the NULL and DEFAULT string to be the same */
if (opts_out->null_print_len == opts_out->default_print_len &&
strncmp(opts_out->null_print, opts_out->default_print,
@@ -851,6 +814,46 @@ ProcessCopyOptions(ParseState *pstate,
/* Default is no header; no action needed */
}
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
+
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
+
/* --- FORCE_QUOTE option --- */
if (opts_out->force_quote != NIL || opts_out->force_quote_all)
{
@@ -905,18 +908,6 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
}
- /* --- FREEZE option --- */
- if (opts_out->freeze)
- {
- if (!is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
- }
-
/* --- ON_ERROR option --- */
if (opts_out->on_error != COPY_ON_ERROR_STOP)
{
@@ -967,6 +958,15 @@ ProcessCopyOptions(ParseState *pstate,
Assert(opts_out->quote);
Assert(opts_out->null_print);
+ /* Don't allow the CSV quote char to appear in the default string. */
+ if (opts_out->default_print_len > 0 &&
+ strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "DEFAULT")));
+
if (opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
--
2.34.1
On Mon, Oct 21, 2024, at 16:35, jian he wrote:
Make ProcessCopyOptions proceed in the following order:
1. Extract options from the statement node tree.
2. Check each option; if it is not specified, set its default value.
3. Check interdependent options.
I still think
making step 2 aligned with the doc parameter section order will make it
more readable.
Based on your patch
(v12-0001-Refactor-ProcessCopyOptions-introduce-CopyFormat-enu.patch),
I moved some checks to step 3 and made the step 2 checking order align with the doc.
Smart to move the interdependent check to the designated section for it,
that's exactly the right place for it.
Really nice the order in the code now is aligned with the doc order.
/Joel
Hi,
On Sat, Oct 19, 2024 at 8:33 AM Joel Jacobson <joel@compiler.org> wrote:
On Sat, Oct 19, 2024, at 12:13, jian he wrote:
We already make RAW and can only have one column.
if RAW has no default delimiter, then COPY FROM a text file will
become one datum value;
which makes it looks like importing a Large Object.
(https://www.postgresql.org/docs/17/lo-funcs.html)The single datum value might not come from a physical column; it could be
an aggregated JSON value, as in the example Daniel mentioned:On Wed, Oct 16, 2024, at 18:34, Daniel Verite wrote:
copy (select json_agg(col) from table ) to 'file' RAW
This is a variant of the discussion in [1] where the OP does:
copy (select json_agg(row_to_json(t)) from <query> t) TO 'file'
and he complains that both text and csv "break the JSON".
That discussion morphed into a proposed patch adding JSON
format to COPY, but RAW would work directly as the OP
expected.
That is, unless <query> happens to include JSON fields with LF/CRLF
in them, and the RAW format says this is an error condition.
In that case it's quite annoying to make it an error, rather than
simply let it pass.
[1] /messages/by-id/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=kcg@mail.gmail.com
In such cases, a user could perform the following:
CREATE TABLE customers (id int, name text, email text);
INSERT INTO customers (id, name, email) VALUES
(1, 'John Doe', 'john.doe@example.com'),
(2, 'Jane Smith', 'jane.smith@example.com'),
(3, 'Alice Johnson', 'alice.johnson@example.com');
COPY (SELECT json_agg(row_to_json(t)) FROM customers t) TO '/tmp/file' (FORMAT raw);
% cat /tmp/file
[{"id":1,"name":"John Doe","email":"john.doe@example.com"}, {"id":2,"name":"Jane Smith","email":"jane.smith@example.com"}, {"id":3,"name":"Alice Johnson","email":"alice.johnson@example.com"}]%
i think, most of the time, you have more than one row/value to import
and export?
Yes, probably, but it might not be a physical row. It could be an aggregated
one, like in the example above. When importing, it might be a large JSON array
of objects that is imported into a temporary table and then deserialized into
a proper schema.
The need to load entire files is already fulfilled by pg_read_file(text) -> text,
but there is no pg_write_file(), likely for security reasons.
So COPY TO ... (FORMAT RAW) with no delimiter seems necessary,
and then COPY FROM also needs to work accordingly.
The refactoring is now in a separate first single commit, which seems
necessary, to separate the new functionality, from the refactoring.
I agree.
ProcessCopyOptions
/* Extract options from the statement node tree */
foreach(option, options)
{
}
/* --- DELIMITER option --- */
/* --- NULL option --- */
/* --- QUOTE option --- */
Currently the regress test passed, i think that means your refactor is fine.
I believe that a passing test indicates it might be okay,
but a failing test definitely means it's not. :D
I've meticulously refactored one option at a time, checking which code in
ProcessCopyOptions depends on each option field to ensure the semantics
are preserved.
I think the changes are easy to follow, and it's clear that each change is
correct when looking at them individually, though it might be more challenging
when viewing the total change.
I've tried to minimize code movement, preserving as much of the original
code placement as possible.
in ProcessCopyOptions, maybe we can rearrange the code after the
foreach loop (foreach(option, options)
based on the parameters order in
https://www.postgresql.org/docs/devel/sql-copy.html Parameters section.
so we can review it by comparing the refactoring with the
sql-copy.html Parameters section's description.
That would be nice, but unfortunately, it's not possible because the order of
the option code blocks matters due to the setting of defaults in else/else
if branches when an option is not specified.
For example, in the documentation, DEFAULT precedes QUOTE,
but in ProcessCopyOptions, the QUOTE code block must come before
the DEFAULT code block due to the check:
/* Don't allow the CSV quote char to appear in the default string. */
I also believe there's value in minimizing code movement.
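The ordering constraint described above can be illustrated with a minimal sketch. It is not the code from ProcessCopyOptions; SketchOptions and sketch_process_options are hypothetical names, and the struct is a simplified stand-in that keeps only the fields involved in the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the real options struct. */
typedef struct SketchOptions
{
    bool        csv;            /* CSV format requested? */
    const char *quote;          /* QUOTE option, NULL if not specified */
    const char *default_print;  /* DEFAULT option, NULL if not specified */
} SketchOptions;

/*
 * Returns true if the options are acceptable, false if the quote character
 * appears in the DEFAULT string.
 */
static bool
sketch_process_options(SketchOptions *opts)
{
    assert(opts != NULL);

    /* QUOTE block: has to run first, because it fills in the default. */
    if (opts->csv && opts->quote == NULL)
        opts->quote = "\"";

    /* DEFAULT block: dereferences opts->quote, so it depends on the above. */
    if (opts->csv && opts->default_print != NULL &&
        strchr(opts->default_print, opts->quote[0]) != NULL)
        return false;

    return true;
}
```

If the two blocks were swapped to follow the documentation order, the DEFAULT check would dereference a quote pointer that is still NULL whenever QUOTE was not given explicitly.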
We already did column length checking at BeginCopyTo.
no need to "if (list_length(cstate->attnumlist) != 1)" error check in
CopyOneRowTo?
Hmm, not sure really, since DoCopy() calls both BeginCopyTo()
and DoCopyTo() which in turn calls CopyOneRowTo(),
but CopyOneRowTo() is also being called from copy_dest_receive().
BeginCopyTo do the preparation work.
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
After CopyGetAttnums, the number of attributes for COPY TO cannot be changed.
right after CopyGetAttnums call then check the length of cstate->attnumlist
seems fine for me.
I think in CopyOneRowTo, we can actually
Assert(list_length(cstate->attnumlist) == 1).
for raw format.
Right, I've changed it to an Assert instead.
src10=# drop table if exists x;
create table x(a int);
COPY x from stdin (FORMAT raw);
DROP TABLE
CREATE TABLE
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
11
12
\.
ERROR: invalid input syntax for type integer: "11
12
"
CONTEXT: COPY x, line 1, column a: "11
12
"
The above case means COPY FROM STDIN (FORMAT RAW) can only import one
single value (when successful).
user need to specify like:
COPY x from stdin (FORMAT raw, delimiter E'\n');
seems raw format default no delimiter is not user friendly.
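To make the two behaviours under discussion concrete, here is a small sketch of how an input buffer could be cut into values with and without a delimiter. It is not taken from the patch: sketch_next_raw_value is a hypothetical helper, and two details are assumptions made for the example, namely that a trailing delimiter ends the last value rather than starting an empty one, and that empty input yields no value at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: locate the next value in buf[*pos .. len).
 * On success, stores the value's offset and length in *start and *vlen,
 * advances *pos past the value and its delimiter, and returns 1.
 * Returns 0 once the input is exhausted.
 * A NULL or empty delim means "no delimiter".
 */
static int
sketch_next_raw_value(const char *buf, size_t len, size_t *pos,
                      const char *delim, size_t *start, size_t *vlen)
{
    size_t      dlen = (delim != NULL) ? strlen(delim) : 0;
    size_t      i;

    assert(buf != NULL && pos != NULL && start != NULL && vlen != NULL);

    if (*pos >= len)
        return 0;               /* nothing left to read */

    *start = *pos;

    if (dlen == 0)
    {
        /* No delimiter: everything that remains is a single value. */
        *vlen = len - *pos;
        *pos = len;
        return 1;
    }

    /* Scan for the next occurrence of the (possibly multi-byte) delimiter. */
    for (i = *pos; i + dlen <= len; i++)
    {
        if (memcmp(buf + i, delim, dlen) == 0)
        {
            *vlen = i - *pos;
            *pos = i + dlen;
            return 1;
        }
    }

    /* Last value, not followed by a delimiter. */
    *vlen = len - *pos;
    *pos = len;
    return 1;
}
```

With the input "11\n12\n" and no delimiter this yields the single six-byte value that the integer input function rejects in the session above; with "\n" as the delimiter it yields "11" and "12".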
I have no idea if dealing with .json files that would contain newlines
in between fields, and would therefore need to be imported "as is",
is more common than dealing with e.g. .jsonl files where it's guaranteed
each json value is on a single line.
I think Jacob raised some valid concerns on automagically detecting
newlines, that is how text/csv works, so I don't think we want that.
Maybe the OS default EOL would be an OK default,
if we want it to be the default delimiter, that is.
I have no strong opinion here, except automagical newline detection seems
like a bad idea.
I'm fine with OS default EOL as the default for the delimiter,
or no delimiter as the default.
New patch attached.
I have one question:
From the 0001 patch's commit message:
No behavioral changes are intended; this is a pure refactoring to improve code
clarity and maintainability.
Does the reorganization of the option validation done by this patch
also help make the 0002 patch simple or small? If not much, while it
makes sense to me that introducing the CopyFormat enum is required by
the 0002 patch, I think we can discuss the reorganization part
separately. And I'd suggest the patch organization would be:
0001: introduce CopyFormat and replace csv_mode and binary fields with it.
0002: add new 'raw' format.
0003: reorganize option validations.
One benefit would be that we can remove the simple replacements like
'->binary' with '.format == COPY_FORMAT_BINARY' from the
reorganization patch, which makes it small. The 0001 and 0002 patches
that seem to be more straightforward are independent from the 0003
patch, and we can discuss how to make the option validations including
the new 'raw' format better.
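As a rough illustration of the 0001 step, the sketch below shows two mutually exclusive boolean flags collapsing into one enum value, and how a test such as the default-delimiter choice reads afterwards. The Sketch-prefixed names are hypothetical and only mirror the shape of the change; they are not the identifiers used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum, mirroring the idea of a single format field. */
typedef enum SketchCopyFormat
{
    SKETCH_FORMAT_TEXT = 0,
    SKETCH_FORMAT_BINARY,
    SKETCH_FORMAT_CSV,
} SketchCopyFormat;

/* Hypothetical "before" representation: two flags, at most one set. */
typedef struct SketchOldOptions
{
    bool        binary;
    bool        csv_mode;
} SketchOldOptions;

/* Map the old pair of flags onto the single enum value. */
static SketchCopyFormat
sketch_format_from_flags(const SketchOldOptions *old)
{
    assert(old != NULL);
    assert(!(old->binary && old->csv_mode));    /* never both */

    if (old->binary)
        return SKETCH_FORMAT_BINARY;
    if (old->csv_mode)
        return SKETCH_FORMAT_CSV;
    return SKETCH_FORMAT_TEXT;
}

/*
 * Default delimiter, written the same way as the ternary in the patch:
 * comma for CSV, tab for everything else.
 */
static const char *
sketch_default_delimiter(SketchCopyFormat format)
{
    return format == SKETCH_FORMAT_CSV ? "," : "\t";
}
```

With an enum, a fourth format becomes one more enumerator instead of a third boolean whose combinations with the existing two would have to be ruled out by hand.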
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Thu, Oct 24, 2024, at 03:54, Masahiko Sawada wrote:
I have one question:
From the 0001 patch's commit message:
No behavioral changes are intended; this is a pure refactoring to improve code
clarity and maintainability.
Does the reorganization of the option validation done by this patch
also help make the 0002 patch simple or small?
Thanks for the review. No, not much, except that making the changes
ProcessCopyOptions needs for raw, without also refactoring it, makes
the code more complicated.
If not much, while it
makes sense to me that introducing the CopyFormat enum is required by
the 0002 patch, I think we can discuss the reorganization part
separately. And I'd suggest the patch organization would be:
0001: introduce CopyFormat and replace csv_mode and binary fields with it.
0002: add new 'raw' format.
0003: reorganize option validations.
One benefit would be that we can remove the simple replacements like
'->binary' with '.format == COPY_FORMAT_BINARY' from the
reorganization patch, which makes it small. The 0001 and 0002 patches
that seem to be more straightforward are independent from the 0003
patch, and we can discuss how to make the option validations including
the new 'raw' format better.
Sure, that works for me.
I've attached patches organized like that.
/Joel
Attachments:
v13-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
From 5c12d546df478af3c0d71c7a6e2ccc800b5d9970 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 08:24:13 +0300
Subject: [PATCH 1/3] Introduce CopyFormat and replace csv_mode and binary
fields with it.
---
src/backend/commands/copy.c | 48 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 10 +++---
src/backend/commands/copyfromparse.c | 34 ++++++++++----------
src/backend/commands/copyto.c | 20 ++++++------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 69 insertions(+), 57 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..1532f59993 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,7 +822,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -846,7 +847,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -882,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -899,7 +900,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b1..50bb4b7750 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -163,7 +163,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -749,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -766,7 +766,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -821,7 +821,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -865,7 +865,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -906,7 +906,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1179,7 +1179,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1256,7 +1256,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1295,7 +1295,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1323,10 +1323,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1340,10 +1340,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1351,15 +1351,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1371,7 +1371,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d9675..03c9d71d34 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -775,7 +775,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -796,7 +796,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -837,7 +837,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -884,7 +884,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -912,7 +912,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -921,7 +921,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -941,7 +941,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..c3d1df267f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 54bf29be24..26bb5cde4c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
v13-0002-Add-raw-format-to-COPY-command.patch
From 638b189dd73ac378b4fbb30dbcb7f36b4e654657 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 09:16:31 +0300
Subject: [PATCH 2/3] Add raw format to COPY command.
This commit introduces a new raw format to the COPY command, enabling
efficient bulk data transfer of a single text column without any parsing,
quoting, or escaping. In raw format, data is copied exactly as it appears
in the file or table, adhering to the specified ENCODING option or the
current client encoding.
The raw format enforces a single column requirement, ensuring that exactly
one column is specified in the column list. Attempts to specify multiple
columns or omit the column list when the table has multiple columns will
result in an error. Additionally, the DELIMITER option in raw format accepts
any string, including multi-byte characters, providing greater flexibility
in defining data separators. If no DELIMITER is specified, the entire input
or output is treated as a single data value.
Furthermore, the raw format does not support format-specific options such as
NULL, HEADER, QUOTE, ESCAPE, FORCE_QUOTE, FORCE_NOT_NULL, and FORCE_NULL.
Using these options with the raw format will trigger errors, ensuring that
data remains unaltered during the transfer process.
This enhancement is particularly useful when handling text blobs, JSON files,
or other text-based formats where preserving the data "as is" is crucial.
---
doc/src/sgml/ref/copy.sgml | 134 ++++++++++++++--
src/backend/commands/copy.c | 91 +++++++----
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 188 ++++++++++++++++++++++-
src/backend/commands/copyto.c | 89 ++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 1 +
src/test/regress/expected/copy.out | 52 +++++++
src/test/regress/expected/copy2.out | 52 ++++++-
src/test/regress/sql/copy.sql | 24 +++
src/test/regress/sql/copy2.sql | 37 ++++-
12 files changed, 619 insertions(+), 61 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..f17d606537 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -253,11 +254,27 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>DELIMITER</literal></term>
<listitem>
<para>
- Specifies the character that separates columns within each row
- (line) of the file. The default is a tab character in text format,
- a comma in <literal>CSV</literal> format.
- This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ Specifies the delimiter used in the file. Its usage depends on the
+ <literal>FORMAT</literal> specified:
+ <simplelist>
+ <member>
+ In <literal>text</literal> and <literal>CSV</literal> formats,
+ the delimiter separates <emphasis>columns</emphasis> within each row
+ (line) of the file.
+ The default is a tab character in <literal>text</literal> format and
+ a comma in <literal>CSV</literal> format. This must be a single
+ one-byte character.
+ </member>
+ <member>
+ In <literal>raw</literal> format, the delimiter separates
+ <emphasis>rows</emphasis> in the file. The default is no delimiter,
+ which means that for <command>COPY FROM</command>, the entire input is
+ read as a single field, and for <command>COPY TO</command>, the output
+ is concatenated without any delimiter. If a delimiter is specified,
+ it can be a multi-byte string; for example, <literal>E'\r\n'</literal>
+ can be used when dealing with text files on Windows platforms.
+ </member>
+ </simplelist>
</para>
</listitem>
</varlistentry>
@@ -271,7 +288,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +312,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -310,7 +328,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
If this option is set to <literal>MATCH</literal>, the number and names
of the columns in the header line must match the actual column names of
the table, in order; otherwise an error is raised.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
The <literal>MATCH</literal> option is only valid for <command>COPY
FROM</command> commands.
</para>
@@ -400,7 +419,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +913,98 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2 id="sql-copy-raw-format" xreflabel="Raw Format">
+ <title>Raw Format</title>
+
+ <para>
+ The <literal>raw</literal> format is designed for efficient bulk data
+ transfer of a single text column without any parsing, quoting, or
+ escaping. In this format, data is copied exactly as it appears in the file
+ or table, interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding.
+ </para>
+
+ <para>
+ When using the <literal>raw</literal> format, each data value corresponds
+ to a single field with no additional formatting or processing. The
+ <literal>DELIMITER</literal> option specifies the string that separates
+ data values. Unlike in other formats, the delimiter in
+ <literal>raw</literal> format can be any string, including multi-byte
+ characters. If no <literal>DELIMITER</literal> is specified, the entire
+ input or output is treated as a single data value.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format requires that exactly one column be
+ specified in the column list. An error is raised if more than one column
+ is specified or if no column list is specified when the table has multiple
+ columns.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not support any of the
+ format-specific options of other formats, such as <literal>NULL</literal>,
+ <literal>HEADER</literal>, <literal>QUOTE</literal>,
+ <literal>ESCAPE</literal>, <literal>FORCE_QUOTE</literal>,
+ <literal>FORCE_NOT_NULL</literal>, and <literal>FORCE_NULL</literal>.
+ Attempting to use these options with <literal>raw</literal> format will
+ result in an error.
+ </para>
+
+ <para>
+ Since the <literal>raw</literal> format deals with text, the data is
+ interpreted according to the specified <literal>ENCODING</literal> option
+ or the current client encoding for input, and encoded using the specified
+ <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <note>
+ <para>
+ Empty data values in the input are treated as empty strings, not as
+ <literal>NULL</literal> values. There is no way to represent a
+ <literal>NULL</literal> value in <literal>raw</literal> format.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format is particularly useful when you need to
+ import or export data exactly as it appears. This can be
+ helpful when dealing with large text blobs, JSON files, or other
+ text-based formats.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format can only be used when copying exactly
+ one column. If the table has multiple columns, you must specify the
+ column list containing only one column.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ Unlike other formats, the delimiter in <literal>raw</literal> format can
+ be any string, and there are no restrictions on the characters used in
+ the delimiter, including newline or carriage return characters.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ When using <literal>COPY TO</literal> with <literal>raw</literal> format
+ and a specified <literal>DELIMITER</literal>, there is no check to prevent
+ data values from containing the delimiter string. This is a problem if
+ the output is later imported with <literal>COPY FROM</literal>, because
+ a data value containing the delimiter would then be split into
+ two or more values. If this is a concern, a different format should be
+ used instead.
+ </para>
+ </note>
+ </refsect2>
+
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
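To make the documented splitting rules concrete, here is a minimal standalone sketch in plain C. It is not code from the patch: raw_next_value() and raw_count_values() are hypothetical helpers that only mimic the described behaviour (any string may be the delimiter, no delimiter means one value, and a value that contains the delimiter is split). Treating trailing data without a final delimiter as one more value is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: find the end of the next
 * data value in buf[pos..len).  Returns the offset just past the value
 * (and past its delimiter, if one was found) and stores the value length
 * in *value_len.  A NULL delim stands for "no DELIMITER given", in which
 * case the rest of the input is a single value.
 */
static size_t
raw_next_value(const char *buf, size_t len, size_t pos,
               const char *delim, size_t *value_len)
{
    size_t      delim_len = delim ? strlen(delim) : 0;
    size_t      i;

    assert(pos <= len);

    if (delim_len > 0)
    {
        /* Only compare where a full delimiter still fits in the buffer. */
        for (i = pos; i + delim_len <= len; i++)
        {
            if (memcmp(buf + i, delim, delim_len) == 0)
            {
                *value_len = i - pos;
                return i + delim_len;
            }
        }
    }

    /* No delimiter found (or none given): the remainder is one value. */
    *value_len = len - pos;
    return len;
}

/* Count how many data values the buffer would be split into. */
static int
raw_count_values(const char *buf, size_t len, const char *delim)
{
    size_t      pos = 0;
    size_t      value_len;
    int         n = 0;

    while (pos < len)
    {
        pos = raw_next_value(buf, len, pos, delim, &value_len);
        n++;
    }
    return n;
}
```

With this sketch, the single value "a\nb" written out with DELIMITER E'\n' becomes the bytes "a\nb\n", which read back as two values. That is the round-trip hazard the last note above warns about.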
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1532f59993..1d92836e68 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -686,18 +688,61 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->null_print)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->default_print)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
+ if (opts_out->delim)
+ {
+ if (opts_out->format != COPY_FORMAT_RAW)
+ {
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+ }
+ }
/* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ if (opts_out->null_print)
+ {
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
if (opts_out->format == COPY_FORMAT_CSV)
{
@@ -707,25 +752,6 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->escape = opts_out->quote;
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
-
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
-
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
-
if (opts_out->default_print)
{
opts_out->default_print_len = strlen(opts_out->default_print);
@@ -738,7 +764,7 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * Disallow unsafe delimiter characters in text mode. We can't allow
* backslash because it would be ambiguous. We can't allow the other
* cases because data characters matching the delimiter must be
* backslashed, and certain backslash combinations are interpreted
@@ -747,7 +773,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (opts_out->format != COPY_FORMAT_CSV &&
+ if (opts_out->format == COPY_FORMAT_TEXT &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -761,6 +787,12 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->header_line)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "HEADER")));
+
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
@@ -839,11 +871,12 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ if (opts_out->delim && opts_out->null_print &&
+ strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
+ errmsg("COPY delimiter character must not appear in the %s specification",
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
@@ -875,7 +908,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the default string. */
- if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. NULL */
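As a reading aid for the reshuffled delimiter checks in ProcessCopyOptions(), the rules can be summarised in a small standalone sketch. Everything here (SketchFormat, sketch_check_delimiter) is a hypothetical stand-in and not the patch's code; it covers only the length and end-of-line checks and leaves out the text-mode check for unsafe delimiter characters. The message strings are the ones that appear in the diff and in the regression output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CopyFormat enum, for illustration only. */
typedef enum
{
    SKETCH_FORMAT_TEXT,
    SKETCH_FORMAT_BINARY,
    SKETCH_FORMAT_CSV,
    SKETCH_FORMAT_RAW
} SketchFormat;

/*
 * Return NULL if the delimiter is acceptable for the format, otherwise
 * the error message that would be reported.  A NULL delim means the
 * option was not given: text and CSV then fall back to their defaults,
 * and raw treats the whole input or output as a single value.
 */
static const char *
sketch_check_delimiter(SketchFormat format, const char *delim)
{
    if (delim == NULL)
        return NULL;

    if (format == SKETCH_FORMAT_BINARY)
        return "cannot specify DELIMITER in BINARY mode";

    /* Raw accepts any string, including multi-byte and newline. */
    if (format == SKETCH_FORMAT_RAW)
        return NULL;

    assert(format == SKETCH_FORMAT_TEXT || format == SKETCH_FORMAT_CSV);

    /* Text and CSV keep the existing single-byte restriction. */
    if (strlen(delim) != 1)
        return "COPY delimiter must be a single one-byte character";

    /* Text and CSV still disallow end-of-line characters. */
    if (delim[0] == '\r' || delim[0] == '\n')
        return "COPY delimiter cannot be newline or carriage return";

    return NULL;
}
```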
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 50bb4b7750..d898fce2c2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -143,8 +143,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -732,7 +734,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -748,7 +750,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -768,8 +770,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -823,8 +830,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1096,7 +1110,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1147,6 +1164,21 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf.len -= 2;
cstate->line_buf.data[cstate->line_buf.len] = '\0';
break;
+ case EOL_CUSTOM:
+ {
+ int delim_len;
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+ Assert(cstate->opts.delim);
+ delim_len = strlen(cstate->opts.delim);
+ Assert(delim_len > 0);
+ Assert(cstate->line_buf.len >= delim_len);
+ Assert(memcmp(cstate->line_buf.data + cstate->line_buf.len - delim_len,
+ cstate->opts.delim,
+ delim_len) == 0);
+ cstate->line_buf.len -= delim_len;
+ cstate->line_buf.data[cstate->line_buf.len] = '\0';
+ }
+ break;
case EOL_UNKNOWN:
/* shouldn't get here */
Assert(false);
@@ -1462,6 +1494,109 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+ bool read_entire_file = (cstate->opts.delim == NULL);
+ int delim_len = cstate->opts.delim ? strlen(cstate->opts.delim) : 0;
+
+ /*
+ * The objective of this loop is to transfer data into line_buf until we
+ * find the specified delimiter or reach EOF. In raw format, we treat the
+ * input data as-is, without any parsing, quoting, or escaping. We are
+ * only interested in locating the delimiter to determine the boundaries
+ * of each data value.
+ *
+ * If a delimiter is specified, we read data until we encounter the
+ * delimiter string. If no delimiter is specified, we read the entire
+ * input as a single data value. Unlike text or CSV modes, we do not need
+ * to handle line endings, escape sequences, or special characters.
+ *
+ * The input has already been converted to the database encoding. All
+ * supported server encodings have the property that all bytes in a
+ * multi-byte sequence have the high bit set, so a multibyte character
+ * cannot contain any newline or escape characters embedded in the
+ * multibyte sequence. Therefore, we can process the input byte-by-byte,
+ * regardless of the encoding.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * We handle both single-byte and multi-byte delimiters. For multi-byte
+ * delimiters, we ensure that we have enough data in the buffer to compare
+ * the delimiter string.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+
+ /* Load more data if needed */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* Update local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /* If no more data, break out of the loop */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* Fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+
+ if (read_entire_file)
+ {
+ /* Continue until EOF if reading entire file */
+ input_buf_ptr++;
+ continue;
+ }
+ else
+ {
+ /* Check for delimiter, possibly multi-byte */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(delim_len - 1);
+ if (strncmp(&copy_input_buf[input_buf_ptr], cstate->opts.delim,
+ delim_len) == 0)
+ {
+ cstate->eol_type = EOL_CUSTOM;
+ input_buf_ptr += delim_len;
+ break;
+ }
+ input_buf_ptr++;
+ }
+ }
+
+ /* Transfer data to line_buf, including the delimiter if found */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1938,6 +2073,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 03c9d71d34..bf967a366e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -191,7 +192,14 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -235,9 +243,18 @@ CopySendEndOfRow(CopyToState cstate)
}
break;
case COPY_FRONTEND:
- /* The FE/BE protocol uses \n as newline for all platforms */
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
+ {
+ /* The FE/BE protocol uses \n as newline for all platforms */
CopySendChar(cstate, '\n');
+ }
/* Dump the accumulated row as one CopyData message */
(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
@@ -574,6 +591,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -839,8 +863,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -921,7 +947,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -949,7 +976,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -969,6 +996,34 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Assert only one column is being copied */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1223,6 +1278,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
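The COPY TO side above emits each value as-is, followed by the delimiter, and sends nothing for a NULL. A standalone sketch of that output rule, written against a plain memory buffer rather than the backend's CopySend* functions, shows why NULL and the empty string cannot be told apart in the result. raw_write_values() is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: append each value to out,
 * followed by delim (if any).  A NULL entry in values[] stands for an
 * SQL NULL and contributes no bytes of its own, but its delimiter is
 * still written.  Returns the number of bytes written, or (size_t) -1
 * if out is too small.
 */
static size_t
raw_write_values(char *out, size_t out_size,
                 const char *const *values, size_t nvalues,
                 const char *delim)
{
    size_t      delim_len = delim ? strlen(delim) : 0;
    size_t      used = 0;
    size_t      i;

    for (i = 0; i < nvalues; i++)
    {
        size_t      vlen = values[i] ? strlen(values[i]) : 0;

        /* used never exceeds out_size, so the subtraction cannot wrap. */
        if (vlen + delim_len > out_size - used)
            return (size_t) -1;

        if (vlen > 0)
            memcpy(out + used, values[i], vlen);
        used += vlen;

        if (delim_len > 0)
            memcpy(out + used, delim, delim_len);
        used += delim_len;
    }
    return used;
}
```

In this sketch, writing the three values 'abc', NULL, 'ghi' and writing 'abc', '', 'ghi' with DELIMITER E'\n' produce identical bytes, consistent with the documentation note that raw format has no way to represent NULL.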
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1be0056af7..7f8d6f4f94 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "raw");
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f..8996bc89e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
@@ -79,7 +80,7 @@ typedef struct CopyFormatOptions
char *null_print_client; /* same converted to file encoding */
char *default_print; /* DEFAULT marker string */
int default_print_len; /* length of same */
- char *delim; /* column delimiter (must be 1 byte) */
+ char *delim; /* delimiter (1 byte, except for raw format) */
char *quote; /* CSV quote char (must be 1 byte) */
char *escape; /* CSV escape char (must be 1 byte) */
List *force_quote; /* list of column names */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..b8693ae59e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -38,6 +38,7 @@ typedef enum EolType
EOL_NL,
EOL_CR,
EOL_CRNL,
+ EOL_CUSTOM,
} EolType;
/*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..2825d833ea 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,55 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+ col
+----------------------------------------
+ sharon 25 (15,12) 1000 sam +
+ sam 30 (10,5) 2000 bill +
+ bill 20 (11,10) 1000 sharon+
+
+(1 row)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+----------------------------------------
+ bill 20 (11,10) 1000 sharon
+ sam 30 (10,5) 2000 bill
+ sharon 25 (15,12) 1000 sam
+(3 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+sharon 25 (15,12) 1000 sam
+***
+sam 30 (10,5) 2000 bill
+***
+bill 20 (11,10) 1000 sharon
+***
+\qecho
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+--------
+
+ "def",
+ abc\.
+ ghi
+(4 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..f31bd6a322 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -90,15 +90,35 @@ COPY x from stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x from stdin (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
+COPY x from stdin (format TEXT, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format BINARY, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format RAW, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format TEXT, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format BINARY, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, header);
+ERROR: cannot specify HEADER in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x from stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
@@ -108,6 +128,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +140,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +886,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +959,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..93595037dc 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,27 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+\qecho
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
+\.
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..7aee4ca8ea 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -72,18 +72,32 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
+COPY x from stdin (format RAW, null 'x');
+COPY x from stdin (format TEXT, escape 'x');
+COPY x from stdin (format BINARY, escape 'x');
+COPY x from stdin (format RAW, escape 'x');
+COPY x from stdin (format TEXT, quote 'x');
+COPY x from stdin (format BINARY, quote 'x');
+COPY x from stdin (format RAW, quote 'x');
+COPY x from stdin (format RAW, header);
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x from stdin (format TEXT, force_quote(a));
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +650,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +722,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
--
2.45.1
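The regression tests above feed COPY FROM a multi-byte delimiter (E'\n***\n') and data containing backslashes and quotes. As a rough illustration of what "split on a multi-byte delimiter, take every other byte literally" can look like, here is a minimal C sketch. It is not the patch's parser: the helper names raw_next_field and raw_count_fields are hypothetical, and the choice that a trailing delimiter does not start an extra empty field is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: locate the field that
 * starts at offset "start" in buf[0..len), where fields are separated by
 * a multi-byte delimiter.  No quoting or backslash processing is done.
 * Stores the field length in *field_len and returns the offset at which
 * the next field starts (len when the input is exhausted).
 */
static size_t
raw_next_field(const char *buf, size_t len, size_t start,
               const char *delim, size_t *field_len)
{
    size_t dlen = strlen(delim);
    size_t i;

    /* The sketch requires a non-empty delimiter and a valid offset. */
    assert(dlen > 0 && start <= len);

    for (i = start; i + dlen <= len; i++)
    {
        if (memcmp(buf + i, delim, dlen) == 0)
        {
            *field_len = i - start;
            return i + dlen;
        }
    }

    /* No delimiter found: the rest of the buffer is the last field. */
    *field_len = len - start;
    return len;
}

/*
 * Count the fields in a buffer.  Assumption of this sketch: a trailing
 * delimiter terminates the last field instead of opening an empty one.
 */
static int
raw_count_fields(const char *buf, size_t len, const char *delim)
{
    size_t pos = 0;
    size_t flen;
    int    n = 0;

    while (pos < len)
    {
        pos = raw_next_field(buf, len, pos, delim, &flen);
        n++;
    }
    return n;
}
```

Applied to the stdin data of the test above, this sketch finds four fields, one of them empty; the backslash-dot in the first field and the quotes in the second stay ordinary data.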
v13-0003-Reorganize-option-validations.patch (application/octet-stream)
From 830f92b2e82f1acccbb3136e687d2f7f7348c5a6 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 09:18:37 +0300
Subject: [PATCH 3/3] Reorganize option validations.
---
src/backend/commands/copy.c | 443 ++++++++++++++++++++----------------
1 file changed, 251 insertions(+), 192 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1d92836e68..fa831161cc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -673,39 +673,29 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
- /*
- * Check for incompatible options (must do these three before inserting
- * defaults)
- */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in RAW mode", "NULL")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+ /* --- DELIMITER option --- */
if (opts_out->delim)
{
- if (opts_out->format != COPY_FORMAT_RAW)
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
{
/* Only single-byte delimiter strings are supported. */
if (strlen(opts_out->delim) != 1)
@@ -720,22 +710,53 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be newline or carriage return")));
}
+
+ if (opts_out->format == COPY_FORMAT_TEXT)
+ {
+ /*
+ * Disallow unsafe delimiter characters in text mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
}
- /* Set defaults for omitted options */
+ /* Set default delimiter */
else if (opts_out->format == COPY_FORMAT_CSV)
opts_out->delim = ",";
else if (opts_out->format == COPY_FORMAT_TEXT)
opts_out->delim = "\t";
+ /* --- NULL option --- */
if (opts_out->null_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
+ /* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
-
}
+ /* Set default null_print */
else if (opts_out->format == COPY_FORMAT_CSV)
opts_out->null_print = "";
else if (opts_out->format == COPY_FORMAT_TEXT)
@@ -744,16 +765,23 @@ ProcessCopyOptions(ParseState *pstate,
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->format == COPY_FORMAT_CSV)
- {
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
- }
-
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
opts_out->default_print_len = strlen(opts_out->default_print);
if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -761,162 +789,202 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY default representation cannot use newline or carriage return")));
- }
- /*
- * Disallow unsafe delimiter characters in text mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format == COPY_FORMAT_TEXT &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in RAW mode", "HEADER")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "DEFAULT",
+ "COPY TO")));
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
+ /* Don't allow the delimiter to appear in the default string. */
+ if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "DEFAULT")));
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
+ /* Don't allow the NULL and DEFAULT string to be the same */
+ if (opts_out->null_print_len == opts_out->default_print_len &&
+ strncmp(opts_out->null_print, opts_out->default_print,
+ opts_out->null_print_len) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("NULL specification and DEFAULT specification cannot be the same")));
+ }
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
+ /* --- HEADER option --- */
+ if (opts_out->header_line != COPY_HEADER_FALSE)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is no header; no action needed */
+ }
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV &&
- (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
- if ((opts_out->force_null != NIL || opts_out->force_null_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote != NIL || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- /* Don't allow the delimiter to appear in the null string. */
- if (opts_out->delim && opts_out->null_print &&
- strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
- if (opts_out->default_print)
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
{
- if (!is_from)
+ if (opts_out->format != COPY_FORMAT_CSV)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "DEFAULT",
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
"COPY TO")));
+ }
- /* Don't allow the delimiter to appear in the default string. */
- if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ /* --- ON_ERROR option --- */
+ if (opts_out->on_error != COPY_ON_ERROR_STOP)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ }
+
+ /* --- REJECT_LIMIT option --- */
+ if (opts_out->reject_limit)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "DEFAULT")));
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
+ if (opts_out->default_print_len > 0 &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -924,28 +992,19 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("CSV quote character must not appear in the %s specification",
"DEFAULT")));
- /* Don't allow the NULL and DEFAULT string to be the same */
- if (opts_out->null_print_len == opts_out->default_print_len &&
- strncmp(opts_out->null_print, opts_out->default_print,
- opts_out->null_print_len) == 0)
+ if (opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("NULL specification and DEFAULT specification cannot be the same")));
- }
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
--
2.45.1
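The reorganized ProcessCopyOptions above groups everything about one option into a single block: if the option was given, validate it against the format; otherwise apply the format's default. A minimal C sketch of that shape for DELIMITER follows. The types SketchFormat and SketchOpts and the function sketch_check_delimiter are stand-ins invented for this example, and the returned strings are abbreviated placeholders for the ereport() calls in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the patch's types; all names are assumptions. */
typedef enum
{
    FMT_TEXT,
    FMT_CSV,
    FMT_BINARY,
    FMT_RAW
} SketchFormat;

typedef struct
{
    SketchFormat format;
    const char  *delim;     /* NULL when the user did not specify one */
} SketchOpts;

/*
 * "Validate if given, else default" for the DELIMITER option.
 * Returns NULL on success, or a static message where the real code
 * would call ereport(ERROR, ...).
 */
static const char *
sketch_check_delimiter(SketchOpts *opts)
{
    assert(opts != NULL);

    if (opts->delim)
    {
        if (opts->format == FMT_BINARY)
            return "cannot specify DELIMITER in BINARY mode";

        if (opts->format == FMT_TEXT || opts->format == FMT_CSV)
        {
            /* Only single-byte delimiters in text and CSV. */
            if (strlen(opts->delim) != 1)
                return "delimiter must be a single one-byte character";
            if (opts->delim[0] == '\r' || opts->delim[0] == '\n')
                return "delimiter cannot be newline or carriage return";
        }

        /* Text mode additionally rejects characters that clash with escapes. */
        if (opts->format == FMT_TEXT &&
            strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
                   opts->delim[0]) != NULL)
            return "delimiter is unsafe in text mode";
    }
    else if (opts->format == FMT_CSV)
        opts->delim = ",";      /* default for CSV */
    else if (opts->format == FMT_TEXT)
        opts->delim = "\t";     /* default for text */

    return NULL;
}
```

With this shape, a raw-format call with a multi-byte delimiter passes through untouched, while the text and CSV rules stay next to the defaults they relate to.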
On Thu, Oct 24, 2024 at 2:30 PM Joel Jacobson <joel@compiler.org> wrote:
On Thu, Oct 24, 2024, at 03:54, Masahiko Sawada wrote:
I have one question:
From the 0001 patch's commit message:
No behavioral changes are intended; this is a pure refactoring to improve code
clarity and maintainability.
Does the reorganization of the option validation done by this patch
also help make the 0002 patch simple or small?
Thanks for the review. No, not much, except the changes necessary to
ProcessCopyOptions for raw, without also refactoring it, makes it
more complicated.
If not much, while it
makes sense to me that introducing the CopyFormat enum is required by
the 0002 patch, I think we can discuss the reorganization part
separately. And I'd suggest the patch organization would be:
0001: introduce CopyFormat and replace csv_mode and binary fields with it.
0002: add new 'raw' format.
0003: reorganize option validations.
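To make the 0001 step concrete, here is a small C sketch of what replacing the csv_mode and binary booleans with a single enum amounts to. The enumerator names follow the patch; the struct OldStyleOpts, the type name CopyFormatSketch, the enumerator order, and both helper functions are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: two booleans; "text" is the state where both are false. */
typedef struct
{
    bool csv_mode;
    bool binary;
} OldStyleOpts;

/* After: one enum.  Value names follow the patch; the order is assumed. */
typedef enum
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_CSV,
    COPY_FORMAT_BINARY
} CopyFormatSketch;

/* Hypothetical helper: map the old representation to the new one. */
static CopyFormatSketch
format_from_flags(const OldStyleOpts *old)
{
    /* FORMAT can be given only once, so both flags are never set together. */
    assert(!(old->csv_mode && old->binary));

    if (old->binary)
        return COPY_FORMAT_BINARY;
    if (old->csv_mode)
        return COPY_FORMAT_CSV;
    return COPY_FORMAT_TEXT;
}

/* Enum form of the old expression: csv_mode ? "," : "\t" */
static const char *
default_delim(CopyFormatSketch format)
{
    return format == COPY_FORMAT_CSV ? "," : "\t";
}
```

The point of the enum is that a fourth format can be added as one more enumerator, instead of a third boolean whose combinations with the other two would need to be ruled out.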
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote ||
opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
Maybe this has a code indentation issue,
since "if" and "opts_out" are in the same column position.
It occurred to me that we could
change the errmsg occurrences of "BINARY mode" and "CSV mode" to "binary
format" and "csv format" respectively.
I think "format" would be more accurate,
but changing the messages seems invasive,
so I guess we need to keep using "mode".
Overall, v13-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
looks good to me.
On Mon, Oct 28, 2024, at 08:56, jian he wrote:
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote ||
opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
Maybe this has a code indentation issue,
since "if" and "opts_out" are in the same column position.
Thanks for the review.
I've fixed the indentation issues.
It occurred to me that we could
change the errmsg occurrences of "BINARY mode" and "CSV mode" to "binary
format" and "csv format" respectively.
I think "format" would be more accurate,
but changing the messages seems invasive,
so I guess we need to keep using "mode".
That would work, I'm fine with either.
Overall, v13-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
looks good to me.
Cool.
/Joel
Attachments:
v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch (application/octet-stream)
From 25dde7ac24b0dbe4922cf136f8d09d9381f79c43 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 08:24:13 +0300
Subject: [PATCH 1/3] Introduce CopyFormat and replace csv_mode and binary
fields with it.
---
src/backend/commands/copy.c | 48 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 10 +++---
src/backend/commands/copyfromparse.c | 34 ++++++++++----------
src/backend/commands/copyto.c | 20 ++++++------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 69 insertions(+), 57 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..1532f59993 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,7 +822,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -846,7 +847,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -882,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -899,7 +900,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d8..51eb14d743 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -162,7 +162,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -748,7 +748,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -765,7 +765,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -820,7 +820,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -864,7 +864,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -905,7 +905,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1178,7 +1178,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1255,7 +1255,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1294,7 +1294,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1322,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1339,10 +1339,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1350,15 +1350,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1370,7 +1370,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d9675..03c9d71d34 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -775,7 +775,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -796,7 +796,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -837,7 +837,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -884,7 +884,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -912,7 +912,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -921,7 +921,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -941,7 +941,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..c3d1df267f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 171a7dd5d2..bb9fe00a6a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
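
For readers skimming the refactoring above, here is a minimal standalone sketch of how the two old booleans correspond to the new enum. The enum has the same shape as the one added to copy.h; the three helper functions are hypothetical, written only for illustration, and are not part of the patch. The sketch assumes binary and csv_mode are never both set.

```c
#include <stdbool.h>

/* Same shape as the enum added to copy.h in this patch. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper (not in the patch): map the old pair of flags to the
 * enum.  Assumes binary and csv_mode are never both set.
 */
static CopyFormat
copy_format_from_flags(bool binary, bool csv_mode)
{
	if (binary)
		return COPY_FORMAT_BINARY;
	return csv_mode ? COPY_FORMAT_CSV : COPY_FORMAT_TEXT;
}

/* Hypothetical helper: the old "!csv_mode" test, spelled with the enum. */
static bool
is_not_csv(CopyFormat format)
{
	return format != COPY_FORMAT_CSV;
}

/* Hypothetical helper: the old "!binary" test, spelled with the enum. */
static bool
is_not_binary(CopyFormat format)
{
	return format != COPY_FORMAT_BINARY;
}
```

Note that is_not_csv() is true for both text and binary, just as "!csv_mode" was. Once a further format is added, any check that really means "text only" probably wants an explicit comparison against COPY_FORMAT_TEXT instead.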
Attachment: v14-0002-Add-raw-format-to-COPY-command.patch
From 90913ddd6a19d6e0962055216a2560f96230ad25 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 09:16:31 +0300
Subject: [PATCH 2/3] Add raw format to COPY command.
This commit introduces a new raw format to the COPY command, enabling
efficient bulk data transfer of a single text column without any parsing,
quoting, or escaping. In raw format, data is copied exactly as it appears
in the file or table, adhering to the specified ENCODING option or the
current client encoding.
The raw format requires that exactly one column be specified in the column
list. Specifying multiple columns, or omitting the column list when the table
has multiple columns, results in an error. Additionally, the DELIMITER option
in raw format accepts
any string, including multi-byte characters, providing greater flexibility
in defining data separators. If no DELIMITER is specified, the entire input
or output is treated as a single data value.
Furthermore, the raw format does not support format-specific options such as
NULL, HEADER, QUOTE, ESCAPE, FORCE_QUOTE, FORCE_NOT_NULL, and FORCE_NULL.
Using these options with the raw format will trigger errors, ensuring that
data remains unaltered during the transfer process.
This enhancement is particularly useful when handling text blobs, JSON files,
or other text-based formats where preserving the data "as is" is crucial.
---
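The delimiter semantics described in the commit message can be sketched in a few lines of standalone C. This is only an illustration under simplifying assumptions: the whole input is already in memory, and raw_next_value() and raw_count_values() are hypothetical names that do not appear in the patch. The patch's own reader works on buffered input and handles encoding conversion, which this sketch ignores.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the length of the next value in buf[0..len)
 * and store in *consumed how many bytes to skip (value plus delimiter).
 * With delim == NULL the whole remaining input is one value.
 */
static size_t
raw_next_value(const char *buf, size_t len, const char *delim,
			   size_t *consumed)
{
	size_t		dlen = delim ? strlen(delim) : 0;

	if (dlen > 0)
	{
		/* Byte-wise search, so the delimiter may be any multi-byte string. */
		for (size_t i = 0; i + dlen <= len; i++)
		{
			if (memcmp(buf + i, delim, dlen) == 0)
			{
				*consumed = i + dlen;
				return i;
			}
		}
	}

	/* No delimiter given, or none found: the rest is a single value. */
	*consumed = len;
	return len;
}

/* Hypothetical helper: count the values a given input would produce. */
static size_t
raw_count_values(const char *buf, size_t len, const char *delim)
{
	size_t		count = 0;
	size_t		pos = 0;

	while (pos < len)
	{
		size_t		consumed;

		(void) raw_next_value(buf + pos, len - pos, delim, &consumed);
		pos += consumed;
		count++;
	}
	return count;
}
```

With the delimiter "\r\n", the 7-byte input "a\r\nb\r\nc" yields three values, and with no delimiter any input yields one. The sketch also shows the round-trip hazard: if the values "x,y" and "z" were written out with delimiter ",", the resulting "x,y,z" would be read back as three values, not two.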
doc/src/sgml/ref/copy.sgml | 134 ++++++++++++++--
src/backend/commands/copy.c | 95 ++++++++----
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 188 ++++++++++++++++++++++-
src/backend/commands/copyto.c | 89 ++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 1 +
src/test/regress/expected/copy.out | 52 +++++++
src/test/regress/expected/copy2.out | 52 ++++++-
src/test/regress/sql/copy.sql | 24 +++
src/test/regress/sql/copy2.sql | 37 ++++-
12 files changed, 621 insertions(+), 63 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..f17d606537 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -253,11 +254,27 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>DELIMITER</literal></term>
<listitem>
<para>
- Specifies the character that separates columns within each row
- (line) of the file. The default is a tab character in text format,
- a comma in <literal>CSV</literal> format.
- This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ Specifies the delimiter used in the file. Its usage depends on the
+ <literal>FORMAT</literal> specified:
+ <simplelist>
+ <member>
+ In <literal>text</literal> and <literal>CSV</literal> formats,
+ the delimiter separates <emphasis>columns</emphasis> within each row
+ (line) of the file.
+ The default is a tab character in <literal>text</literal> format and
+ a comma in <literal>CSV</literal> format. This must be a single
+ one-byte character.
+ </member>
+ <member>
+ In <literal>raw</literal> format, the delimiter separates
+ <emphasis>rows</emphasis> in the file. The default is no delimiter,
+ which means that for <command>COPY FROM</command>, the entire input is
+ read as a single field, and for <command>COPY TO</command>, the output
+ is concatenated without any delimiter. If a delimiter is specified,
+ it can be a multi-byte string; for example, <literal>E'\r\n'</literal>
+ can be used when dealing with text files on Windows platforms.
+ </member>
+ </simplelist>
</para>
</listitem>
</varlistentry>
@@ -271,7 +288,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +312,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -310,7 +328,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
If this option is set to <literal>MATCH</literal>, the number and names
of the columns in the header line must match the actual column names of
the table, in order; otherwise an error is raised.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
The <literal>MATCH</literal> option is only valid for <command>COPY
FROM</command> commands.
</para>
@@ -400,7 +419,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +913,98 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2 id="sql-copy-raw-format" xreflabel="Raw Format">
+ <title>Raw Format</title>
+
+ <para>
+ The <literal>raw</literal> format is designed for efficient bulk data
+ transfer of a single text column without any parsing, quoting, or
+ escaping. In this format, data is copied exactly as it appears in the file
+ or table, interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding.
+ </para>
+
+ <para>
+ When using the <literal>raw</literal> format, each data value corresponds
+ to a single field with no additional formatting or processing. The
+ <literal>DELIMITER</literal> option specifies the string that separates
+ data values. Unlike in other formats, the delimiter in
+ <literal>raw</literal> format can be any string, including multi-byte
+ characters. If no <literal>DELIMITER</literal> is specified, the entire
+ input or output is treated as a single data value.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format requires that exactly one column be
+ specified in the column list. An error is raised if more than one column
+ is specified or if no column list is specified when the table has multiple
+ columns.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not support any of the
+ format-specific options of other formats, such as <literal>NULL</literal>,
+ <literal>HEADER</literal>, <literal>QUOTE</literal>,
+ <literal>ESCAPE</literal>, <literal>FORCE_QUOTE</literal>,
+ <literal>FORCE_NOT_NULL</literal>, and <literal>FORCE_NULL</literal>.
+ Attempting to use these options with <literal>raw</literal> format will
+ result in an error.
+ </para>
+
+ <para>
+ Since the <literal>raw</literal> format deals with text, the data is
+ interpreted according to the specified <literal>ENCODING</literal> option
+ or the current client encoding for input, and encoded using the specified
+ <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <note>
+ <para>
+ Empty lines in the input are treated as empty strings, not as
+ <literal>NULL</literal> values. There is no way to represent a
+ <literal>NULL</literal> value in <literal>raw</literal> format.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format is particularly useful when you need to
+ import or export data exactly as it appears. This can be
+ helpful when dealing with large text blobs, JSON files, or other
+ text-based formats.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format can only be used when copying exactly
+ one column. If the table has multiple columns, you must specify the
+ column list containing only one column.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ Unlike other formats, the delimiter in <literal>raw</literal> format can
+ be any string, and there are no restrictions on the characters used in
+ the delimiter, including newline or carriage return characters.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ When using <command>COPY TO</command> with <literal>raw</literal> format
+ and a specified <literal>DELIMITER</literal>, there is no check to prevent
+ data values from containing the delimiter string. This is a problem if
+ the output is later loaded with <command>COPY FROM</command>, since a
+ data value containing the delimiter would then be split into two or more
+ values. If this is a concern, a different format should be used
+ instead.
+ </para>
+ </note>
+ </refsect2>
+
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1532f59993..fa10bbccb2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -686,18 +688,61 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->null_print)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->default_print)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
+ if (opts_out->delim)
+ {
+ if (opts_out->format != COPY_FORMAT_RAW)
+ {
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+ }
+ }
/* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ if (opts_out->null_print)
+ {
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
if (opts_out->format == COPY_FORMAT_CSV)
{
@@ -707,25 +752,6 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->escape = opts_out->quote;
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
-
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
-
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
-
if (opts_out->default_print)
{
opts_out->default_print_len = strlen(opts_out->default_print);
@@ -738,7 +764,7 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * Disallow unsafe delimiter characters in text mode. We can't allow
* backslash because it would be ambiguous. We can't allow the other
* cases because data characters matching the delimiter must be
* backslashed, and certain backslash combinations are interpreted
@@ -747,7 +773,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (opts_out->format != COPY_FORMAT_CSV &&
+ if (opts_out->format == COPY_FORMAT_TEXT &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -761,6 +787,12 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->header_line)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "HEADER")));
+
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
@@ -792,7 +824,7 @@ ProcessCopyOptions(ParseState *pstate,
/* Check force_quote */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -823,7 +855,7 @@ ProcessCopyOptions(ParseState *pstate,
/* Check force_null */
if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -839,11 +871,12 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ if (opts_out->delim && opts_out->null_print &&
+ strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
+ errmsg("COPY delimiter character must not appear in the %s specification",
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
@@ -875,7 +908,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the default string. */
- if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. NULL */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..99dcb00f8a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 51eb14d743..30938677de 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -142,8 +142,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -731,7 +733,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -747,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -767,8 +769,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -822,8 +829,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1095,7 +1109,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1146,6 +1163,21 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf.len -= 2;
cstate->line_buf.data[cstate->line_buf.len] = '\0';
break;
+ case EOL_CUSTOM:
+ {
+ int delim_len;
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+ Assert(cstate->opts.delim);
+ delim_len = strlen(cstate->opts.delim);
+ Assert(delim_len > 0);
+ Assert(cstate->line_buf.len >= delim_len);
+ Assert(memcmp(cstate->line_buf.data + cstate->line_buf.len - delim_len,
+ cstate->opts.delim,
+ delim_len) == 0);
+ cstate->line_buf.len -= delim_len;
+ cstate->line_buf.data[cstate->line_buf.len] = '\0';
+ }
+ break;
case EOL_UNKNOWN:
/* shouldn't get here */
Assert(false);
@@ -1461,6 +1493,109 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+ bool read_entire_file = (cstate->opts.delim == NULL);
+ int delim_len = cstate->opts.delim ? strlen(cstate->opts.delim) : 0;
+
+ /*
+ * The objective of this loop is to transfer data into line_buf until we
+ * find the specified delimiter or reach EOF. In raw format, we treat the
+ * input data as-is, without any parsing, quoting, or escaping. We are
+ * only interested in locating the delimiter to determine the boundaries
+ * of each data value.
+ *
+ * If a delimiter is specified, we read data until we encounter the
+ * delimiter string. If no delimiter is specified, we read the entire
+ * input as a single data value. Unlike text or CSV modes, we do not need
+ * to handle line endings, escape sequences, or special characters.
+ *
+ * The input has already been converted to the database encoding. All
+ * supported server encodings have the property that all bytes in a
+ * multi-byte sequence have the high bit set, so a multibyte character
+ * cannot contain any newline or escape characters embedded in the
+ * multibyte sequence. Therefore, we can process the input byte-by-byte,
+ * regardless of the encoding.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * We handle both single-byte and multi-byte delimiters. For multi-byte
+ * delimiters, we ensure that we have enough data in the buffer to compare
+ * the delimiter string.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+
+ /* Load more data if needed */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* Update local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /* If no more data, break out of the loop */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* Fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+
+ if (read_entire_file)
+ {
+ /* Continue until EOF if reading entire file */
+ input_buf_ptr++;
+ continue;
+ }
+ else
+ {
+ /* Check for delimiter, possibly multi-byte */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(delim_len - 1);
+ if (strncmp(&copy_input_buf[input_buf_ptr], cstate->opts.delim,
+ delim_len) == 0)
+ {
+ cstate->eol_type = EOL_CUSTOM;
+ input_buf_ptr += delim_len;
+ break;
+ }
+ input_buf_ptr++;
+ }
+ }
+
+ /* Transfer data to line_buf, including the delimiter if found */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1937,6 +2072,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input
+ * data line, so we can just force attribute_buf to be large enough and
+ * then transfer data without any checks for enough space. We need to do
+ * it this way because enlarging attribute_buf mid-stream would invalidate
+ * pointers already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 03c9d71d34..bf967a366e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -191,7 +192,14 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -235,9 +243,18 @@ CopySendEndOfRow(CopyToState cstate)
}
break;
case COPY_FRONTEND:
- /* The FE/BE protocol uses \n as newline for all platforms */
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
+ {
+ /* The FE/BE protocol uses \n as newline for all platforms */
CopySendChar(cstate, '\n');
+ }
/* Dump the accumulated row as one CopyData message */
(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
@@ -574,6 +591,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -839,8 +863,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -921,7 +947,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -949,7 +976,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -969,6 +996,34 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Assert only one column is being copied */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1223,6 +1278,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1be0056af7..7f8d6f4f94 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "raw");
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f..8996bc89e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
@@ -79,7 +80,7 @@ typedef struct CopyFormatOptions
char *null_print_client; /* same converted to file encoding */
char *default_print; /* DEFAULT marker string */
int default_print_len; /* length of same */
- char *delim; /* column delimiter (must be 1 byte) */
+ char *delim; /* delimiter (1 byte, except for raw format) */
char *quote; /* CSV quote char (must be 1 byte) */
char *escape; /* CSV escape char (must be 1 byte) */
List *force_quote; /* list of column names */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..b8693ae59e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -38,6 +38,7 @@ typedef enum EolType
EOL_NL,
EOL_CR,
EOL_CRNL,
+ EOL_CUSTOM,
} EolType;
/*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..2825d833ea 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,55 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+ col
+----------------------------------------
+ sharon 25 (15,12) 1000 sam +
+ sam 30 (10,5) 2000 bill +
+ bill 20 (11,10) 1000 sharon+
+
+(1 row)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+----------------------------------------
+ bill 20 (11,10) 1000 sharon
+ sam 30 (10,5) 2000 bill
+ sharon 25 (15,12) 1000 sam
+(3 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+sharon 25 (15,12) 1000 sam
+***
+sam 30 (10,5) 2000 bill
+***
+bill 20 (11,10) 1000 sharon
+***
+\qecho
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+--------
+
+ "def",
+ abc\.
+ ghi
+(4 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..f31bd6a322 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -90,15 +90,35 @@ COPY x from stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x from stdin (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
+COPY x from stdin (format TEXT, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format BINARY, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format RAW, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format TEXT, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format BINARY, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, header);
+ERROR: cannot specify HEADER in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x from stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
@@ -108,6 +128,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +140,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +886,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +959,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..93595037dc 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,27 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+\qecho
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
+\.
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..7aee4ca8ea 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -72,18 +72,32 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
+COPY x from stdin (format RAW, null 'x');
+COPY x from stdin (format TEXT, escape 'x');
+COPY x from stdin (format BINARY, escape 'x');
+COPY x from stdin (format RAW, escape 'x');
+COPY x from stdin (format TEXT, quote 'x');
+COPY x from stdin (format BINARY, quote 'x');
+COPY x from stdin (format RAW, quote 'x');
+COPY x from stdin (format RAW, header);
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x from stdin (format TEXT, force_quote(a));
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +650,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +722,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
--
2.45.1
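
For readers following the patch above, the delimiter scan performed by CopyReadLineRawText can be sketched in isolation. This is only a minimal standalone sketch, not code from the patch: next_raw_record and count_raw_records are hypothetical helper names, and the sketch scans a complete in-memory buffer instead of cstate->input_buf, so it has no equivalent of REFILL_LINEBUF or IF_NEED_REFILL_AND_NOT_EOF_CONTINUE. It assumes, as the patch does, that a NULL delimiter means the whole input is a single value and that each record is terminated by the full delimiter string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): locate the next record in
 * buf[0..len).  Returns the record length, excluding the delimiter, and
 * stores in *consumed the number of bytes to skip to reach the following
 * record (record plus delimiter).  With no delimiter, or when the delimiter
 * does not occur, the rest of the buffer is treated as one record.
 */
static size_t
next_raw_record(const char *buf, size_t len,
                const char *delim, size_t delim_len, size_t *consumed)
{
    size_t      i;

    assert(buf != NULL && consumed != NULL);

    if (delim_len > 0)
    {
        /* Only compare where a full delimiter still fits in the buffer. */
        for (i = 0; i + delim_len <= len; i++)
        {
            if (memcmp(buf + i, delim, delim_len) == 0)
            {
                *consumed = i + delim_len;
                return i;
            }
        }
    }

    /* No delimiter found: everything that is left is the record. */
    *consumed = len;
    return len;
}

/*
 * Hypothetical helper: count the records a raw-format import of buf would
 * produce.  A NULL delim corresponds to "read the entire input as one value".
 */
static int
count_raw_records(const char *buf, size_t len, const char *delim)
{
    size_t      delim_len = delim ? strlen(delim) : 0;
    size_t      pos = 0;
    int         nrecords = 0;

    while (pos < len)
    {
        size_t      consumed;

        (void) next_raw_record(buf + pos, len - pos, delim, delim_len,
                               &consumed);
        pos += consumed;
        nrecords++;
    }

    return nrecords;
}
```

Feeding this sketch the stdin data from the regression test above, with the delimiter E'\n***\n', splits it into four records (one of them empty), in line with the "(4 rows)" shown in the expected output. Passing NULL as the delimiter yields a single record.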
v14-0003-Reorganize-option-validations.patch
From c156d97d2efea88f955a7a947092c4722fb2b25f Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 09:18:37 +0300
Subject: [PATCH 3/3] Reorganize option validations.
---
src/backend/commands/copy.c | 463 ++++++++++++++++++++----------------
1 file changed, 261 insertions(+), 202 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fa10bbccb2..fa831161cc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -673,39 +673,29 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
- /*
- * Check for incompatible options (must do these three before inserting
- * defaults)
- */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in RAW mode", "NULL")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in RAW mode", "DEFAULT")));
-
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+
+ /* --- DELIMITER option --- */
if (opts_out->delim)
{
- if (opts_out->format != COPY_FORMAT_RAW)
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
{
/* Only single-byte delimiter strings are supported. */
if (strlen(opts_out->delim) != 1)
@@ -720,22 +710,53 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be newline or carriage return")));
}
+
+ if (opts_out->format == COPY_FORMAT_TEXT)
+ {
+ /*
+ * Disallow unsafe delimiter characters in text mode. We can't allow
+ * backslash because it would be ambiguous. We can't allow the other
+ * cases because data characters matching the delimiter must be
+ * backslashed, and certain backslash combinations are interpreted
+ * non-literally by COPY IN. Disallowing all lower case ASCII letters is
+ * more than strictly necessary, but seems best for consistency and
+ * future-proofing. Likewise we disallow all digits though only octal
+ * digits are actually dangerous.
+ */
+ if (strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
}
- /* Set defaults for omitted options */
+ /* Set default delimiter */
else if (opts_out->format == COPY_FORMAT_CSV)
opts_out->delim = ",";
else if (opts_out->format == COPY_FORMAT_TEXT)
opts_out->delim = "\t";
+ /* --- NULL option --- */
if (opts_out->null_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
+ /* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
-
}
+ /* Set default null_print */
else if (opts_out->format == COPY_FORMAT_CSV)
opts_out->null_print = "";
else if (opts_out->format == COPY_FORMAT_TEXT)
@@ -744,16 +765,23 @@ ProcessCopyOptions(ParseState *pstate,
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->format == COPY_FORMAT_CSV)
- {
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
- }
-
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
opts_out->default_print_len = strlen(opts_out->default_print);
if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -761,144 +789,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY default representation cannot use newline or carriage return")));
- }
- /*
- * Disallow unsafe delimiter characters in text mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format == COPY_FORMAT_TEXT &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in RAW mode", "HEADER")));
-
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
-
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
-
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
-
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
-
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV &&
- (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
-
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
- if ((opts_out->force_null != NIL || opts_out->force_null_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
-
- /* Don't allow the delimiter to appear in the null string. */
- if (opts_out->delim && opts_out->null_print &&
- strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
-
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
-
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
-
- if (opts_out->default_print)
- {
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -908,22 +799,13 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the default string. */
- if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. NULL */
errmsg("COPY delimiter character must not appear in the %s specification",
"DEFAULT")));
- /* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "DEFAULT")));
-
/* Don't allow the NULL and DEFAULT string to be the same */
if (opts_out->null_print_len == opts_out->default_print_len &&
strncmp(opts_out->null_print, opts_out->default_print,
@@ -932,20 +814,197 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
+
+ /* --- HEADER option --- */
+ if (opts_out->header_line != COPY_HEADER_FALSE)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is no header; no action needed */
+ }
+
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
+
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
+
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote != NIL || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+
+ /* --- ON_ERROR option --- */
+ if (opts_out->on_error != COPY_ON_ERROR_STOP)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ }
+
+ /* --- REJECT_LIMIT option --- */
+ if (opts_out->reject_limit)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the CSV quote char to appear in the default string. */
+ if (opts_out->default_print_len > 0 &&
+ strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "DEFAULT")));
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
--
2.45.1
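As a reading aid for the reorganized checks in the patch above, here is a minimal standalone sketch of the per-option pattern: validate the option if it was given, otherwise apply the format's default. It is not the patch's code. The `QuoteOpts` struct and `check_quote_and_escape` function are hypothetical, the position of `COPY_FORMAT_RAW` in the enum is an assumption, and a returned message string stands in for `ereport(ERROR, ...)`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: RAW is appended last; the patch may order the values differently. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
	COPY_FORMAT_RAW,
} CopyFormat;

/* Hypothetical cut-down stand-in for CopyFormatOptions. */
typedef struct QuoteOpts
{
	CopyFormat	format;
	const char *quote;			/* NULL if QUOTE was not specified */
	const char *escape;			/* NULL if ESCAPE was not specified */
} QuoteOpts;

/*
 * Hypothetical helper: returns NULL on success, or the error message that
 * the real code would pass to ereport(ERROR, ...).
 */
static const char *
check_quote_and_escape(QuoteOpts *opts)
{
	assert(opts != NULL);

	/* --- QUOTE option --- */
	if (opts->quote)
	{
		if (opts->format != COPY_FORMAT_CSV)
			return "COPY QUOTE requires CSV mode";
		if (strlen(opts->quote) != 1)
			return "COPY quote must be a single one-byte character";
	}
	else if (opts->format == COPY_FORMAT_CSV)
		opts->quote = "\"";		/* default quote */

	/* --- ESCAPE option --- */
	if (opts->escape)
	{
		if (opts->format != COPY_FORMAT_CSV)
			return "COPY ESCAPE requires CSV mode";
		if (strlen(opts->escape) != 1)
			return "COPY escape must be a single one-byte character";
	}
	else if (opts->format == COPY_FORMAT_CSV)
		opts->escape = opts->quote; /* default escape is the quote character */

	return NULL;
}
```

Keeping each option's validation and default in one block means the later cross-option checks can assume that `quote` is set whenever the format is CSV, which is what the `Assert` calls in the patch rely on.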
On Mon, Oct 28, 2024, at 10:30, Joel Jacobson wrote:
On Mon, Oct 28, 2024, at 08:56, jian he wrote:
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
maybe this has a code indentation issue.
since "if" and "opts_out" in the same column position.
Thanks for review.
I've fixed the indentation issues.
I've now installed pgindent, and will use it from here on to avoid this class of problems.
New version where all three patches are now indented using pgindent.
/Joel
Attachments:
v15-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
From 151ffee12d4a44602baf5a29f5e25a21173ce7af Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 08:24:13 +0300
Subject: [PATCH 1/3] Introduce CopyFormat and replace csv_mode and binary
fields with it.
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 10 +++---
src/backend/commands/copyfromparse.c | 34 +++++++++----------
src/backend/commands/copyto.c | 20 +++++------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 70 insertions(+), 58 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..b7e819de40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,8 +822,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -846,7 +847,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -882,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -899,7 +900,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d8..51eb14d743 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -162,7 +162,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -748,7 +748,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -765,7 +765,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -820,7 +820,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -864,7 +864,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -905,7 +905,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1178,7 +1178,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1255,7 +1255,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1294,7 +1294,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1322,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1339,10 +1339,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1350,15 +1350,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1370,7 +1370,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d9675..03c9d71d34 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -775,7 +775,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -796,7 +796,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -837,7 +837,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -884,7 +884,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -912,7 +912,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -921,7 +921,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -941,7 +941,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..c3d1df267f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 171a7dd5d2..bb9fe00a6a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
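The effect of patch 1/3 can be sketched outside the backend. The enum below mirrors the `CopyFormat` added to copy.h above; the two helper functions are hypothetical and only illustrate how a single enum field replaces the `binary` and `csv_mode` booleans. Returning `false` stands in for the `ereport(ERROR, ...)` the real code raises on an unrecognized format name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mirrors the enum added to copy.h by the patch above. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper (not in the patch): map a FORMAT name to the enum.
 * Returns false for an unrecognized name.
 */
static bool
parse_copy_format(const char *fmt, CopyFormat *out)
{
	assert(fmt != NULL && out != NULL);

	if (strcmp(fmt, "text") == 0)
		*out = COPY_FORMAT_TEXT;
	else if (strcmp(fmt, "csv") == 0)
		*out = COPY_FORMAT_CSV;
	else if (strcmp(fmt, "binary") == 0)
		*out = COPY_FORMAT_BINARY;
	else
		return false;
	return true;
}

/*
 * Hypothetical helper: the default delimiter, following the
 * "Set defaults for omitted options" hunk in the patch.
 */
static const char *
default_delimiter(CopyFormat format)
{
	return format == COPY_FORMAT_CSV ? "," : "\t";
}
```

With two booleans, the combination "binary and CSV" was representable even though it was meaningless. With one enum, every value is a valid format, and a new format only needs a new enum value plus its own branches.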
v15-0002-Add-raw-format-to-COPY-command.patch
From a34a82595dcdb173ccc3473bce06b8a9008acd21 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 09:16:31 +0300
Subject: [PATCH 2/3] Add raw format to COPY command.
This commit introduces a new raw format to the COPY command, enabling
efficient bulk data transfer of a single text column without any parsing,
quoting, or escaping. In raw format, data is copied exactly as it appears
in the file or table, adhering to the specified ENCODING option or the
current client encoding.
The raw format enforces a single-column requirement, ensuring that exactly
one column is specified in the column list. Attempts to specify multiple
columns or omit the column list when the table has multiple columns will
result in an error. Additionally, the DELIMITER option in raw format accepts
any string, including multi-byte characters, providing greater flexibility
in defining data separators. If no DELIMITER is specified, the entire input
or output is treated as a single data value.
Furthermore, the raw format does not support format-specific options such as
NULL, HEADER, QUOTE, ESCAPE, FORCE_QUOTE, FORCE_NOT_NULL, and FORCE_NULL.
Using these options with the raw format will trigger errors, ensuring that
data remains unaltered during the transfer process.
This enhancement is particularly useful when handling text blobs, JSON files,
or other text-based formats where preserving the data "as is" is crucial.
---
doc/src/sgml/ref/copy.sgml | 134 ++++++++++++++--
src/backend/commands/copy.c | 89 +++++++----
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 189 ++++++++++++++++++++++-
src/backend/commands/copyto.c | 90 ++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 1 +
src/test/regress/expected/copy.out | 52 +++++++
src/test/regress/expected/copy2.out | 52 ++++++-
src/test/regress/sql/copy.sql | 24 +++
src/test/regress/sql/copy2.sql | 37 ++++-
12 files changed, 620 insertions(+), 60 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..f17d606537 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -253,11 +254,27 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>DELIMITER</literal></term>
<listitem>
<para>
- Specifies the character that separates columns within each row
- (line) of the file. The default is a tab character in text format,
- a comma in <literal>CSV</literal> format.
- This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ Specifies the delimiter used in the file. Its usage depends on the
+ <literal>FORMAT</literal> specified:
+ <simplelist>
+ <member>
+ In <literal>text</literal> and <literal>CSV</literal> formats,
+ the delimiter separates <emphasis>columns</emphasis> within each row
+ (line) of the file.
+ The default is a tab character in <literal>text</literal> format and
+ a comma in <literal>CSV</literal> format. This must be a single
+ one-byte character.
+ </member>
+ <member>
+ In <literal>raw</literal> format, the delimiter separates
+ <emphasis>rows</emphasis> in the file. The default is no delimiter,
+ which means that for <command>COPY FROM</command>, the entire input is
+ read as a single field, and for <command>COPY TO</command>, the output
+ is concatenated without any delimiter. If a delimiter is specified,
+ it can be a multi-byte string; for example, <literal>E'\r\n'</literal>
+ can be used when dealing with text files on Windows platforms.
+ </member>
+ </simplelist>
</para>
</listitem>
</varlistentry>
@@ -271,7 +288,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +312,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -310,7 +328,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
If this option is set to <literal>MATCH</literal>, the number and names
of the columns in the header line must match the actual column names of
the table, in order; otherwise an error is raised.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
The <literal>MATCH</literal> option is only valid for <command>COPY
FROM</command> commands.
</para>
@@ -400,7 +419,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +913,98 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2 id="sql-copy-raw-format" xreflabel="Raw Format">
+ <title>Raw Format</title>
+
+ <para>
+ The <literal>raw</literal> format is designed for efficient bulk data
+ transfer of a single text column without any parsing, quoting, or
+ escaping. In this format, data is copied exactly as it appears in the file
+ or table, interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding.
+ </para>
+
+ <para>
+ When using the <literal>raw</literal> format, each data value corresponds
+ to a single field with no additional formatting or processing. The
+ <literal>DELIMITER</literal> option specifies the string that separates
+ data values. Unlike in other formats, the delimiter in
+ <literal>raw</literal> format can be any string, including multi-byte
+ characters. If no <literal>DELIMITER</literal> is specified, the entire
+ input or output is treated as a single data value.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format requires that exactly one column be
+ specified in the column list. An error is raised if more than one column
+ is specified or if no column list is specified when the table has multiple
+ columns.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not support any of the
+ format-specific options of other formats, such as <literal>NULL</literal>,
+ <literal>HEADER</literal>, <literal>QUOTE</literal>,
+ <literal>ESCAPE</literal>, <literal>FORCE_QUOTE</literal>,
+ <literal>FORCE_NOT_NULL</literal>, and <literal>FORCE_NULL</literal>.
+ Attempting to use these options with <literal>raw</literal> format will
+ result in an error.
+ </para>
+
+ <para>
+ Since the <literal>raw</literal> format deals with text, the data is
+ interpreted according to the specified <literal>ENCODING</literal> option
+ or the current client encoding for input, and encoded using the specified
+ <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <note>
+ <para>
+ Empty data values in the input are treated as empty strings, not as
+ <literal>NULL</literal> values. There is no way to represent a
+ <literal>NULL</literal> value in <literal>raw</literal> format.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format is particularly useful when you need to
+ import or export data exactly as it appears. This can be
+ helpful when dealing with large text blobs, JSON files, or other
+ text-based formats.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format can only be used when copying exactly
+ one column. If the table has multiple columns, you must specify the
+ column list containing only one column.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ Unlike other formats, the delimiter in <literal>raw</literal> format can
+ be any string, and there are no restrictions on the characters used in
+ the delimiter, including newline or carriage return characters.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ When using <literal>COPY TO</literal> with <literal>raw</literal> format
+ and a specified <literal>DELIMITER</literal>, there is no check to prevent
+ data values from containing the delimiter string. This is a problem if
+ the output must later be re-imported unchanged using
+ <literal>COPY FROM</literal>, since a data value containing the delimiter
+ would then be split into two or more values. If this is a concern, a
+ different format should be used instead.
+ </para>
+ </note>
+ </refsect2>
+
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b7e819de40..bb3b106ff1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -686,18 +688,61 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->null_print)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->default_print)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
+ if (opts_out->delim)
+ {
+ if (opts_out->format != COPY_FORMAT_RAW)
+ {
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+ }
+ }
/* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ if (opts_out->null_print)
+ {
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
if (opts_out->format == COPY_FORMAT_CSV)
{
@@ -707,25 +752,6 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->escape = opts_out->quote;
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
-
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
-
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
-
if (opts_out->default_print)
{
opts_out->default_print_len = strlen(opts_out->default_print);
@@ -738,7 +764,7 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * Disallow unsafe delimiter characters in text mode. We can't allow
* backslash because it would be ambiguous. We can't allow the other
* cases because data characters matching the delimiter must be
* backslashed, and certain backslash combinations are interpreted
@@ -747,7 +773,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (opts_out->format != COPY_FORMAT_CSV &&
+ if (opts_out->format == COPY_FORMAT_TEXT &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -761,6 +787,12 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->header_line)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "HEADER")));
+
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
@@ -839,7 +871,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ if (opts_out->delim && opts_out->null_print &&
+ strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: %s is the name of a COPY option, e.g. NULL */
@@ -875,7 +908,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the default string. */
- if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. NULL */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..73a3f38d90 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 51eb14d743..46395b5bdb 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -142,8 +142,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
+static int CopyReadAttributesRaw(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
Oid typioparam, int32 typmod,
bool *isnull);
@@ -731,7 +733,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -747,7 +749,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -767,8 +769,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
if (fldct != list_length(cstate->attnumlist))
ereport(ERROR,
@@ -822,8 +829,15 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ fldct = CopyReadAttributesRaw(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1095,7 +1109,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1146,6 +1163,22 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf.len -= 2;
cstate->line_buf.data[cstate->line_buf.len] = '\0';
break;
+ case EOL_CUSTOM:
+ {
+ int delim_len;
+
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+ Assert(cstate->opts.delim);
+ delim_len = strlen(cstate->opts.delim);
+ Assert(delim_len > 0);
+ Assert(cstate->line_buf.len >= delim_len);
+ Assert(memcmp(cstate->line_buf.data + cstate->line_buf.len - delim_len,
+ cstate->opts.delim,
+ delim_len) == 0);
+ cstate->line_buf.len -= delim_len;
+ cstate->line_buf.data[cstate->line_buf.len] = '\0';
+ }
+ break;
case EOL_UNKNOWN:
/* shouldn't get here */
Assert(false);
@@ -1461,6 +1494,109 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+ bool read_entire_file = (cstate->opts.delim == NULL);
+ int delim_len = cstate->opts.delim ? strlen(cstate->opts.delim) : 0;
+
+ /*
+ * The objective of this loop is to transfer data into line_buf until we
+ * find the specified delimiter or reach EOF. In raw format, we treat the
+ * input data as-is, without any parsing, quoting, or escaping. We are
+ * only interested in locating the delimiter to determine the boundaries
+ * of each data value.
+ *
+ * If a delimiter is specified, we read data until we encounter the
+ * delimiter string. If no delimiter is specified, we read the entire
+ * input as a single data value. Unlike text or CSV modes, we do not need
+ * to handle line endings, escape sequences, or special characters.
+ *
+ * The input has already been converted to the database encoding. All
+ * supported server encodings have the property that all bytes in a
+ * multi-byte sequence have the high bit set, so a multibyte character
+ * cannot contain any newline or escape characters embedded in the
+ * multibyte sequence. Therefore, we can process the input byte-by-byte,
+ * regardless of the encoding.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * We handle both single-byte and multi-byte delimiters. For multi-byte
+ * delimiters, we ensure that we have enough data in the buffer to compare
+ * the delimiter string.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+
+ /* Load more data if needed */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* Update local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /* If no more data, break out of the loop */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* Fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+
+ if (read_entire_file)
+ {
+ /* Continue until EOF if reading entire file */
+ input_buf_ptr++;
+ continue;
+ }
+ else
+ {
+ /* Check for delimiter, possibly multi-byte */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(delim_len - 1);
+ if (strncmp(&copy_input_buf[input_buf_ptr], cstate->opts.delim,
+ delim_len) == 0)
+ {
+ cstate->eol_type = EOL_CUSTOM;
+ input_buf_ptr += delim_len;
+ break;
+ }
+ input_buf_ptr++;
+ }
+ }
+
+ /* Transfer data to line_buf, including the delimiter if found */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1937,6 +2073,45 @@ endfield:
return fieldno;
}
+/*
+ * Parse the current line as a single attribute for the "raw" COPY format.
+ * No parsing, quoting, or escaping is performed.
+ * Empty lines are treated as empty strings, not NULL.
+ */
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+ }
+
+ resetStringInfo(&cstate->attribute_buf);
+
+ /*
+ * The attribute will certainly not be longer than the input data line, so
+ * we can just force attribute_buf to be large enough and then transfer
+ * data without any checks for enough space. We need to do it this way
+ * because enlarging attribute_buf mid-stream would invalidate pointers
+ * already stored into cstate->raw_fields[].
+ */
+ if (cstate->attribute_buf.maxlen <= cstate->line_buf.len)
+ enlargeStringInfo(&cstate->attribute_buf, cstate->line_buf.len);
+
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
+
+ /* Assign the single field to raw_fields[0] */
+ cstate->raw_fields[0] = cstate->attribute_buf.data;
+
+ return 1;
+}
/*
* Read a binary attribute
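(Aside, not part of the patch.) To make the splitting behaviour of CopyReadLineRawText() easier to reason about, here is a minimal standalone sketch of how I read the intended semantics: scan for the delimiter string, treat everything before it as one value, and treat any trailing data without a delimiter as a final value. The helper names raw_next_value() and raw_count_values() are hypothetical, and the sketch works on an in-memory string rather than on the backend's input_buf/line_buf machinery, so it says nothing about buffer refills.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): locate the next raw-format value
 * in buf[0..len).  Returns the length of the value; *consumed is set to the
 * number of bytes to skip, i.e. the value plus the delimiter if one was
 * found.  With delim == NULL the whole remaining input is one value.
 */
size_t
raw_next_value(const char *buf, size_t len,
               const char *delim, size_t delim_len,
               size_t *consumed)
{
    /* Mirrors the patch's Assert(delim_len > 0) for a given delimiter */
    assert(delim == NULL || delim_len > 0);

    if (delim != NULL)
    {
        for (size_t i = 0; i + delim_len <= len; i++)
        {
            /* Byte-wise comparison, so multi-byte delimiters just work */
            if (memcmp(buf + i, delim, delim_len) == 0)
            {
                *consumed = i + delim_len;
                return i;
            }
        }
    }

    /* No delimiter found: the rest of the input is the final value */
    *consumed = len;
    return len;
}

/* Hypothetical helper: count the values a NUL-terminated input would yield */
int
raw_count_values(const char *buf, const char *delim)
{
    size_t len = strlen(buf);
    size_t delim_len = delim ? strlen(delim) : 0;
    size_t off = 0;
    int    n = 0;

    while (off < len)
    {
        size_t consumed;

        (void) raw_next_value(buf + off, len - off, delim, delim_len,
                              &consumed);
        off += consumed;
        n++;
    }
    return n;
}
```

Under these assumptions, an input that ends with the delimiter does not produce an extra empty value, while two adjacent delimiters produce an empty string in between.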
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 03c9d71d34..2611c0a360 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -191,7 +192,14 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -235,9 +243,18 @@ CopySendEndOfRow(CopyToState cstate)
}
break;
case COPY_FRONTEND:
- /* The FE/BE protocol uses \n as newline for all platforms */
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
+ {
+ /* The FE/BE protocol uses \n as newline for all platforms */
CopySendChar(cstate, '\n');
+ }
/* Dump the accumulated row as one CopyData message */
(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
@@ -574,6 +591,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -839,8 +863,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -921,7 +947,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -949,7 +976,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -969,6 +996,35 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Assert only one column is being copied */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1223,6 +1279,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
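(Aside, not part of the patch.) The COPY TO side can be sketched the same way. As I read CopyOneRowTo() and CopySendEndOfRow(), each row emits the value as-is (nothing at all for NULL) followed by the delimiter when one is specified. The function raw_write_values() below is a hypothetical stand-in that writes into a caller-supplied buffer instead of going through CopySendString(); it is only meant to illustrate why NULL and the empty string are indistinguishable in the output and why a value containing the delimiter cannot survive a round trip.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): serialize nvalues values into
 * out, which has room for outsz bytes including the terminating NUL.
 * A NULL entry stands for an SQL NULL and contributes no bytes.  Each row
 * is followed by delim when delim is not NULL.  Returns the number of
 * bytes written (excluding the NUL), or (size_t) -1 if out is too small.
 */
size_t
raw_write_values(char *out, size_t outsz,
                 const char **values, size_t nvalues,
                 const char *delim)
{
    size_t delim_len = delim ? strlen(delim) : 0;
    size_t used = 0;

    assert(out != NULL);
    if (outsz == 0)
        return (size_t) -1;

    for (size_t i = 0; i < nvalues; i++)
    {
        size_t vlen = values[i] ? strlen(values[i]) : 0;

        /* Keep one byte in reserve for the terminating NUL */
        if (vlen + delim_len + 1 > outsz - used)
            return (size_t) -1;

        /* The value is copied verbatim: no quoting, no escaping */
        if (vlen > 0)
            memcpy(out + used, values[i], vlen);
        used += vlen;

        /* The delimiter follows every row, including the last one */
        if (delim_len > 0)
            memcpy(out + used, delim, delim_len);
        used += delim_len;
    }

    out[used] = '\0';
    return used;
}
```

With a newline delimiter, a single value such as "x\ny" is written as two delimited chunks, which is exactly the case the documentation note about COPY TO warns about.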
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1be0056af7..7f8d6f4f94 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "raw");
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f..8996bc89e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
@@ -79,7 +80,7 @@ typedef struct CopyFormatOptions
char *null_print_client; /* same converted to file encoding */
char *default_print; /* DEFAULT marker string */
int default_print_len; /* length of same */
- char *delim; /* column delimiter (must be 1 byte) */
+ char *delim; /* delimiter (1 byte, except for raw format) */
char *quote; /* CSV quote char (must be 1 byte) */
char *escape; /* CSV escape char (must be 1 byte) */
List *force_quote; /* list of column names */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..b8693ae59e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -38,6 +38,7 @@ typedef enum EolType
EOL_NL,
EOL_CR,
EOL_CRNL,
+ EOL_CUSTOM,
} EolType;
/*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..2825d833ea 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,55 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+ col
+----------------------------------------
+ sharon 25 (15,12) 1000 sam +
+ sam 30 (10,5) 2000 bill +
+ bill 20 (11,10) 1000 sharon+
+
+(1 row)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+----------------------------------------
+ bill 20 (11,10) 1000 sharon
+ sam 30 (10,5) 2000 bill
+ sharon 25 (15,12) 1000 sam
+(3 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+sharon 25 (15,12) 1000 sam
+***
+sam 30 (10,5) 2000 bill
+***
+bill 20 (11,10) 1000 sharon
+***
+\qecho
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+--------
+
+ "def",
+ abc\.
+ ghi
+(4 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..f31bd6a322 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -90,15 +90,35 @@ COPY x from stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x from stdin (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
+COPY x from stdin (format TEXT, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format BINARY, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format RAW, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format TEXT, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format BINARY, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, header);
+ERROR: cannot specify HEADER in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x from stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
@@ -108,6 +128,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +140,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +886,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +959,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..93595037dc 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,27 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+\qecho
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
+\.
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..7aee4ca8ea 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -72,18 +72,32 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
+COPY x from stdin (format RAW, null 'x');
+COPY x from stdin (format TEXT, escape 'x');
+COPY x from stdin (format BINARY, escape 'x');
+COPY x from stdin (format RAW, escape 'x');
+COPY x from stdin (format TEXT, quote 'x');
+COPY x from stdin (format BINARY, quote 'x');
+COPY x from stdin (format RAW, quote 'x');
+COPY x from stdin (format RAW, header);
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x from stdin (format TEXT, force_quote(a));
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +650,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +722,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
--
2.45.1
v15-0003-Reorganize-option-validations.patch
From ba0e4da0d85c3929f75441525c89624446b436e7 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 09:18:37 +0300
Subject: [PATCH 3/3] Reorganize option validations.
---
src/backend/commands/copy.c | 463 ++++++++++++++++++++----------------
1 file changed, 261 insertions(+), 202 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index bb3b106ff1..0ef8ed501b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -673,39 +673,29 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
- /*
- * Check for incompatible options (must do these three before inserting
- * defaults)
- */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in RAW mode", "NULL")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in RAW mode", "DEFAULT")));
-
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+
+ /* --- DELIMITER option --- */
if (opts_out->delim)
{
- if (opts_out->format != COPY_FORMAT_RAW)
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
{
/* Only single-byte delimiter strings are supported. */
if (strlen(opts_out->delim) != 1)
@@ -720,22 +710,53 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be newline or carriage return")));
}
+
+ if (opts_out->format == COPY_FORMAT_TEXT)
+ {
+ /*
+ * Disallow unsafe delimiter characters in text mode. We can't
+ * allow backslash because it would be ambiguous. We can't allow
+ * the other cases because data characters matching the delimiter
+ * must be backslashed, and certain backslash combinations are
+ * interpreted non-literally by COPY IN. Disallowing all lower
+ * case ASCII letters is more than strictly necessary, but seems
+ * best for consistency and future-proofing. Likewise we disallow
+ * all digits though only octal digits are actually dangerous.
+ */
+ if (strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
}
- /* Set defaults for omitted options */
+ /* Set default delimiter */
else if (opts_out->format == COPY_FORMAT_CSV)
opts_out->delim = ",";
else if (opts_out->format == COPY_FORMAT_TEXT)
opts_out->delim = "\t";
+ /* --- NULL option --- */
if (opts_out->null_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
+ /* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
-
}
+ /* Set default null_print */
else if (opts_out->format == COPY_FORMAT_CSV)
opts_out->null_print = "";
else if (opts_out->format == COPY_FORMAT_TEXT)
@@ -744,16 +765,23 @@ ProcessCopyOptions(ParseState *pstate,
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->format == COPY_FORMAT_CSV)
- {
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
- }
-
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
opts_out->default_print_len = strlen(opts_out->default_print);
if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -761,144 +789,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY default representation cannot use newline or carriage return")));
- }
- /*
- * Disallow unsafe delimiter characters in text mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format == COPY_FORMAT_TEXT &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in RAW mode", "HEADER")));
-
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
-
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
-
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
-
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
-
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV &&
- (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
-
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
- if ((opts_out->force_null != NIL || opts_out->force_null_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
-
- /* Don't allow the delimiter to appear in the null string. */
- if (opts_out->delim && opts_out->null_print &&
- strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
-
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
-
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
-
- if (opts_out->default_print)
- {
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -908,22 +799,13 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the default string. */
- if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. NULL */
errmsg("COPY delimiter character must not appear in the %s specification",
"DEFAULT")));
- /* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "DEFAULT")));
-
/* Don't allow the NULL and DEFAULT string to be the same */
if (opts_out->null_print_len == opts_out->default_print_len &&
strncmp(opts_out->null_print, opts_out->default_print,
@@ -932,20 +814,197 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
+
+ /* --- HEADER option --- */
+ if (opts_out->header_line != COPY_HEADER_FALSE)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is no header; no action needed */
+ }
+
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
+
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
+
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote != NIL || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+
+ /* --- ON_ERROR option --- */
+ if (opts_out->on_error != COPY_ON_ERROR_STOP)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ }
+
+ /* --- REJECT_LIMIT option --- */
+ if (opts_out->reject_limit)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the CSV quote char to appear in the default string. */
+ if (opts_out->default_print_len > 0 &&
+ strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "DEFAULT")));
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
--
2.45.1
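The reorganized layout in patch 0003 handles each option in one place: validate it when it was given, otherwise apply the format's default. Below is a reduced, standalone sketch of that shape for DELIMITER only. The names CopyFmt and resolve_delimiter, and the error-string return value, are hypothetical simplifications (the real code reports errors with ereport), and the single-byte message text is paraphrased.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the COPY format enum used by the patch. */
typedef enum { FMT_TEXT, FMT_CSV, FMT_BINARY, FMT_RAW } CopyFmt;

/*
 * Validate DELIMITER if it was given, otherwise apply the format default.
 * Returns the effective delimiter, or NULL when there is none.
 * On invalid input, returns NULL and stores a message in *err.
 */
const char *
resolve_delimiter(CopyFmt fmt, const char *delim, const char **err)
{
    assert(err != NULL);
    *err = NULL;

    if (delim != NULL)
    {
        if (fmt == FMT_BINARY)
        {
            *err = "cannot specify DELIMITER in BINARY mode";
            return NULL;
        }

        if (fmt == FMT_TEXT || fmt == FMT_CSV)
        {
            /* Only single-byte delimiters are supported here. */
            if (strlen(delim) != 1)
            {
                *err = "delimiter must be a single one-byte character";
                return NULL;
            }
            if (delim[0] == '\n' || delim[0] == '\r')
            {
                *err = "COPY delimiter cannot be newline or carriage return";
                return NULL;
            }
        }

        /* Text mode additionally rejects characters that clash with escapes. */
        if (fmt == FMT_TEXT &&
            strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789", delim[0]) != NULL)
        {
            *err = "COPY delimiter cannot be this character";
            return NULL;
        }

        /* Raw format: this sketch accepts the string unchanged. */
        return delim;
    }

    /* Option omitted: set the default delimiter, if the format has one. */
    if (fmt == FMT_CSV)
        return ",";
    if (fmt == FMT_TEXT)
        return "\t";
    return NULL;
}
```

Keeping the "given" branch and the "default" branch next to each other is what allows the later cross-option checks to assert that delim is set for text and CSV.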
On Mon, Oct 28, 2024 at 3:21 AM Joel Jacobson <joel@compiler.org> wrote:
On Mon, Oct 28, 2024, at 10:30, Joel Jacobson wrote:
On Mon, Oct 28, 2024, at 08:56, jian he wrote:
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
Maybe this has a code indentation issue,
since "if" and "opts_out" are in the same column position.
Thanks for review.
I've fixed the indentation issues.
I've now installed pgindent, and will use it from here on to avoid this class of problems.
New version where all three patches are now indented using pgindent.
Thank you for updating the patch. Here are review comments on the v15
0002 patch:
When testing the patch with an empty delimiter, I got the following failure:
postgres(1:903898)=# copy hoge from '/tmp/tmp.raw' with (format 'raw',
delimiter '');
TRAP: failed Assert("delim_len > 0"), File: "copyfromparse.c", Line:
1173, PID: 903898
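One way to avoid tripping that assertion would be to classify the delimiter before the read loop starts and reject the empty string up front. This is a minimal sketch, not the patch's fix: RawSplitMode, raw_split_mode and raw_delim_len are hypothetical names, and it assumes (as the v15 code does) that an omitted delimiter means "read the whole input as one value".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the raw-format DELIMITER option. */
typedef enum
{
    RAW_SPLIT_WHOLE_INPUT,      /* no delimiter given: whole input is one value */
    RAW_SPLIT_ON_DELIMITER,     /* non-empty delimiter: one value per chunk */
    RAW_SPLIT_INVALID           /* empty string: should be rejected up front */
} RawSplitMode;

RawSplitMode
raw_split_mode(const char *delim)
{
    if (delim == NULL)
        return RAW_SPLIT_WHOLE_INPUT;
    if (delim[0] == '\0')
        return RAW_SPLIT_INVALID;
    return RAW_SPLIT_ON_DELIMITER;
}

/*
 * Once the option has been validated, the read loop can rely on a
 * non-zero delimiter length, which is what the failed assertion expects.
 */
size_t
raw_delim_len(const char *delim)
{
    assert(raw_split_mode(delim) == RAW_SPLIT_ON_DELIMITER);
    return strlen(delim);
}
```

With a check like this in option processing, delimiter '' would produce a regular error instead of reaching the parser.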
---
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
Since we already check the incompatible options with COPY_FORMAT_RAW
and default_print, I think it's better to add an assertion to make
sure the format is either COPY_FORMAT_CSV or COPY_FORMAT_TEXT, instead
of using elog(ERROR).
---
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
This function has a lot of duplication with CopyReadLineText(). I
think it's better to modify CopyReadLineText() to support 'raw'
format, rather than adding a separate function.
---
+ bool read_entire_file = (cstate->opts.delim == NULL);
+ int delim_len = cstate->opts.delim ? strlen(cstate->opts.delim) : 0;
I think we can use 'delim_len == 0' instead of read_entire_file.
---
+ if (read_entire_file)
+ {
+ /* Continue until EOF if reading entire file */
+ input_buf_ptr++;
+ continue;
+ }
In the case where we're reading the entire file as a single tuple, we
don't need to advance the input_buf_ptr one by one. Instead,
input_buf_ptr can jump to copy_buf_len, which is faster.
---
+ /* Check for delimiter, possibly multi-byte */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(delim_len - 1);
+ if (strncmp(&copy_input_buf[input_buf_ptr], cstate->opts.delim,
+ delim_len) == 0)
+ {
+ cstate->eol_type = EOL_CUSTOM;
+ input_buf_ptr += delim_len;
+ break;
+ }
+ input_buf_ptr++;
Similar to the above comment, I think we don't need to check the char
one by one. I guess that it would be faster if we locate the delimiter
string in the input_buf (e.g. using strstr()), and then move
input_buf_ptr to the detected position.
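The two suggestions above (jump to the end when there is no delimiter, and search for the delimiter instead of stepping byte by byte) could look roughly like the following. This is a sketch under assumptions: raw_record_end is a hypothetical helper, it works on one fully loaded buffer and ignores refills (so a delimiter straddling two buffer loads is not handled), and it uses memchr()/memcmp() rather than strstr() so it does not depend on the buffer being NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find the end of the record starting at buf[start] within buf[0..len).
 * Stores the record length (excluding the delimiter) in *rec_len and
 * returns the offset at which the next record starts.
 * If delim_len is 0, or no delimiter is found, the record runs to len.
 */
size_t
raw_record_end(const char *buf, size_t len, size_t start,
               const char *delim, size_t delim_len, size_t *rec_len)
{
    const char *p;
    const char *end;

    assert(start <= len);
    assert(delim_len == 0 || delim != NULL);

    if (delim_len == 0)
    {
        /* Whole input is one record: jump straight to the end. */
        *rec_len = len - start;
        return len;
    }

    p = buf + start;
    end = buf + len;

    /* Only positions where a full delimiter still fits are candidates. */
    while ((size_t) (end - p) >= delim_len)
    {
        p = memchr(p, delim[0], (size_t) (end - p) - delim_len + 1);
        if (p == NULL)
            break;
        if (memcmp(p, delim, delim_len) == 0)
        {
            *rec_len = (size_t) (p - (buf + start));
            return (size_t) (p - buf) + delim_len;
        }
        p++;                    /* first byte matched, rest did not */
    }

    /* No delimiter in the remaining data. */
    *rec_len = len - start;
    return len;
}
```

Whether a library search actually beats the hand-written loop in the hot path would need benchmarking; the sketch only shows the control flow.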
---
+ /* Copy the entire line into attribute_buf */
+ memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+ cstate->line_buf.len);
+ cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+ cstate->attribute_buf.len = cstate->line_buf.len;
The CopyReadAttributesRaw() just copies line_buf data to
attribute_buf, which seems to be a waste. I think we can have
attribute_buf point to the line_buf. That way, we can skip the whole
step 4 that is described in the comment on top of copyfromparse.c:
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
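To illustrate the idea of skipping step 4, here is a small sketch in which the single raw field is exposed by pointing at the line buffer's storage instead of copying it. Buf and raw_field_from_line are hypothetical stand-ins, not PostgreSQL's StringInfo API, and the sketch assumes the line buffer stays valid and NUL-terminated until the value has been consumed.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a string buffer (hypothetical). */
typedef struct
{
    char       *data;           /* NUL-terminated contents */
    size_t      len;            /* length without the terminator */
} Buf;

/*
 * Return the single raw field without copying: the result aliases the
 * line buffer, so it must be consumed before the buffer is reused.
 */
const char *
raw_field_from_line(const Buf *line_buf, size_t *field_len)
{
    assert(line_buf != NULL && line_buf->data != NULL);
    assert(line_buf->data[line_buf->len] == '\0');

    *field_len = line_buf->len;
    return line_buf->data;
}
```

The trade-off is lifetime: the field value is only good until the next line is read into the same buffer.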
---
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+ /* Enforce single column requirement */
+ if (cstate->max_fields != 1)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+ }
This check should have already been done in BeginCopyFrom(). Is there
any case where max_fields gets to != 1 during reading the input?
---
It's a bit odd to me to use the delimiter as an EOL marker in raw
format, but probably it's okay.
---
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
Since it sends the delimiter as a string, even if we specify the
delimiter to '\n', it doesn't send the new line (i.e. ASCII LF, 10).
For example,
postgres(1:904427)=# copy (select '{"j" : 1}'::jsonb) to stdout with
(format 'raw', delimiter '\n');
{"j": 1}\npostgres(1:904427)=#
I think there is a similar problem in COPY FROM; if we set a delimiter
to '\n' when doing COPY FROM in raw format, it expects the string '\n'
as a line termination but not ASCII LF(10). I think that input data
normally doesn't use the string '\n' as a line termination.
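The distinction can be reproduced outside the server. With standard_conforming_strings on, the SQL literal '\n' is two characters (a backslash and the letter n), while E'\n' is the single LF byte, which is what the regression tests in the patch use. The sketch below mirrors that in C, where "\\n" is the two-character string and "\n" is LF; count_chunks is a hypothetical helper used only for the demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count delimiter-separated chunks in a NUL-terminated string. */
size_t
count_chunks(const char *data, const char *delim)
{
    size_t      delim_len = strlen(delim);
    size_t      chunks = 1;
    const char *p = data;

    assert(delim_len > 0);

    while ((p = strstr(p, delim)) != NULL)
    {
        chunks++;
        p += delim_len;         /* continue after this delimiter */
    }
    return chunks;
}
```

Splitting "line1", LF, "line2", LF, "line3" on the LF byte yields three chunks, while splitting the same data on the two-character sequence backslash-n finds no delimiter at all and yields one chunk.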
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Mon, Oct 28, 2024, at 18:50, Masahiko Sawada wrote:
Thank you for updating the patch. Here are review comments on the v15
0002 patch:
Thanks for review.
When testing the patch with an empty delimiter, I got the following failure:
postgres(1:903898)=# copy hoge from '/tmp/tmp.raw' with (format 'raw',
delimiter '');
TRAP: failed Assert("delim_len > 0"), File: "copyfromparse.c", Line:
1173, PID: 903898
Fixed.
---
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else
+ {
+ elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+ pg_unreachable();
+ }
Since we already check the incompatible options with COPY_FORMAT_RAW
and default_print, I think it's better to add an assertion to make
sure the format is either COPY_FORMAT_CSV or COPY_FORMAT_TEXT, instead
of using elog(ERROR).
I agree, fixed.
---
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
This function has a lot of duplication with CopyReadLineText(). I
think it's better to modify CopyReadLineText() to support 'raw'
format, rather than adding a separate function.
Hmm, there is a bit of duplication, yes, but it is also a hot path,
so I think we want to minimize branches and code size in the
hot loop.
Combining them into one function would mean that the total function
size and the amount of branching increase for both cases.
I haven't run any benchmarks on this though.
---
+ bool read_entire_file = (cstate->opts.delim == NULL);
+ int delim_len = cstate->opts.delim ? strlen(cstate->opts.delim) : 0;
I think we can use 'delim_len == 0' instead of read_entire_file.
Fixed.
---
+       if (read_entire_file)
+       {
+           /* Continue until EOF if reading entire file */
+           input_buf_ptr++;
+           continue;
+       }

In the case where we're reading the entire file as a single tuple, we
don't need to advance the input_buf_ptr one by one. Instead,
input_buf_ptr can jump to copy_buf_len, which is faster.
Fixed.
---
+       /* Check for delimiter, possibly multi-byte */
+       IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(delim_len - 1);
+       if (strncmp(&copy_input_buf[input_buf_ptr], cstate->opts.delim,
+                   delim_len) == 0)
+       {
+           cstate->eol_type = EOL_CUSTOM;
+           input_buf_ptr += delim_len;
+           break;
+       }
+       input_buf_ptr++;

Similar to the above comment, I think we don't need to check the chars
one by one. I guess that it would be faster if we locate the delimiter
string in the input_buf (e.g. using strstr()), and then move
input_buf_ptr to the detected position.
Fixed.
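
To illustrate the search suggested above, here is a minimal sketch, not
taken from the patch, of locating a possibly multi-byte delimiter in a
length-bounded buffer. The helper name find_delim is hypothetical. It uses
memchr() and memcmp() rather than strstr(), on the assumption that the
scanned region is described by a length and need not be NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not from the patch: return the offset of the first
 * occurrence of delim in buf[0..len), or -1 if it does not occur.
 * memchr() jumps straight to each candidate first byte, so the caller
 * does not have to advance one character at a time.
 */
static long
find_delim(const char *buf, size_t len, const char *delim, size_t delim_len)
{
    const char *p = buf;
    const char *end = buf + len;

    assert(delim_len > 0);      /* an empty delimiter means "no delimiter" */

    while ((size_t) (end - p) >= delim_len)
    {
        /* only positions where a whole delimiter still fits are candidates */
        size_t      span = (size_t) (end - p) - delim_len + 1;
        const char *hit = memchr(p, delim[0], span);

        if (hit == NULL)
            return -1;
        if (memcmp(hit, delim, delim_len) == 0)
            return (long) (hit - buf);
        p = hit + 1;            /* false start, keep scanning after it */
    }
    return -1;
}
```

A caller would move its read pointer to the returned offset plus delim_len.
A delimiter that is split across a buffer refill still needs separate
handling, which this sketch leaves out.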
---
+   /* Copy the entire line into attribute_buf */
+   memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+          cstate->line_buf.len);
+   cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+   cstate->attribute_buf.len = cstate->line_buf.len;

The CopyReadAttributesRaw() just copies line_buf data to
attribute_buf, which seems to be a waste. I think we can have
attribute_buf point to the line_buf. That way, we can skip the whole
step 4 that is described in the comment on top of copyfromparse.c:

 * [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
 *               1.          2.            3.           4.
Fixed. I've removed CopyReadAttributesRaw() entirely.
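
The difference between copying and pointing can be sketched as follows.
This is only an illustration under stated assumptions: ToyBuf is a toy
stand-in, not PostgreSQL's StringInfoData, and both function names are
hypothetical rather than anything in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a string buffer; not PostgreSQL's StringInfoData. */
typedef struct ToyBuf
{
    char       *data;           /* NUL-terminated contents */
    size_t      len;            /* length, excluding the terminator */
} ToyBuf;

/* Step 4 done by copying: dst->data must have room for src->len + 1 bytes. */
static char *
field_by_copy(ToyBuf *dst, const ToyBuf *src)
{
    assert(dst->data != NULL && src->data != NULL);
    memcpy(dst->data, src->data, src->len);
    dst->data[src->len] = '\0';
    dst->len = src->len;
    return dst->data;
}

/* Step 4 skipped: the single field is the line buffer itself. */
static char *
field_by_alias(ToyBuf *line)
{
    assert(line->data != NULL);
    return line->data;
}
```

With aliasing, the returned field is only valid until the line buffer is
overwritten by the next line, so the caller has to consume it before
reading further input.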
---
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+   /* Enforce single column requirement */
+   if (cstate->max_fields != 1)
+   {
+       ereport(ERROR,
+               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                errmsg("COPY with format 'raw' must specify exactly one column")));
+   }

This check should have already been done in BeginCopyFrom(). Is there
any case where max_fields gets to != 1 during reading the input?
Good point. Removed.
---
It's a bit odd to me to use the delimiter as an EOL marker in raw
format, but probably it's okay.

---
-   if (cstate->opts.format != COPY_FORMAT_BINARY)
+   if (cstate->opts.format == COPY_FORMAT_RAW &&
+       cstate->opts.delim != NULL)
+   {
+       /* Output the user-specified delimiter between rows */
+       CopySendString(cstate, cstate->opts.delim);
+   }
+   else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+            cstate->opts.format == COPY_FORMAT_CSV)

Since it sends the delimiter as a string, even if we set the
delimiter to '\n', it doesn't send the new line (i.e. ASCII LF, 10).
For example,

postgres(1:904427)=# copy (select '{"j" : 1}'::jsonb) to stdout with
(format 'raw', delimiter '\n');
{"j": 1}\npostgres(1:904427)=#

I think there is a similar problem in COPY FROM; if we set a delimiter
to '\n' when doing COPY FROM in raw format, it expects the string '\n'
as a line termination but not ASCII LF(10). I think that input data
normally doesn't use the string '\n' as a line termination.
You need to use E'\n' to get ASCII LF(10), since '\n' is just a delimiter
consisting of backslash followed by "n".
Is this a problem? Since any string can be used as delimiter,
I think it would be strange if we parsed it and replaced the string
with a different string.
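
The difference between the two spellings can be shown at the byte level.
This is a small sketch, assuming standard_conforming_strings is on (the
default), in which case DELIMITER '\n' reaches COPY as the two bytes
backslash and "n", while E'\n' yields the single byte LF. The names below
are hypothetical and only model the comparison; they are not from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What COPY would see for DELIMITER '\n': backslash followed by 'n'. */
static const char delim_plain[] = "\\n";

/* What COPY would see for DELIMITER E'\n': a single ASCII LF (10). */
static const char delim_escaped[] = "\n";

/* Hypothetical helper: does data[0..len) end with the delimiter string? */
static int
ends_with_delim(const char *data, size_t len, const char *delim)
{
    size_t      dlen = strlen(delim);

    assert(dlen > 0);
    return len >= dlen && memcmp(data + len - dlen, delim, dlen) == 0;
}
```

A file whose last byte is LF ends with delim_escaped but not with
delim_plain, which matches the behavior described above: only E'\n' treats
a real newline as the row terminator.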
Another thought:
Maybe we shouldn't default to no delimiter after all,
maybe it would be better to default to the OS default EOL,
and maybe a final delimiter should always be written at the end,
so that when exporting a single json field, it would get exported
to the text file with \n at the end, which is what most text editors
do when saving a .json file.
/Joel
Attachments:
v16-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
From 151ffee12d4a44602baf5a29f5e25a21173ce7af Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 08:24:13 +0300
Subject: [PATCH 1/3] Introduce CopyFormat and replace csv_mode and binary
fields with it.
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 10 +++---
src/backend/commands/copyfromparse.c | 34 +++++++++----------
src/backend/commands/copyto.c | 20 +++++------
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 70 insertions(+), 58 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..b7e819de40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+ opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV &&
+ (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,8 +822,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -846,7 +847,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -882,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -899,7 +900,8 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY &&
+ opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f350a4ff97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/*
* If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
continue;
/* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryInputInfo(att->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Read and verify binary header */
ReceiveCopyBinaryHeader(cstate);
}
/* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
AttrNumber attr_count = list_length(cstate->attnumlist);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d8..51eb14d743 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -162,7 +162,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -748,7 +748,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -765,7 +765,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -820,7 +820,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return false;
/* Parse the line into de-escaped field values */
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
fldct = CopyReadAttributesText(cstate);
@@ -864,7 +864,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
MemSet(nulls, true, num_phys_attrs * sizeof(bool));
MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
char **field_strings;
ListCell *cur;
@@ -905,7 +905,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
continue;
}
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
if (string == NULL &&
cstate->opts.force_notnull_flags[m])
@@ -1178,7 +1178,7 @@ CopyReadLineText(CopyFromState cstate)
char quotec = '\0';
char escapec = '\0';
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1255,7 +1255,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
{
/*
* If character is '\r', we may need to look ahead below. Force
@@ -1294,7 +1294,7 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \r */
- if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1322,10 @@ CopyReadLineText(CopyFromState cstate)
if (cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
@@ -1339,10 +1339,10 @@ CopyReadLineText(CopyFromState cstate)
else if (cstate->eol_type == EOL_NL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal carriage return found in data") :
errmsg("unquoted carriage return found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\r\" to represent carriage return.") :
errhint("Use quoted CSV field to represent carriage return.")));
/* If reach here, we have found the line terminator */
@@ -1350,15 +1350,15 @@ CopyReadLineText(CopyFromState cstate)
}
/* Process \n */
- if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+ if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
{
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errmsg("literal newline found in data") :
errmsg("unquoted newline found in data"),
- !cstate->opts.csv_mode ?
+ cstate->opts.format != COPY_FORMAT_CSV ?
errhint("Use \"\\n\" to represent newline.") :
errhint("Use quoted CSV field to represent newline.")));
cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1370,7 +1370,7 @@ CopyReadLineText(CopyFromState cstate)
* Process backslash, except in CSV mode where backslash is a normal
* character.
*/
- if (c == '\\' && !cstate->opts.csv_mode)
+ if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
{
char c2;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d9675..03c9d71d34 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
break;
case COPY_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
@@ -775,7 +775,7 @@ DoCopyTo(CopyToState cstate)
bool isvarlena;
Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
getTypeBinaryOutputInfo(attr->atttypid,
&out_func_oid,
&isvarlena);
@@ -796,7 +796,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate header for a binary copy */
int32 tmp;
@@ -837,7 +837,7 @@ DoCopyTo(CopyToState cstate)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -884,7 +884,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
@@ -912,7 +912,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* Binary per-tuple header */
CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -921,7 +921,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (!cstate->opts.binary)
+ if (cstate->opts.format != COPY_FORMAT_BINARY)
{
bool need_delim = false;
@@ -941,7 +941,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..c3d1df267f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 171a7dd5d2..bb9fe00a6a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromState
CopyFromStateData
--
2.45.1
v16-0002-Add-raw-format-to-COPY-command.patch
From 9c14f498277194dfe99f0f70c1c2fcccbd02c09a Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 09:16:31 +0300
Subject: [PATCH 2/3] Add raw format to COPY command.
This commit introduces a new raw format to the COPY command, enabling
efficient bulk data transfer of a single text column without any parsing,
quoting, or escaping. In raw format, data is copied exactly as it appears
in the file or table, adhering to the specified ENCODING option or the
current client encoding.
The raw format enforces a single column requirement, ensuring that exactly
one column is specified in the column list. Attempts to specify multiple
columns or omit the column list when the table has multiple columns will
result in an error. Additionally, the DELIMITER option in raw format accepts
any string, including multi-byte characters, providing greater flexibility
in defining data separators. If no DELIMITER is specified, the entire input
or output is treated as a single data value.
Furthermore, the raw format does not support format-specific options such as
NULL, HEADER, QUOTE, ESCAPE, FORCE_QUOTE, FORCE_NOT_NULL, and FORCE_NULL.
Using these options with the raw format will trigger errors, ensuring that
data remains unaltered during the transfer process.
This enhancement is particularly useful when handling text blobs, JSON files,
or other text-based formats where preserving the data "as is" is crucial.
---
doc/src/sgml/ref/copy.sgml | 134 ++++++++++++++++++--
src/backend/commands/copy.c | 89 +++++++++----
src/backend/commands/copyfrom.c | 7 +
src/backend/commands/copyfromparse.c | 155 ++++++++++++++++++++++-
src/backend/commands/copyto.c | 90 ++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 1 +
src/test/regress/expected/copy.out | 52 ++++++++
src/test/regress/expected/copy2.out | 52 +++++++-
src/test/regress/sql/copy.sql | 24 ++++
src/test/regress/sql/copy2.sql | 37 +++++-
12 files changed, 586 insertions(+), 60 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..f17d606537 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<para>
Selects the data format to be read or written:
<literal>text</literal>,
- <literal>csv</literal> (Comma Separated Values),
- or <literal>binary</literal>.
+ <literal>CSV</literal> (Comma Separated Values),
+ <literal>binary</literal>,
+ or <literal>raw</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
@@ -253,11 +254,27 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>DELIMITER</literal></term>
<listitem>
<para>
- Specifies the character that separates columns within each row
- (line) of the file. The default is a tab character in text format,
- a comma in <literal>CSV</literal> format.
- This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ Specifies the delimiter used in the file. Its usage depends on the
+ <literal>FORMAT</literal> specified:
+ <simplelist>
+ <member>
+ In <literal>text</literal> and <literal>CSV</literal> formats,
+ the delimiter separates <emphasis>columns</emphasis> within each row
+ (line) of the file.
+ The default is a tab character in <literal>text</literal> format and
+ a comma in <literal>CSV</literal> format. This must be a single
+ one-byte character.
+ </member>
+ <member>
+ In <literal>raw</literal> format, the delimiter separates
+ <emphasis>rows</emphasis> in the file. The default is no delimiter,
+ which means that for <command>COPY FROM</command>, the entire input is
+ read as a single field, and for <command>COPY TO</command>, the output
+ is concatenated without any delimiter. If a delimiter is specified,
+ it can be a multi-byte string; for example, <literal>E'\r\n'</literal>
+ can be used when dealing with text files on Windows platforms.
+ </member>
+ </simplelist>
</para>
</listitem>
</varlistentry>
@@ -271,7 +288,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
</para>
<note>
@@ -294,7 +312,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ using <literal>text</literal> or <literal>CSV</literal> format.
</para>
</listitem>
</varlistentry>
@@ -310,7 +328,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
If this option is set to <literal>MATCH</literal>, the number and names
of the columns in the header line must match the actual column names of
the table, in order; otherwise an error is raised.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is allowed only when using <literal>text</literal> or
+ <literal>CSV</literal> format.
The <literal>MATCH</literal> option is only valid for <command>COPY
FROM</command> commands.
</para>
@@ -400,7 +419,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
- when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ when the <literal>FORMAT</literal> is <literal>text</literal>,
+ <literal>CSV</literal> or <literal>raw</literal>.
</para>
<para>
A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +913,98 @@ COPY <replaceable class="parameter">count</replaceable>
</refsect2>
+ <refsect2 id="sql-copy-raw-format" xreflabel="Raw Format">
+ <title>Raw Format</title>
+
+ <para>
+ The <literal>raw</literal> format is designed for efficient bulk data
+ transfer of a single text column without any parsing, quoting, or
+ escaping. In this format, data is copied exactly as it appears in the file
+ or table, interpreted according to the specified <literal>ENCODING</literal>
+ option or the current client encoding.
+ </para>
+
+ <para>
+ When using the <literal>raw</literal> format, each data value corresponds
+ to a single field with no additional formatting or processing. The
+ <literal>DELIMITER</literal> option specifies the string that separates
+ data values. Unlike in other formats, the delimiter in
+ <literal>raw</literal> format can be any string, including multi-byte
+ characters. If no <literal>DELIMITER</literal> is specified, the entire
+ input or output is treated as a single data value.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format requires that exactly one column be
+ specified in the column list. An error is raised if more than one column
+ is specified or if no column list is specified when the table has multiple
+ columns.
+ </para>
+
+ <para>
+ The <literal>raw</literal> format does not support any of the
+ format-specific options of other formats, such as <literal>NULL</literal>,
+ <literal>HEADER</literal>, <literal>QUOTE</literal>,
+ <literal>ESCAPE</literal>, <literal>FORCE_QUOTE</literal>,
+ <literal>FORCE_NOT_NULL</literal>, and <literal>FORCE_NULL</literal>.
+ Attempting to use these options with <literal>raw</literal> format will
+ result in an error.
+ </para>
+
+ <para>
+ Since the <literal>raw</literal> format deals with text, the data is
+ interpreted according to the specified <literal>ENCODING</literal> option
+ or the current client encoding for input, and encoded using the specified
+ <literal>ENCODING</literal> or the current client encoding for output.
+ </para>
+
+ <note>
+ <para>
+ Empty lines in the input are treated as empty strings, not as
+ <literal>NULL</literal> values. There is no way to represent a
+ <literal>NULL</literal> value in <literal>raw</literal> format.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format is particularly useful when you need to
+ import or export data exactly as it appears. This can be
+ helpful when dealing with large text blobs, JSON files, or other
+ text-based formats.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ The <literal>raw</literal> format can only be used when copying exactly
+ one column. If the table has multiple columns, you must specify the
+ column list containing only one column.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ Unlike other formats, the delimiter in <literal>raw</literal> format can
+ be any string, and there are no restrictions on the characters used in
+ the delimiter, including newline or carriage return characters.
+ </para>
+ </note>
+
+ <note>
+ <para>
+ When using <literal>COPY TO</literal> with <literal>raw</literal> format
+ and a specified <literal>DELIMITER</literal>, there is no check to prevent
+ data values from containing the delimiter string. This can be a problem
+ if the exported data is later imported again using
+ <literal>COPY FROM</literal>, since a data value containing the delimiter
+ would then be split into multiple values. If this is a concern, a different
+ format should be used instead.
+ </para>
+ </note>
+ </refsect2>
+
+
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b7e819de40..bb3b106ff1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "raw") == 0)
+ opts_out->format = COPY_FORMAT_RAW;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -686,18 +688,61 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->null_print)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->default_print)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
+ if (opts_out->delim)
+ {
+ if (opts_out->format != COPY_FORMAT_RAW)
+ {
+ /* Only single-byte delimiter strings are supported. */
+ if (strlen(opts_out->delim) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must be a single one-byte character")));
+
+ /* Disallow end-of-line characters */
+ if (strchr(opts_out->delim, '\r') != NULL ||
+ strchr(opts_out->delim, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be newline or carriage return")));
+ }
+ }
/* Set defaults for omitted options */
- if (!opts_out->delim)
- opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->delim = ",";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->delim = "\t";
- if (!opts_out->null_print)
- opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
- opts_out->null_print_len = strlen(opts_out->null_print);
+ if (opts_out->null_print)
+ {
+ if (strchr(opts_out->null_print, '\r') != NULL ||
+ strchr(opts_out->null_print, '\n') != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY null representation cannot use newline or carriage return")));
+
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ opts_out->null_print = "";
+ else if (opts_out->format == COPY_FORMAT_TEXT)
+ opts_out->null_print = "\\N";
+
+ if (opts_out->null_print)
+ opts_out->null_print_len = strlen(opts_out->null_print);
if (opts_out->format == COPY_FORMAT_CSV)
{
@@ -707,25 +752,6 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->escape = opts_out->quote;
}
- /* Only single-byte delimiter strings are supported. */
- if (strlen(opts_out->delim) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must be a single one-byte character")));
-
- /* Disallow end-of-line characters */
- if (strchr(opts_out->delim, '\r') != NULL ||
- strchr(opts_out->delim, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be newline or carriage return")));
-
- if (strchr(opts_out->null_print, '\r') != NULL ||
- strchr(opts_out->null_print, '\n') != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY null representation cannot use newline or carriage return")));
-
if (opts_out->default_print)
{
opts_out->default_print_len = strlen(opts_out->default_print);
@@ -738,7 +764,7 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Disallow unsafe delimiter characters in non-CSV mode. We can't allow
+ * Disallow unsafe delimiter characters in text mode. We can't allow
* backslash because it would be ambiguous. We can't allow the other
* cases because data characters matching the delimiter must be
* backslashed, and certain backslash combinations are interpreted
@@ -747,7 +773,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (opts_out->format != COPY_FORMAT_CSV &&
+ if (opts_out->format == COPY_FORMAT_TEXT &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -761,6 +787,12 @@ ProcessCopyOptions(ParseState *pstate,
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (opts_out->format == COPY_FORMAT_RAW && opts_out->header_line)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "HEADER")));
+
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
@@ -839,7 +871,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the null string. */
- if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ if (opts_out->delim && opts_out->null_print &&
+ strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: %s is the name of a COPY option, e.g. NULL */
@@ -875,7 +908,7 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the default string. */
- if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. NULL */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff97..73a3f38d90 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 51eb14d743..f73bb5a435 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
* formats. The main entry point is NextCopyFrom(), which parses the
* next input line and returns it as Datums.
*
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/raw mode, the parsing happens in multiple stages:
*
* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
* 1. 2. 3. 4.
@@ -25,7 +25,7 @@
* is copied into 'line_buf', with quotes and escape characters still
* intact.
*
- * 4. CopyReadAttributesText/CSV() function takes the input line from
+ * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
* 'line_buf', and splits it into fields, unescaping the data as required.
* The fields are stored in 'attribute_buf', and 'raw_fields' array holds
* pointers to each field.
@@ -142,6 +142,7 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate);
static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineRawText(CopyFromState cstate);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -731,7 +732,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
}
/*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or raw mode.
* Return false if no more lines.
*
* An internal temporary buffer is returned via 'fields'. It is valid until
@@ -747,7 +748,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
int fldct;
bool done;
- /* only available for text or csv input */
+ /* only available for text, csv, or raw input */
Assert(cstate->opts.format != COPY_FORMAT_BINARY);
/* on input check that the header line is correct if needed */
@@ -765,6 +766,9 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
int fldnum;
+ Assert(cstate->opts.format == COPY_FORMAT_CSV ||
+ cstate->opts.format == COPY_FORMAT_TEXT);
+
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
else
@@ -822,8 +826,16 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
/* Parse the line into de-escaped field values */
if (cstate->opts.format == COPY_FORMAT_CSV)
fldct = CopyReadAttributesCSV(cstate);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+ else
+ {
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+ Assert(cstate->max_fields == 1);
+ /* Point raw_fields directly to line_buf data */
+ cstate->raw_fields[0] = cstate->line_buf.data;
+ fldct = 1;
+ }
*fields = cstate->raw_fields;
*nfields = fldct;
@@ -1095,7 +1107,10 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
/* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate);
+ if (cstate->opts.format == COPY_FORMAT_RAW)
+ result = CopyReadLineRawText(cstate);
+ else
+ result = CopyReadLineText(cstate);
if (result)
{
@@ -1146,6 +1161,22 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf.len -= 2;
cstate->line_buf.data[cstate->line_buf.len] = '\0';
break;
+ case EOL_CUSTOM:
+ {
+ int delim_len;
+
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+ Assert(cstate->opts.delim);
+ delim_len = strlen(cstate->opts.delim);
+ Assert(delim_len > 0);
+ Assert(cstate->line_buf.len >= delim_len);
+ Assert(memcmp(cstate->line_buf.data + cstate->line_buf.len - delim_len,
+ cstate->opts.delim,
+ delim_len) == 0);
+ cstate->line_buf.len -= delim_len;
+ cstate->line_buf.data[cstate->line_buf.len] = '\0';
+ }
+ break;
case EOL_UNKNOWN:
/* shouldn't get here */
Assert(false);
@@ -1461,6 +1492,117 @@ CopyReadLineText(CopyFromState cstate)
return result;
}
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool need_data = false;
+ bool hit_eof = false;
+ bool result = false;
+ int delim_len = cstate->opts.delim ? strlen(cstate->opts.delim) : 0;
+
+ /*
+ * The objective of this loop is to transfer data into line_buf until we
+ * find the specified delimiter or reach EOF. In raw format, we treat the
+ * input data as-is, without any parsing, quoting, or escaping. We are
+ * only interested in locating the delimiter to determine the boundaries
+ * of each data value.
+ *
+ * If a delimiter is specified, we read data until we encounter the
+ * delimiter string. If no delimiter is specified, we read the entire
+ * input as a single data value. Unlike text or CSV modes, we do not need
+ * to handle line endings, escape sequences, or special characters.
+ *
+ * The input has already been converted to the database encoding, but
+ * since we're operating in raw mode, we don't need to be concerned with
+ * the encoding details - we simply look for exact string matches with the
+ * delimiter, if there is one specified.
+ *
+ * For speed, we try to move data from input_buf to line_buf in chunks
+ * rather than one character at a time. input_buf_ptr points to the next
+ * character to examine; any characters from input_buf_index to
+ * input_buf_ptr have been determined to be part of the line, but not yet
+ * transferred to line_buf.
+ *
+ * We handle both single-byte and multi-byte delimiters. For multi-byte
+ * delimiters, we ensure that we have enough data in the buffer to compare
+ * the delimiter string.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ for (;;)
+ {
+ int prev_raw_ptr;
+
+ /* Load more data if needed */
+ if (input_buf_ptr >= copy_buf_len || need_data)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* Update local variables */
+ hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /* If no more data, break out of the loop */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ need_data = false;
+ }
+
+ /* Fetch a character */
+ prev_raw_ptr = input_buf_ptr;
+
+ if (delim_len == 0)
+ {
+ /* When reading entire file, consume all remaining bytes at once */
+ input_buf_ptr = copy_buf_len;
+ continue;
+ }
+ else
+ {
+ char *delim_pos;
+
+ /* Check for delimiter, possibly multi-byte */
+ IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(delim_len - 1);
+
+ /* Look for delimiter in the remaining buffer */
+ delim_pos = strstr(&copy_input_buf[input_buf_ptr], cstate->opts.delim);
+ if (delim_pos != NULL)
+ {
+ /* Found delimiter - move pointer to its position */
+ input_buf_ptr = delim_pos - copy_input_buf;
+ cstate->eol_type = EOL_CUSTOM;
+ input_buf_ptr += delim_len;
+ break;
+ }
+ else
+ {
+ /* No delimiter found - move to end of current buffer */
+ input_buf_ptr = copy_buf_len;
+ continue;
+ }
+ }
+ }
+
+ /* Transfer data to line_buf, including the delimiter if found */
+ REFILL_LINEBUF;
+
+ return result;
+}
+
+
/*
* Return decimal value for a hexadecimal digit
*/
@@ -1937,7 +2079,6 @@ endfield:
return fieldno;
}
-
/*
* Read a binary attribute
*/
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 03c9d71d34..2611c0a360 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
static void CopyAttributeOutText(CopyToState cstate, const char *string);
static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
bool use_quote);
+static void CopyAttributeOutRaw(CopyToState cstate, const char *string);
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
@@ -191,7 +192,14 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
/* Default line termination depends on platform */
#ifndef WIN32
@@ -235,9 +243,18 @@ CopySendEndOfRow(CopyToState cstate)
}
break;
case COPY_FRONTEND:
- /* The FE/BE protocol uses \n as newline for all platforms */
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ cstate->opts.delim != NULL)
+ {
+ /* Output the user-specified delimiter between rows */
+ CopySendString(cstate, cstate->opts.delim);
+ }
+ else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
+ {
+ /* The FE/BE protocol uses \n as newline for all platforms */
CopySendChar(cstate, '\n');
+ }
/* Dump the accumulated row as one CopyData message */
(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
@@ -574,6 +591,13 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Enforce single column requirement for RAW format */
+ if (cstate->opts.format == COPY_FORMAT_RAW &&
+ list_length(cstate->attnumlist) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY with format 'raw' must specify exactly one column")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -839,8 +863,10 @@ DoCopyTo(CopyToState cstate)
if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
- else
+ else if (cstate->opts.format == COPY_FORMAT_TEXT)
CopyAttributeOutText(cstate, colname);
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ CopyAttributeOutRaw(cstate, colname);
}
CopySendEndOfRow(cstate);
@@ -921,7 +947,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- if (cstate->opts.format != COPY_FORMAT_BINARY)
+ if (cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV)
{
bool need_delim = false;
@@ -949,7 +976,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
- else
+ else if (cstate->opts.format == COPY_FORMAT_BINARY)
{
foreach_int(attnum, cstate->attnumlist)
{
@@ -969,6 +996,35 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
}
}
}
+ else if (cstate->opts.format == COPY_FORMAT_RAW)
+ {
+ int attnum;
+ Datum value;
+ bool isnull;
+
+ /* Assert only one column is being copied */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ attnum = linitial_int(cstate->attnumlist);
+ value = slot->tts_values[attnum - 1];
+ isnull = slot->tts_isnull[attnum - 1];
+
+ if (!isnull)
+ {
+ char *string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+
+ CopyAttributeOutRaw(cstate, string);
+ }
+ /* For RAW format, we don't send anything for NULL values */
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported COPY format")));
+ }
+
CopySendEndOfRow(cstate);
@@ -1223,6 +1279,28 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
}
}
+/*
+ * Send text representation of one attribute for RAW format.
+ */
+static void
+CopyAttributeOutRaw(CopyToState cstate, const char *string)
+{
+ const char *ptr;
+
+ /* Ensure the format is RAW */
+ Assert(cstate->opts.format == COPY_FORMAT_RAW);
+
+ /* Ensure exactly one column is being processed */
+ Assert(list_length(cstate->attnumlist) == 1);
+
+ if (cstate->need_transcoding)
+ ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+ else
+ ptr = string;
+
+ CopySendString(cstate, ptr);
+}
+
/*
* copy_dest_startup --- executor startup
*/
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1be0056af7..7f8d6f4f94 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "raw");
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f..8996bc89e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_RAW,
} CopyFormat;
/*
@@ -79,7 +80,7 @@ typedef struct CopyFormatOptions
char *null_print_client; /* same converted to file encoding */
char *default_print; /* DEFAULT marker string */
int default_print_len; /* length of same */
- char *delim; /* column delimiter (must be 1 byte) */
+ char *delim; /* delimiter (1 byte, except for raw format) */
char *quote; /* CSV quote char (must be 1 byte) */
char *escape; /* CSV escape char (must be 1 byte) */
List *force_quote; /* list of column names */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..b8693ae59e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -38,6 +38,7 @@ typedef enum EolType
EOL_NL,
EOL_CR,
EOL_CRNL,
+ EOL_CUSTOM,
} EolType;
/*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..2825d833ea 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,55 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
(2 rows)
DROP TABLE parted_si;
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+ col
+----------------------------------------
+ sharon 25 (15,12) 1000 sam +
+ sam 30 (10,5) 2000 bill +
+ bill 20 (11,10) 1000 sharon+
+
+(1 row)
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+----------------------------------------
+ bill 20 (11,10) 1000 sharon
+ sam 30 (10,5) 2000 bill
+ sharon 25 (15,12) 1000 sam
+(3 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+sharon 25 (15,12) 1000 sam
+***
+sam 30 (10,5) 2000 bill
+***
+bill 20 (11,10) 1000 sharon
+***
+\qecho
+
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+ col
+--------
+
+ "def",
+ abc\.
+ ghi
+(4 rows)
+
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..f31bd6a322 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -90,15 +90,35 @@ COPY x from stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x from stdin (format RAW, null 'x');
+ERROR: cannot specify NULL in RAW mode
+COPY x from stdin (format TEXT, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format BINARY, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format RAW, escape 'x');
+ERROR: COPY ESCAPE requires CSV mode
+COPY x from stdin (format TEXT, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format BINARY, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, quote 'x');
+ERROR: COPY QUOTE requires CSV mode
+COPY x from stdin (format RAW, header);
+ERROR: cannot specify HEADER in RAW mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
^
-COPY x from stdin (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote(a));
+ERROR: COPY FORCE_QUOTE requires CSV mode
+COPY x to stdout (format RAW, force_quote *);
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM
@@ -108,6 +128,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x from stdin (format TEXT, force_not_null *);
ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null(a));
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+COPY x from stdin (format RAW, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
COPY x to stdout (format CSV, force_not_null(a));
ERROR: COPY FORCE_NOT_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +140,10 @@ COPY x from stdin (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x from stdin (format TEXT, force_null *);
ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null(a));
+ERROR: COPY FORCE_NULL requires CSV mode
+COPY x from stdin (format RAW, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdout (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
COPY x to stdout (format CSV, force_null *);
@@ -858,9 +886,11 @@ select id, text_value, ts_value from copy_default;
(2 rows)
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
ERROR: cannot specify DEFAULT in BINARY mode
+copy copy_default from stdin with (format raw, default '\D');
+ERROR: cannot specify DEFAULT in RAW mode
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
ERROR: COPY default representation cannot use newline or carriage return
@@ -929,3 +959,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT cannot be used with COPY TO
+--
+-- Test COPY FORMAT errors
+--
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
+ERROR: COPY with format 'raw' must specify exactly one column
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..93595037dc 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,27 @@ COPY parted_si(id, data) FROM :'filename';
SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
DROP TABLE parted_si;
+
+-- Test COPY FORMAT raw
+\set filename :abs_srcdir '/data/emp.data'
+CREATE TABLE copy_raw_test (col text);
+COPY copy_raw_test FROM :'filename' (FORMAT raw);
+SELECT col FROM copy_raw_test;
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM :'filename' (FORMAT raw, DELIMITER E'\n');
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
+\qecho
+TRUNCATE copy_raw_test;
+COPY copy_raw_test FROM stdin (FORMAT raw, DELIMITER E'\n***\n');
+abc\.
+***
+"def",
+***
+
+***
+ghi
+***
+\.
+SELECT col FROM copy_raw_test ORDER BY col COLLATE "C";
+COPY copy_raw_test TO stdout (FORMAT raw, DELIMITER E'\n***\n');
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..7aee4ca8ea 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -72,18 +72,32 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
+COPY x from stdin (format RAW, null 'x');
+COPY x from stdin (format TEXT, escape 'x');
+COPY x from stdin (format BINARY, escape 'x');
+COPY x from stdin (format RAW, escape 'x');
+COPY x from stdin (format TEXT, quote 'x');
+COPY x from stdin (format BINARY, quote 'x');
+COPY x from stdin (format RAW, quote 'x');
+COPY x from stdin (format RAW, header);
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x from stdin (format TEXT, force_quote(a));
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
+COPY x to stdout (format RAW, force_quote(a));
+COPY x to stdout (format RAW, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
+COPY x from stdin (format RAW, force_not_null(a));
+COPY x from stdin (format RAW, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
+COPY x from stdin (format RAW, force_null(a));
+COPY x from stdin (format RAW, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +650,9 @@ select id, text_value, ts_value from copy_default;
truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or raw mode
copy copy_default from stdin with (format binary, default '\D');
+copy copy_default from stdin with (format raw, default '\D');
-- DEFAULT cannot be new line nor carriage return
copy copy_default from stdin with (default E'\n');
@@ -707,3 +722,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+--
+-- Test COPY FORMAT errors
+--
+
+\getenv abs_srcdir PG_ABS_SRCDIR
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+\set filename :abs_builddir '/results/copy_raw_test_errors.data'
+
+-- Test single column requirement
+CREATE TABLE copy_raw_test_errors (col1 text, col2 text);
+COPY copy_raw_test_errors TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) TO :'filename' (FORMAT raw);
+COPY copy_raw_test_errors FROM :'filename' (FORMAT raw);
+COPY copy_raw_test_errors (col1, col2) FROM :'filename' (FORMAT raw);
--
2.45.1
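
The delimiter handling in the patch above can be illustrated outside the backend. The following is only a minimal standalone sketch, not code from the patch: find_delim() and count_raw_values() are hypothetical helper names, and the sketch assumes that each delimiter ends one value and that trailing bytes with no delimiter form a final value. It uses memcmp() rather than strstr(), so it does not depend on the buffer being NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): return the offset of the
 * first occurrence of delim in buf[0..len), or -1 if it does not occur.
 */
static long
find_delim(const char *buf, size_t len, const char *delim, size_t delim_len)
{
    assert(buf != NULL && delim != NULL);

    if (delim_len == 0 || len < delim_len)
        return -1;

    for (size_t i = 0; i + delim_len <= len; i++)
    {
        if (memcmp(buf + i, delim, delim_len) == 0)
            return (long) i;
    }
    return -1;
}

/*
 * Hypothetical helper: count the values a raw-format reader would produce
 * from buf.  An empty delimiter means the whole input is a single value.
 */
static int
count_raw_values(const char *buf, size_t len, const char *delim)
{
    size_t      delim_len = strlen(delim);
    size_t      pos = 0;
    int         nvalues = 0;

    while (pos < len)
    {
        long        off = find_delim(buf + pos, len - pos, delim, delim_len);

        nvalues++;
        if (off < 0)
            break;              /* trailing data without a delimiter */
        pos += (size_t) off + delim_len;
    }
    return nvalues;
}
```

With the delimiter E'\n***\n', the input "abc\n***\ndef\n***\n" yields two values. A single stored value "a\n***\nb" written with the same delimiter also reads back as two values, which is the round-trip caveat described in the documentation note.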
v16-0003-Reorganize-option-validations.patch
From 664d9b1c86834123d324b1fc9520525658eeac31 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 09:18:37 +0300
Subject: [PATCH 3/3] Reorganize option validations.
---
src/backend/commands/copy.c | 463 ++++++++++++++++++++----------------
1 file changed, 261 insertions(+), 202 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index bb3b106ff1..0ef8ed501b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -673,39 +673,29 @@ ProcessCopyOptions(ParseState *pstate,
parser_errposition(pstate, defel->location)));
}
- /*
- * Check for incompatible options (must do these three before inserting
- * defaults)
- */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in RAW mode", "NULL")));
-
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in RAW mode", "DEFAULT")));
-
+ /* --- FREEZE option --- */
+ if (opts_out->freeze)
+ {
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FREEZE",
+ "COPY TO")));
+ }
+
+ /* --- DELIMITER option --- */
if (opts_out->delim)
{
- if (opts_out->format != COPY_FORMAT_RAW)
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
{
/* Only single-byte delimiter strings are supported. */
if (strlen(opts_out->delim) != 1)
@@ -720,22 +710,53 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter cannot be newline or carriage return")));
}
+
+ if (opts_out->format == COPY_FORMAT_TEXT)
+ {
+ /*
+ * Disallow unsafe delimiter characters in text mode. We can't
+ * allow backslash because it would be ambiguous. We can't allow
+ * the other cases because data characters matching the delimiter
+ * must be backslashed, and certain backslash combinations are
+ * interpreted non-literally by COPY IN. Disallowing all lower
+ * case ASCII letters is more than strictly necessary, but seems
+ * best for consistency and future-proofing. Likewise we disallow
+ * all digits though only octal digits are actually dangerous.
+ */
+ if (strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+ opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+ }
}
- /* Set defaults for omitted options */
+ /* Set default delimiter */
else if (opts_out->format == COPY_FORMAT_CSV)
opts_out->delim = ",";
else if (opts_out->format == COPY_FORMAT_TEXT)
opts_out->delim = "\t";
+ /* --- NULL option --- */
if (opts_out->null_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "NULL")));
+
+ /* Disallow end-of-line characters */
if (strchr(opts_out->null_print, '\r') != NULL ||
strchr(opts_out->null_print, '\n') != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY null representation cannot use newline or carriage return")));
-
}
+ /* Set default null_print */
else if (opts_out->format == COPY_FORMAT_CSV)
opts_out->null_print = "";
else if (opts_out->format == COPY_FORMAT_TEXT)
@@ -744,16 +765,23 @@ ProcessCopyOptions(ParseState *pstate,
if (opts_out->null_print)
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->format == COPY_FORMAT_CSV)
- {
- if (!opts_out->quote)
- opts_out->quote = "\"";
- if (!opts_out->escape)
- opts_out->escape = opts_out->quote;
- }
-
+ /* --- DEFAULT option --- */
if (opts_out->default_print)
{
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in RAW mode", "DEFAULT")));
+
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
opts_out->default_print_len = strlen(opts_out->default_print);
if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -761,144 +789,7 @@ ProcessCopyOptions(ParseState *pstate,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY default representation cannot use newline or carriage return")));
- }
- /*
- * Disallow unsafe delimiter characters in text mode. We can't allow
- * backslash because it would be ambiguous. We can't allow the other
- * cases because data characters matching the delimiter must be
- * backslashed, and certain backslash combinations are interpreted
- * non-literally by COPY IN. Disallowing all lower case ASCII letters is
- * more than strictly necessary, but seems best for consistency and
- * future-proofing. Likewise we disallow all digits though only octal
- * digits are actually dangerous.
- */
- if (opts_out->format == COPY_FORMAT_TEXT &&
- strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
- opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
- /* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
- if (opts_out->format == COPY_FORMAT_RAW && opts_out->header_line)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in RAW mode", "HEADER")));
-
- /* Check quote */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "QUOTE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY quote must be a single one-byte character")));
-
- if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
-
- /* Check escape */
- if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
- if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
-
- /* Check force_quote */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
- opts_out->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
- if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
- "COPY FROM")));
-
- /* Check force_notnull */
- if (opts_out->format != COPY_FORMAT_CSV &&
- (opts_out->force_notnull != NIL || opts_out->force_notnull_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
- if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
- "COPY TO")));
-
- /* Check force_null */
- if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
- if ((opts_out->force_null != NIL || opts_out->force_null_all) &&
- !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
- "COPY TO")));
-
- /* Don't allow the delimiter to appear in the null string. */
- if (opts_out->delim && opts_out->null_print &&
- strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("COPY delimiter character must not appear in the %s specification",
- "NULL")));
-
- /* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "NULL")));
-
- /* Check freeze */
- if (opts_out->freeze && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
- second %s is a COPY with direction, e.g. COPY TO */
- errmsg("COPY %s cannot be used with %s", "FREEZE",
- "COPY TO")));
-
- if (opts_out->default_print)
- {
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -908,22 +799,13 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Don't allow the delimiter to appear in the default string. */
- if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+ if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. NULL */
errmsg("COPY delimiter character must not appear in the %s specification",
"DEFAULT")));
- /* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->format == COPY_FORMAT_CSV &&
- strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. NULL */
- errmsg("CSV quote character must not appear in the %s specification",
- "DEFAULT")));
-
/* Don't allow the NULL and DEFAULT string to be the same */
if (opts_out->null_print_len == opts_out->default_print_len &&
strncmp(opts_out->null_print, opts_out->default_print,
@@ -932,20 +814,197 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
- /* Check on_error */
- if (opts_out->format == COPY_FORMAT_BINARY &&
- opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
- if (opts_out->reject_limit && !opts_out->on_error)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- /*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ else
+ {
+ /* No default for default_print; remains NULL */
+ }
+
+ /* --- HEADER option --- */
+ if (opts_out->header_line != COPY_HEADER_FALSE)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER")));
+
+ if (opts_out->format == COPY_FORMAT_RAW)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in RAW mode", "HEADER")));
+ }
+ else
+ {
+ /* Default is no header; no action needed */
+ }
+
+ /* --- QUOTE option --- */
+ if (opts_out->quote)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+ if (strlen(opts_out->quote) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY quote must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default quote */
+ opts_out->quote = "\"";
+ }
+
+ /* --- ESCAPE option --- */
+ if (opts_out->escape)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+ if (strlen(opts_out->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+ }
+ else if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Set default escape to quote character */
+ opts_out->escape = opts_out->quote;
+ }
+
+ /* --- FORCE_QUOTE option --- */
+ if (opts_out->force_quote != NIL || opts_out->force_quote_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+ "COPY FROM")));
+ }
+
+ /* --- FORCE_NOT_NULL option --- */
+ if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+ "COPY TO")));
+ }
+
+ /* --- FORCE_NULL option --- */
+ if (opts_out->force_null != NIL || opts_out->force_null_all)
+ {
+ if (opts_out->format != COPY_FORMAT_CSV)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+ second %s is a COPY with direction, e.g. COPY TO */
+ errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+ "COPY TO")));
+ }
+
+ /* --- ON_ERROR option --- */
+ if (opts_out->on_error != COPY_ON_ERROR_STOP)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+ }
+
+ /* --- REJECT_LIMIT option --- */
+ if (opts_out->reject_limit)
+ {
+ if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ }
+
+ /*
+ * Additional checks for interdependent options
+ */
+
+ /* Checks specific to the CSV and TEXT formats */
+ if (opts_out->format == COPY_FORMAT_TEXT ||
+ opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("COPY delimiter character must not appear in the %s specification",
+ "NULL")));
+ }
+
+ /* Checks specific to the CSV format */
+ if (opts_out->format == COPY_FORMAT_CSV)
+ {
+ /* Assert options have been set (defaults applied if not specified) */
+ Assert(opts_out->delim);
+ Assert(opts_out->quote);
+ Assert(opts_out->null_print);
+
+ /* Don't allow the CSV quote char to appear in the default string. */
+ if (opts_out->default_print_len > 0 &&
+ strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "DEFAULT")));
+
+ if (opts_out->delim[0] == opts_out->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: %s is the name of a COPY option, e.g. NULL */
+ errmsg("CSV quote character must not appear in the %s specification",
+ "NULL")));
+ }
}
/*
--
2.45.1
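For readers skimming the diff, the reorganized layout boils down to one block per option: validate the option if it was given, otherwise apply its default, and run the cross-option checks last, once all defaults are in place. Below is a minimal standalone sketch of that shape for DELIMITER and QUOTE only. The names SketchFormat, SketchOpts and sketch_process_options are hypothetical, errors are returned as strings instead of going through ereport(), the wording of some messages is the sketch's own, and any RAW-specific delimiter handling (not visible in these hunks) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real CopyFormat enum and option struct. */
typedef enum { SK_TEXT, SK_CSV, SK_BINARY, SK_RAW } SketchFormat;

typedef struct
{
	SketchFormat format;
	const char *delim;		/* NULL = not specified by the user */
	const char *quote;		/* NULL = not specified by the user */
} SketchOpts;

/*
 * Validate options and fill in defaults.  Returns NULL on success, or a
 * message where the real code would call ereport(ERROR, ...).
 */
static const char *
sketch_process_options(SketchOpts *opts)
{
	assert(opts != NULL);

	/* --- DELIMITER option --- */
	if (opts->delim)
	{
		if (opts->format == SK_BINARY)
			return "cannot specify DELIMITER in BINARY mode";

		if (opts->format == SK_TEXT || opts->format == SK_CSV)
		{
			/* Only single-byte delimiter strings are supported. */
			if (strlen(opts->delim) != 1)
				return "delimiter must be a single one-byte character";
			if (opts->delim[0] == '\r' || opts->delim[0] == '\n')
				return "COPY delimiter cannot be newline or carriage return";
		}

		/* The length check above guarantees delim[0] is not '\0' here. */
		if (opts->format == SK_TEXT &&
			strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
				   opts->delim[0]) != NULL)
			return "delimiter is unsafe in text mode";
	}
	/* Set default delimiter */
	else if (opts->format == SK_CSV)
		opts->delim = ",";
	else if (opts->format == SK_TEXT)
		opts->delim = "\t";

	/* --- QUOTE option --- */
	if (opts->quote)
	{
		if (opts->format != SK_CSV)
			return "COPY QUOTE requires CSV mode";
		if (strlen(opts->quote) != 1)
			return "COPY quote must be a single one-byte character";
	}
	else if (opts->format == SK_CSV)
		opts->quote = "\"";

	/* Interdependent checks, run after defaults have been applied */
	if (opts->format == SK_CSV)
	{
		assert(opts->delim && opts->quote);
		if (opts->delim[0] == opts->quote[0])
			return "COPY delimiter and quote must be different";
	}

	return NULL;
}
```

Because the interdependent checks run after the defaults are set, a user-supplied delimiter of '"' in CSV mode is caught even when QUOTE was left at its default.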
On Tue, Oct 29, 2024, at 17:48, Joel Jacobson wrote:
---
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)
This function has a lot of duplication with CopyReadLineText(). I
think it's better to modify CopyReadLineText() to support 'raw'
format, rather than adding a separate function.
Hmm, there is a bit of duplication, yes, but is also a hot-path,
so I think we want to minimize branches and code size in the
hot loop.
Combining them into one function, would mean the total function
size and branching increases for both cases.
I haven't made any benchmarks on this though.
I ran some benchmarks.
Integrating 'raw' into CopyReadLineText() (v17 patch) seems to cause a small but
measurable slowdown, roughly 3% for csv and text and about 1% for raw on average:
v16 = separate functions for csv/text vs raw
v17 = same function for csv/text/raw
The variance among the measurements is small, so the difference seems significant.
However, as Tomas Vondra discovered [1], binary layout matters, so the observed
differences could be due to that. I would need to build with BOLT to increase
confidence in the result.
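To make the v16/v17 distinction concrete, here is a minimal sketch of the two shapes being compared. It is not the code from either patch version: the function names are hypothetical, and the real CopyReadLineText() also deals with CSV quoting, carriage returns, the end-of-data marker and buffer refills. The sketch only models one difference, namely that text mode lets a backslash escape the following character (including a newline), while raw mode treats every newline as end of line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * v16-style: a dedicated raw scanner.  The loop body has a single test.
 * Returns the index of the first '\n', or len if there is none.
 */
static size_t
sketch_line_end_raw(const char *buf, size_t len)
{
	size_t		i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
	{
		if (buf[i] == '\n')
			break;
	}
	return i;
}

/*
 * v17-style: one scanner shared by text and raw.  Every iteration now
 * carries the extra is_raw test, for both formats.
 */
static size_t
sketch_line_end_combined(const char *buf, size_t len, bool is_raw)
{
	size_t		i = 0;

	assert(buf != NULL);
	while (i < len)
	{
		char		c = buf[i];

		/* Text mode only: a backslash escapes the next character. */
		if (!is_raw && c == '\\' && i + 1 < len)
		{
			i += 2;
			continue;
		}
		if (c == '\n')
			break;
		i++;
	}
	return i;
}
```

Whether the extra branch is what costs the few percent measured here, or whether it is a layout effect, is exactly what the sketch cannot tell; it only shows where the additional per-character work would sit.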
Here is how I benchmarked:
$ cat /data/pg-dev-data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
max_wal_size = '10GB'
autovacuum = 'off'
$ for n in `seq 1 3` ; do dropdb "$USER" ; createdb && pg_ctl restart && psql -a -f bench.sql | grep -E '^copy log from' -A 2 | ./parse_logs.py $n "v17" >> bench.csv ; done
$ ./plot_bench.py
$ psql -f bench_result.sql
format | version | min | min_change | avg | avg_change | max | max_change | stddev
--------+---------+----------+------------+---------+------------+----------+------------+--------
csv | v16 | 3138.921 | | 3167.41 | | 3238.590 | | 28.07
csv | v17 | 3223.475 | 1.027 | 3264.23 | 1.031 | 3325.419 | 1.027 | 32.13
raw | v16 | 1989.118 | | 2018.94 | | 2092.347 | | 28.66
raw | v17 | 1999.410 | 1.005 | 2037.40 | 1.009 | 2105.216 | 1.006 | 33.38
text | v16 | 2653.829 | | 2688.66 | | 2764.434 | | 33.39
text | v17 | 2728.067 | 1.028 | 2765.92 | 1.029 | 2821.602 | 1.021 | 24.44
(6 rows)
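For anyone wanting to double-check the *_change columns: they are consistent with the v17 value divided by the matching v16 value, rounded to three decimals. A small sketch under that assumption (the function names are hypothetical, and the numbers are the avg column copied from the table above):

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of a v17 timing to the matching v16 timing; 1.0 means no change. */
static double
sketch_change(double v16, double v17)
{
	assert(v16 > 0.0);
	return v17 / v16;
}

/* Recompute the avg_change column from the reported averages. */
static void
sketch_print_avg_changes(void)
{
	printf("csv  %.3f\n", sketch_change(3167.41, 3264.23));
	printf("raw  %.3f\n", sketch_change(2018.94, 2037.40));
	printf("text %.3f\n", sketch_change(2688.66, 2765.92));
}
```

Calling sketch_print_avg_changes() from a main() prints one ratio per format, which can be compared against the avg_change column.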
/Joel
[1]: https://vondra.me/posts/playing-with-bolt-and-postgres/
Attachments:
image.png (image/png)