New "single" COPY format

Started by Joel Jacobson about 1 year ago (34 messages)

joel@compiler.org

3 attachment(s)

Hi hackers,

Thread [1] renamed, since the format name has now been changed from 'raw' to
'single', as suggested by Andrew Dunstan and Jacob Champion.

[1]: /messages/by-id/c12516b1-77dc-4ad3-94a7-88527360aee0@app.fastmail.com

Recap: This is about adding support for importing and exporting text-based
formats such as JSONL, or any unstructured text file, where you want to import
each line "as is" into a single column, or export a single column to a text file.

Example importing the meson-logs/testlog.json file Meson generates
when building PostgreSQL, which is in JSONL format:

# create table meson_log (log_line jsonb);
# \copy meson_log from meson-logs/testlog.json (format single);
COPY 306
# select log_line->'name' name, log_line->'result' result from meson_log limit 3;
                  name                   | result
-----------------------------------------+--------
 "postgresql:setup / tmp_install"        | "OK"
 "postgresql:setup / install_test_files" | "OK"
 "postgresql:setup / initdb_cache"       | "OK"
(3 rows)

Changes since v16:

* EOL handling now works the same as for 'text' and 'csv'.
In v16, we supported multi-byte delimiters to allow specifying
e.g. Windows EOL (\r\n), but this seems unnecessary if we just do what we do
for text/csv, that is, auto-detect the EOL for COPY FROM and use
the OS default EOL for COPY TO.
The DELIMITER option is therefore invalid for the 'single' format.
This is the biggest code change between v16 and v18.
CopyReadLineRawText() has been renamed to CopyReadLineSingleText()
and changed accordingly.

* A final EOL is now emitted after the last record in COPY TO,
so it now works just like 'text' and 'csv'.

* HEADER [ boolean | MATCH ] is now supported again,
as previously suggested by Daniel Verite.
This is possible thanks to the new EOL handling.

* Docs updated.
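
The EOL auto-detection mentioned in the first item could be sketched roughly
as below. This is only an illustration under stated assumptions, not the code
in the patch: SketchEol and sketch_next_line() are hypothetical names, and
unlike the existing text/csv reader, which reports an error for a mismatched
\r or \n, this sketch simply keeps such bytes as data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the EOL states used by the COPY FROM reader. */
typedef enum
{
	SK_EOL_UNKNOWN,
	SK_EOL_NL,
	SK_EOL_CR,
	SK_EOL_CRNL
} SketchEol;

/*
 * Scan buf[0..len) for the next line terminator.  The EOL style is fixed by
 * the first terminator seen.  On success, *line_len is the number of data
 * bytes (terminator excluded) and the return value is the number of bytes
 * consumed (terminator included).  Returns 0 if no complete line is present.
 */
static size_t
sketch_next_line(const char *buf, size_t len, SketchEol *eol, size_t *line_len)
{
	assert(buf != NULL && eol != NULL && line_len != NULL);

	for (size_t i = 0; i < len; i++)
	{
		char		c = buf[i];

		if (c == '\n' && (*eol == SK_EOL_UNKNOWN || *eol == SK_EOL_NL))
		{
			*eol = SK_EOL_NL;
			*line_len = i;
			return i + 1;
		}

		if (c == '\r' && *eol != SK_EOL_NL)
		{
			if (*eol == SK_EOL_UNKNOWN)
			{
				/* Need one byte of lookahead to tell \r from \r\n. */
				if (i + 1 >= len)
					return 0;
				*eol = (buf[i + 1] == '\n') ? SK_EOL_CRNL : SK_EOL_CR;
			}

			if (*eol == SK_EOL_CR)
			{
				*line_len = i;
				return i + 1;
			}

			/* SK_EOL_CRNL: the terminator is the two-byte sequence \r\n. */
			if (i + 1 >= len)
				return 0;
			if (buf[i + 1] == '\n')
			{
				*line_len = i;
				return i + 2;
			}
			/* A lone \r is kept as data in this sketch. */
		}
	}

	return 0;
}
```

A caller would start with SK_EOL_UNKNOWN and call the function repeatedly,
advancing the buffer by the returned byte count each time.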

Below is quoted directly from the copy.sgml, but in plaintext:

---
Single Format

This format option is used for importing and exporting files containing
unstructured text, where each line is treated as a single field. It is
useful for data that does not conform to a structured, tabular format and
lacks delimiters.

In the single format, each line of the input or output is
considered a complete value without any field separation. There are no
field delimiters, and all characters are taken literally. There is no
special handling for quotes, backslashes, or escape sequences. All
characters, including whitespace and special characters, are preserved
exactly as they appear in the file. However, it's important to note that
the text is still interpreted according to the specified ENCODING
option or the current client encoding for input, and encoded using the
specified ENCODING or the current client encoding for output.

When using this format, the COPY command must specify exactly one column.
Specifying multiple columns will result in an error.
If the table has multiple columns and no column list is provided, an error
will occur.

The single format does not distinguish a NULL value from an empty string.
Empty lines are imported as empty strings, not as NULL values.

Encoding works the same as in the text and CSV formats.
---
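
To make the restrictions above concrete, here is a small sketch of the checks
they imply for the 'single' format: DELIMITER, NULL and DEFAULT are rejected,
and exactly one column is required. The struct, enum and function names are
hypothetical, and the order of the checks is an assumption; the patch itself
performs the option checks in ProcessCopyOptions() via ereport().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, one per rule described in the docs text. */
typedef enum
{
	SK_OK,
	SK_ERR_DELIMITER,
	SK_ERR_NULL,
	SK_ERR_DEFAULT,
	SK_ERR_COLUMN_COUNT
} SketchCheck;

/* Hypothetical, reduced view of the options relevant to 'single'. */
typedef struct
{
	const char *delim;			/* DELIMITER, or NULL if not given */
	const char *null_print;		/* NULL marker, or NULL if not given */
	const char *default_print;	/* DEFAULT marker, or NULL if not given */
	int			ncolumns;		/* number of columns being copied */
} SketchSingleOpts;

/* Return SK_OK if the options are acceptable for FORMAT single. */
static SketchCheck
sketch_check_single(const SketchSingleOpts *o)
{
	assert(o != NULL);

	if (o->delim != NULL)
		return SK_ERR_DELIMITER;	/* "cannot specify DELIMITER in SINGLE mode" */
	if (o->null_print != NULL)
		return SK_ERR_NULL;			/* "cannot specify NULL in SINGLE mode" */
	if (o->default_print != NULL)
		return SK_ERR_DEFAULT;		/* "cannot specify DEFAULT in SINGLE mode" */
	if (o->ncolumns != 1)
		return SK_ERR_COLUMN_COUNT; /* exactly one column is required */

	return SK_OK;
}
```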

On Fri, Nov 1, 2024, at 22:28, Masahiko Sawada wrote:
> I think it would be better to explain how to parse data in raw mode,
> especially which steps in the pipeline we skip, in the comment at the
> top of copyfromparse.c.

Good idea. I've explained it in the comment.

/Joel

Attachments:

v18-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch

From 13b67cee37c737fc556c3dcf533895a698916926 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 08:24:13 +0300
Subject: [PATCH 1/3] Introduce CopyFormat and replace csv_mode and binary
 fields with it.

---
 src/backend/commands/copy.c          | 50 +++++++++++++++-------------
 src/backend/commands/copyfrom.c      | 10 +++---
 src/backend/commands/copyfromparse.c | 34 +++++++++----------
 src/backend/commands/copyto.c        | 20 +++++------
 src/include/commands/copy.h          | 13 ++++++--
 src/tools/pgindent/typedefs.list     |  1 +
 6 files changed, 70 insertions(+), 58 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663f..b7e819de408 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+												opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV &&
+		(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,8 +822,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -846,7 +847,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -882,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -899,7 +900,8 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY &&
+		opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b8..f350a4ff976 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 	cstate->raw_reached_eof = false;
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		/*
 		 * If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
 			continue;
 
 		/* Fetch the input function and typioparam info */
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryInputInfo(att->atttypid,
 								   &in_func_oid, &typioparams[attnum - 1]);
 		else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
 
 	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
 
 	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d83..51eb14d7432 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -162,7 +162,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -748,7 +748,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	bool		done;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(cstate->opts.format != COPY_FORMAT_BINARY);
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -765,7 +765,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		{
 			int			fldnum;
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				fldct = CopyReadAttributesCSV(cstate);
 			else
 				fldct = CopyReadAttributesText(cstate);
@@ -820,7 +820,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		return false;
 
 	/* Parse the line into de-escaped field values */
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 		fldct = CopyReadAttributesCSV(cstate);
 	else
 		fldct = CopyReadAttributesText(cstate);
@@ -864,7 +864,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
 	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		char	  **field_strings;
 		ListCell   *cur;
@@ -905,7 +905,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 				continue;
 			}
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 			{
 				if (string == NULL &&
 					cstate->opts.force_notnull_flags[m])
@@ -1178,7 +1178,7 @@ CopyReadLineText(CopyFromState cstate)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		quotec = cstate->opts.quote[0];
 		escapec = cstate->opts.escape[0];
@@ -1255,7 +1255,7 @@ CopyReadLineText(CopyFromState cstate)
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
 
-		if (cstate->opts.csv_mode)
+		if (cstate->opts.format == COPY_FORMAT_CSV)
 		{
 			/*
 			 * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1294,7 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \r */
-		if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1322,10 @@ CopyReadLineText(CopyFromState cstate)
 					if (cstate->eol_type == EOL_CRNL)
 						ereport(ERROR,
 								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errmsg("literal carriage return found in data") :
 								 errmsg("unquoted carriage return found in data"),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errhint("Use \"\\r\" to represent carriage return.") :
 								 errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1339,10 @@ CopyReadLineText(CopyFromState cstate)
 			else if (cstate->eol_type == EOL_NL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal carriage return found in data") :
 						 errmsg("unquoted carriage return found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\r\" to represent carriage return.") :
 						 errhint("Use quoted CSV field to represent carriage return.")));
 			/* If reach here, we have found the line terminator */
@@ -1350,15 +1350,15 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \n */
-		if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal newline found in data") :
 						 errmsg("unquoted newline found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\n\" to represent newline.") :
 						 errhint("Use quoted CSV field to represent newline.")));
 			cstate->eol_type = EOL_NL;	/* in case not set yet */
@@ -1370,7 +1370,7 @@ CopyReadLineText(CopyFromState cstate)
 		 * Process backslash, except in CSV mode where backslash is a normal
 		 * character.
 		 */
-		if (c == '\\' && !cstate->opts.csv_mode)
+		if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
 		{
 			char		c2;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d96751..03c9d71d34a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
 	switch (cstate->copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 			{
 				/* Default line termination depends on platform */
 #ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
 			break;
 		case COPY_FRONTEND:
 			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 				CopySendChar(cstate, '\n');
 
 			/* Dump the accumulated row as one CopyData message */
@@ -775,7 +775,7 @@ DoCopyTo(CopyToState cstate)
 		bool		isvarlena;
 		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryOutputInfo(attr->atttypid,
 									&out_func_oid,
 									&isvarlena);
@@ -796,7 +796,7 @@ DoCopyTo(CopyToState cstate)
 											   "COPY TO",
 											   ALLOCSET_DEFAULT_SIZES);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate header for a binary copy */
 		int32		tmp;
@@ -837,7 +837,7 @@ DoCopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, colname, false);
 				else
 					CopyAttributeOutText(cstate, colname);
@@ -884,7 +884,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate trailer for a binary copy */
 		CopySendInt16(cstate, -1);
@@ -912,7 +912,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Binary per-tuple header */
 		CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -921,7 +921,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		bool		need_delim = false;
 
@@ -941,7 +941,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			{
 				string = OutputFunctionCall(&out_functions[attnum - 1],
 											value);
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, string,
 										cstate->opts.force_quote_flags[attnum - 1]);
 				else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..c3d1df267f0 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1847bbfa95c..d9ebfe6cb71 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromState
 CopyFromStateData
-- 
2.45.1
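
As a rough illustration of what the 0001 patch above does, the two booleans
(binary, csv_mode) collapse into one enum value, and the existing ternaries
become comparisons against that enum. The sketch below is standalone and uses
hypothetical SK_/sketch_ names rather than the real CopyFormat identifiers;
the default delimiter and the 0/1 wire format code mirror the expressions
visible in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the CopyFormat enum introduced by the patch. */
typedef enum
{
	SK_FORMAT_TEXT = 0,
	SK_FORMAT_BINARY,
	SK_FORMAT_CSV
} SketchCopyFormat;

/* Map the old pair of flags onto the enum; both set was never valid. */
static SketchCopyFormat
sketch_format_from_flags(bool binary, bool csv_mode)
{
	assert(!(binary && csv_mode));

	if (binary)
		return SK_FORMAT_BINARY;
	if (csv_mode)
		return SK_FORMAT_CSV;
	return SK_FORMAT_TEXT;
}

/* Default delimiter: comma for CSV, tab otherwise. */
static const char *
sketch_default_delim(SketchCopyFormat format)
{
	return format == SK_FORMAT_CSV ? "," : "\t";
}

/* Format code sent in the CopyIn/CopyOut response: 1 for binary, else 0. */
static int
sketch_wire_format_code(SketchCopyFormat format)
{
	return format == SK_FORMAT_BINARY ? 1 : 0;
}
```

With an enum in place, adding a fourth format is one new enumerator plus
explicit checks, instead of a third boolean and more flag combinations.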

v18-0002-Add-COPY-format-single.patch

From 7dbb50c055f7d50eddaca37e191f5809928ee588 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 7 Nov 2024 14:35:40 +0100
Subject: [PATCH 2/3] Add COPY format 'single'

---
 doc/src/sgml/ref/copy.sgml           |  57 ++++++++-
 src/backend/commands/copy.c          |  86 +++++++++-----
 src/backend/commands/copyfrom.c      |   7 ++
 src/backend/commands/copyfromparse.c | 172 +++++++++++++++++++++++++--
 src/backend/commands/copyto.c        |  62 +++++++++-
 src/bin/psql/tab-complete.in.c       |   2 +-
 src/include/commands/copy.h          |   1 +
 src/test/regress/expected/copy.out   |  33 +++++
 src/test/regress/expected/copy2.out  |  33 ++++-
 src/test/regress/sql/copy.sql        |  18 +++
 src/test/regress/sql/copy2.sql       |  19 ++-
 11 files changed, 442 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f096..4189e682817 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Selects the data format to be read or written:
       <literal>text</literal>,
-      <literal>csv</literal> (Comma Separated Values),
-      or <literal>binary</literal>.
+      <literal>CSV</literal> (Comma Separated Values),
+      <literal>binary</literal>,
+      or <literal>single</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
@@ -257,7 +258,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is allowed only when using <literal>text</literal> or
+      <literal>CSV</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -271,7 +273,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is allowed only when using <literal>text</literal> or
+      <literal>CSV</literal> format.
      </para>
 
      <note>
@@ -294,7 +297,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      using <literal>text</literal> or <literal>CSV</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -400,7 +403,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
-      when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+      when the <literal>FORMAT</literal> is <literal>text</literal>,
+      <literal>CSV</literal> or <literal>single</literal>.
      </para>
      <para>
       A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +897,47 @@ COPY <replaceable class="parameter">count</replaceable>
 
   </refsect2>
 
+  <refsect2>
+   <title>Single Format</title>
+
+   <para>
+    This format option is used for importing and exporting files containing
+    unstructured text, where each line is treated as a single field. It is
+    useful for data that does not conform to a structured, tabular format and
+    lacks delimiters.
+   </para>
+
+   <para>
+    In the <literal>single</literal> format, each line of the input or output is
+    considered a complete value without any field separation. There are no
+    field delimiters, and all characters are taken literally. There is no
+    special handling for quotes, backslashes, or escape sequences. All
+    characters, including whitespace and special characters, are preserved
+    exactly as they appear in the file. However, it's important to note that
+    the text is still interpreted according to the specified <literal>ENCODING</literal>
+    option or the current client encoding for input, and encoded using the
+    specified <literal>ENCODING</literal> or the current client encoding for output.
+   </para>
+
+   <para>
+    When using this format, the <command>COPY</command> command must specify
+    exactly one column. Specifying multiple columns will result in an error.
+    If the table has multiple columns and no column list is provided, an error
+    will occur.
+   </para>
+
+   <para>
+    The <literal>single</literal> format does not distinguish a <literal>NULL</literal>
+    value from an empty string. Empty lines are imported as empty strings, not
+    as <literal>NULL</literal> values.
+   </para>
+
+   <para>
+    Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+   </para>
+
+  </refsect2>
+
   <refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
    <title>Binary Format</title>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b7e819de408..3e5bd4513dc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
 				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "single") == 0)
+				opts_out->format = COPY_FORMAT_SINGLE;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -681,23 +683,69 @@ ProcessCopyOptions(ParseState *pstate,
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
+	if (opts_out->format == COPY_FORMAT_SINGLE && opts_out->delim)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+				 errmsg("cannot specify %s in SINGLE mode", "DELIMITER")));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
+	if (opts_out->format == COPY_FORMAT_SINGLE && opts_out->null_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify %s in SINGLE mode", "NULL")));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
+	if (opts_out->format == COPY_FORMAT_SINGLE && opts_out->default_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify %s in SINGLE mode", "DEFAULT")));
+
+	if (opts_out->delim)
+	{
+		/* Only single-byte delimiter strings are supported. */
+		if (strlen(opts_out->delim) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY delimiter must be a single one-byte character")));
+
+		/* Disallow end-of-line characters */
+		if (strchr(opts_out->delim, '\r') != NULL ||
+			strchr(opts_out->delim, '\n') != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY delimiter cannot be newline or carriage return")));
+	}
 	/* Set defaults for omitted options */
-	if (!opts_out->delim)
-		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+	else if (opts_out->format == COPY_FORMAT_CSV)
+		opts_out->delim = ",";
+	else if (opts_out->format == COPY_FORMAT_TEXT)
+		opts_out->delim = "\t";
 
-	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
-	opts_out->null_print_len = strlen(opts_out->null_print);
+	if (opts_out->null_print)
+	{
+		if (strchr(opts_out->null_print, '\r') != NULL ||
+			strchr(opts_out->null_print, '\n') != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY null representation cannot use newline or carriage return")));
+
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+		opts_out->null_print = "";
+	else if (opts_out->format == COPY_FORMAT_TEXT)
+		opts_out->null_print = "\\N";
+
+	if (opts_out->null_print)
+		opts_out->null_print_len = strlen(opts_out->null_print);
 
 	if (opts_out->format == COPY_FORMAT_CSV)
 	{
@@ -707,25 +755,6 @@ ProcessCopyOptions(ParseState *pstate,
 			opts_out->escape = opts_out->quote;
 	}
 
-	/* Only single-byte delimiter strings are supported. */
-	if (strlen(opts_out->delim) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY delimiter must be a single one-byte character")));
-
-	/* Disallow end-of-line characters */
-	if (strchr(opts_out->delim, '\r') != NULL ||
-		strchr(opts_out->delim, '\n') != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter cannot be newline or carriage return")));
-
-	if (strchr(opts_out->null_print, '\r') != NULL ||
-		strchr(opts_out->null_print, '\n') != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY null representation cannot use newline or carriage return")));
-
 	if (opts_out->default_print)
 	{
 		opts_out->default_print_len = strlen(opts_out->default_print);
@@ -738,7 +767,7 @@ ProcessCopyOptions(ParseState *pstate,
 	}
 
 	/*
-	 * Disallow unsafe delimiter characters in non-CSV mode.  We can't allow
+	 * Disallow unsafe delimiter characters in text mode.  We can't allow
 	 * backslash because it would be ambiguous.  We can't allow the other
 	 * cases because data characters matching the delimiter must be
 	 * backslashed, and certain backslash combinations are interpreted
@@ -747,7 +776,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (opts_out->format != COPY_FORMAT_CSV &&
+	if (opts_out->format == COPY_FORMAT_TEXT &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -839,7 +868,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Don't allow the delimiter to appear in the null string. */
-	if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+	if (opts_out->delim && opts_out->null_print &&
+		strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: %s is the name of a COPY option, e.g. NULL */
@@ -875,7 +905,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"COPY TO")));
 
 		/* Don't allow the delimiter to appear in the default string. */
-		if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+		if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			/*- translator: %s is the name of a COPY option, e.g. NULL */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff976..f5ff26ac022 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
 	/* Generate or convert list of attributes to process */
 	cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
 
+	/* Enforce single column requirement for 'single' format */
+	if (cstate->opts.format == COPY_FORMAT_SINGLE &&
+		list_length(cstate->attnumlist) != 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY with format 'single' must specify exactly one column")));
+
 	num_phys_attrs = tupDesc->natts;
 
 	/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 51eb14d7432..04cd33ce076 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
  * formats.  The main entry point is NextCopyFrom(), which parses the
  * next input line and returns it as Datums.
  *
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/single mode, the parsing happens in multiple stages:
  *
  * [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
  *                1.          2.            3.           4.
@@ -28,7 +28,10 @@
  * 4. CopyReadAttributesText/CSV() function takes the input line from
  *    'line_buf', and splits it into fields, unescaping the data as required.
  *    The fields are stored in 'attribute_buf', and 'raw_fields' array holds
- *    pointers to each field.
+ *    pointers to each field. (text/csv modes only)
+ *
+ * In single mode, the fourth stage is skipped because the entire line is
+ * treated as a single field, making field splitting unnecessary.
  *
  * If encoding conversion is not required, a shortcut is taken in step 2 to
  * avoid copying the data unnecessarily.  The 'input_buf' pointer is set to
@@ -142,6 +145,7 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate);
 static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineSingleText(CopyFromState cstate);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -731,7 +735,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
 }
 
 /*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or single mode.
  * Return false if no more lines.
  *
  * An internal temporary buffer is returned via 'fields'. It is valid until
@@ -747,7 +751,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	int			fldct;
 	bool		done;
 
-	/* only available for text or csv input */
+	/* only available for text, csv, or single input */
 	Assert(cstate->opts.format != COPY_FORMAT_BINARY);
 
 	/* on input check that the header line is correct if needed */
@@ -767,8 +771,16 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 
 			if (cstate->opts.format == COPY_FORMAT_CSV)
 				fldct = CopyReadAttributesCSV(cstate);
-			else
+			else if (cstate->opts.format == COPY_FORMAT_TEXT)
 				fldct = CopyReadAttributesText(cstate);
+			else
+			{
+				Assert(cstate->opts.format == COPY_FORMAT_SINGLE);
+				Assert(cstate->max_fields == 1);
+				/* Point raw_fields directly to line_buf data */
+				cstate->raw_fields[0] = cstate->line_buf.data;
+				fldct = 1;
+			}
 
 			if (fldct != list_length(cstate->attnumlist))
 				ereport(ERROR,
@@ -822,8 +834,16 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	/* Parse the line into de-escaped field values */
 	if (cstate->opts.format == COPY_FORMAT_CSV)
 		fldct = CopyReadAttributesCSV(cstate);
-	else
+	else if (cstate->opts.format == COPY_FORMAT_TEXT)
 		fldct = CopyReadAttributesText(cstate);
+	else
+	{
+		Assert(cstate->opts.format == COPY_FORMAT_SINGLE);
+		Assert(cstate->max_fields == 1);
+		/* Point raw_fields directly to line_buf data */
+		cstate->raw_fields[0] = cstate->line_buf.data;
+		fldct = 1;
+	}
 
 	*fields = cstate->raw_fields;
 	*nfields = fldct;
@@ -1095,7 +1115,10 @@ CopyReadLine(CopyFromState cstate)
 	cstate->line_buf_valid = false;
 
 	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate);
+	if (cstate->opts.format == COPY_FORMAT_SINGLE)
+		result = CopyReadLineSingleText(cstate);
+	else
+		result = CopyReadLineText(cstate);
 
 	if (result)
 	{
@@ -1461,6 +1484,140 @@ CopyReadLineText(CopyFromState cstate)
 	return result;
 }
 
+/*
+ * CopyReadLineSingleText - inner loop of CopyReadLine for single text mode
+ */
+static bool
+CopyReadLineSingleText(CopyFromState cstate)
+{
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		need_data = false;
+	bool		hit_eof = false;
+	bool		result = false;
+
+	/*
+	 * The objective of this loop is to transfer the entire next input line
+	 * into line_buf. We only care about detecting newlines (\r and/or \n). All
+	 * other characters are treated as regular data.
+	 *
+	 * For speed, we try to move data from input_buf to line_buf in chunks
+	 * rather than one character at a time.  input_buf_ptr points to the next
+	 * character to examine; any characters from input_buf_index to
+	 * input_buf_ptr have been determined to be part of the line, but not yet
+	 * transferred to line_buf.
+	 *
+	 * For a little extra speed within the loop, we copy input_buf and
+	 * input_buf_len into local variables.
+	 */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	for (;;)
+	{
+		int			prev_raw_ptr;
+		char		c;
+
+		/*
+		 * Load more data if needed.
+		 */
+		if (input_buf_ptr >= copy_buf_len || need_data)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+			need_data = false;
+		}
+
+		/* OK to fetch a character */
+		prev_raw_ptr = input_buf_ptr;
+		c = copy_input_buf[input_buf_ptr++];
+
+		/* Process \r */
+		if (c == '\r')
+		{
+			/* Check for \r\n on first line, _and_ handle \r\n. */
+			if (cstate->eol_type == EOL_UNKNOWN ||
+				cstate->eol_type == EOL_CRNL)
+			{
+				/*
+				 * If need more data, go back to loop top to load it.
+				 *
+				 * Note that if we are at EOF, c will wind up as '\0' because
+				 * of the guaranteed pad of input_buf.
+				 */
+				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+				/* get next char */
+				c = copy_input_buf[input_buf_ptr];
+
+				if (c == '\n')
+				{
+					input_buf_ptr++;	/* eat newline */
+					cstate->eol_type = EOL_CRNL;	/* in case not set yet */
+				}
+				else
+				{
+					if (cstate->eol_type == EOL_CRNL)
+						ereport(ERROR,
+								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+								 errmsg("end-of-copy marker does not match previous newline style")));
+
+					/*
+					 * if we got here, it is the first line and we didn't find
+					 * \n, so don't consume the peeked character
+					 */
+					cstate->eol_type = EOL_CR;
+				}
+			}
+			else if (cstate->eol_type == EOL_NL)
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("end-of-copy marker does not match previous newline style")));
+			/* If reach here, we have found the line terminator */
+			break;
+		}
+
+		/* Process \n */
+		if (c == '\n')
+		{
+			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("end-of-copy marker does not match previous newline style")));
+			cstate->eol_type = EOL_NL;	/* in case not set yet */
+			/* If reach here, we have found the line terminator */
+			break;
+		}
+
+		/* All other characters are treated as regular data */
+	}							/* end of outer loop */
+
+	/*
+	 * Transfer any still-uncopied data to line_buf.
+	 */
+	REFILL_LINEBUF;
+
+	return result;
+}
+
+
 /*
  *	Return decimal value for a hexadecimal digit
  */
@@ -1937,7 +2094,6 @@ endfield:
 	return fieldno;
 }
 
-
 /*
  * Read a binary attribute
  */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 03c9d71d34a..78d76db193b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 								bool use_quote);
+static void CopyAttributeOutSingle(CopyToState cstate, const char *string);
 
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
@@ -574,6 +575,13 @@ BeginCopyTo(ParseState *pstate,
 	/* Generate or convert list of attributes to process */
 	cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
 
+	/* Enforce single column requirement for 'single' format */
+	if (cstate->opts.format == COPY_FORMAT_SINGLE &&
+		list_length(cstate->attnumlist) != 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY with format 'single' must specify exactly one column")));
+
 	num_phys_attrs = tupDesc->natts;
 
 	/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -839,8 +847,10 @@ DoCopyTo(CopyToState cstate)
 
 				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, colname, false);
-				else
+				else if (cstate->opts.format == COPY_FORMAT_TEXT)
 					CopyAttributeOutText(cstate, colname);
+				else if (cstate->opts.format == COPY_FORMAT_SINGLE)
+					CopyAttributeOutSingle(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
@@ -921,7 +931,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (cstate->opts.format != COPY_FORMAT_BINARY)
+	if (cstate->opts.format == COPY_FORMAT_TEXT ||
+		cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		bool		need_delim = false;
 
@@ -949,7 +960,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
-	else
+	else if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		foreach_int(attnum, cstate->attnumlist)
 		{
@@ -969,6 +980,35 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
+	else if (cstate->opts.format == COPY_FORMAT_SINGLE)
+	{
+		int			attnum;
+		Datum		value;
+		bool		isnull;
+
+		/* Assert only one column is being copied */
+		Assert(list_length(cstate->attnumlist) == 1);
+
+		attnum = linitial_int(cstate->attnumlist);
+		value = slot->tts_values[attnum - 1];
+		isnull = slot->tts_isnull[attnum - 1];
+
+		if (!isnull)
+		{
+			char	   *string = OutputFunctionCall(&out_functions[attnum - 1],
+													value);
+
+			CopyAttributeOutSingle(cstate, string);
+		}
+		/* For 'single' format, we don't send anything for NULL values */
+	}
+	else
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("unsupported COPY format")));
+	}
+
 
 	CopySendEndOfRow(cstate);
 
@@ -1223,6 +1263,22 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	}
 }
 
+/*
+ * Send text representation of one attribute for 'single' format.
+ */
+static void
+CopyAttributeOutSingle(CopyToState cstate, const char *string)
+{
+	const char *ptr;
+
+	if (cstate->need_transcoding)
+		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	else
+		ptr = string;
+
+	CopySendString(cstate, ptr);
+}
+
 /*
  * copy_dest_startup --- executor startup
  */
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index fad2277991d..75f312a9ac5 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
-		COMPLETE_WITH("binary", "csv", "text");
+		COMPLETE_WITH("binary", "csv", "text", "single");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f0..9dab10a492f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
 	COPY_FORMAT_TEXT = 0,
 	COPY_FORMAT_BINARY,
 	COPY_FORMAT_CSV,
+	COPY_FORMAT_SINGLE,
 } CopyFormat;
 
 /*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84c..b7e9c2dcf2d 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,36 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
 (2 rows)
 
 DROP TABLE parted_si;
+-- Test 'single' format
+\set filename :abs_srcdir '/data/emp.data'
+create temp table single_copytest (col text);
+copy single_copytest from :'filename' (format single);
+select col from single_copytest order by col collate "C";
+                  col                   
+----------------------------------------
+ bill    20      (11,10) 1000    sharon
+ sam     30      (10,5)  2000    bill
+ sharon  25      (15,12) 1000    sam
+(3 rows)
+
+copy single_copytest to stdout (format single);
+sharon	25	(15,12)	1000	sam
+sam	30	(10,5)	2000	bill
+bill	20	(11,10)	1000	sharon
+truncate single_copytest;
+copy single_copytest (col) from stdin (format single, header match);
+select col from single_copytest order by col collate "C";
+  col   
+--------
+ "def",
+ abc\.
+ ghi
+(3 rows)
+
+copy single_copytest (col) to stdout (format single, header);
+col
+abc\.
+"def",
+ghi
+truncate single_copytest;
+drop table single_copytest;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..4ac1701bc0c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -90,6 +90,20 @@ COPY x from stdin (format BINARY, delimiter ',');
 ERROR:  cannot specify DELIMITER in BINARY mode
 COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
+COPY x (c) from stdin (format SINGLE, null 'x');
+ERROR:  cannot specify NULL in SINGLE mode
+COPY x from stdin (format TEXT, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x from stdin (format BINARY, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x (c) from stdin (format SINGLE, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x from stdin (format TEXT, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
+COPY x from stdin (format BINARY, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
+COPY x (c) from stdin (format SINGLE, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
 COPY x from stdin (on_error unsupported);
@@ -100,6 +114,10 @@ COPY x from stdin (format TEXT, force_quote(a));
 ERROR:  COPY FORCE_QUOTE requires CSV mode
 COPY x from stdin (format TEXT, force_quote *);
 ERROR:  COPY FORCE_QUOTE requires CSV mode
+COPY x (c) from stdin (format SINGLE, force_quote(a));
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+COPY x (c) from stdin (format SINGLE, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
 COPY x from stdin (format CSV, force_quote(a));
 ERROR:  COPY FORCE_QUOTE cannot be used with COPY FROM
 COPY x from stdin (format CSV, force_quote *);
@@ -108,6 +126,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
 ERROR:  COPY FORCE_NOT_NULL requires CSV mode
 COPY x from stdin (format TEXT, force_not_null *);
 ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+COPY x (c) from stdin (format SINGLE, force_not_null(a));
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+COPY x (c) from stdin (format SINGLE, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
 COPY x to stdout (format CSV, force_not_null(a));
 ERROR:  COPY FORCE_NOT_NULL cannot be used with COPY TO
 COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +138,10 @@ COPY x from stdin (format TEXT, force_null(a));
 ERROR:  COPY FORCE_NULL requires CSV mode
 COPY x from stdin (format TEXT, force_null *);
 ERROR:  COPY FORCE_NULL requires CSV mode
+COPY x (c) from stdin (format SINGLE, force_null(a));
+ERROR:  COPY FORCE_NULL requires CSV mode
+COPY x (c) from stdin (format SINGLE, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
 COPY x to stdout (format CSV, force_null(a));
 ERROR:  COPY FORCE_NULL cannot be used with COPY TO
 COPY x to stdout (format CSV, force_null *);
@@ -858,9 +884,11 @@ select id, text_value, ts_value from copy_default;
 (2 rows)
 
 truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or single mode
 copy copy_default from stdin with (format binary, default '\D');
 ERROR:  cannot specify DEFAULT in BINARY mode
+copy copy_default (text_value) from stdin with (format single, default '\D');
+ERROR:  cannot specify DEFAULT in SINGLE mode
 -- DEFAULT cannot be new line nor carriage return
 copy copy_default from stdin with (default E'\n');
 ERROR:  COPY default representation cannot use newline or carriage return
@@ -929,3 +957,6 @@ truncate copy_default;
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
 ERROR:  COPY DEFAULT cannot be used with COPY TO
+-- Test single column requirement
+copy copy_default from stdin with (format single);
+ERROR:  COPY with format 'single' must specify exactly one column
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b04..bfce4688927 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,21 @@ COPY parted_si(id, data) FROM :'filename';
 SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
 
 DROP TABLE parted_si;
+
+-- Test 'single' format
+\set filename :abs_srcdir '/data/emp.data'
+create temp table single_copytest (col text);
+copy single_copytest from :'filename' (format single);
+select col from single_copytest order by col collate "C";
+copy single_copytest to stdout (format single);
+truncate single_copytest;
+copy single_copytest (col) from stdin (format single, header match);
+col
+abc\.
+"def",
+ghi
+\.
+select col from single_copytest order by col collate "C";
+copy single_copytest (col) to stdout (format single, header);
+truncate single_copytest;
+drop table single_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..b105a3604d3 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -72,18 +72,31 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
+COPY x (c) from stdin (format SINGLE, null 'x');
+COPY x from stdin (format TEXT, escape 'x');
+COPY x from stdin (format BINARY, escape 'x');
+COPY x (c) from stdin (format SINGLE, escape 'x');
+COPY x from stdin (format TEXT, quote 'x');
+COPY x from stdin (format BINARY, quote 'x');
+COPY x (c) from stdin (format SINGLE, quote 'x');
 COPY x from stdin (format BINARY, on_error ignore);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
+COPY x (c) from stdin (format SINGLE, force_quote(a));
+COPY x (c) from stdin (format SINGLE, force_quote *);
 COPY x from stdin (format CSV, force_quote(a));
 COPY x from stdin (format CSV, force_quote *);
 COPY x from stdin (format TEXT, force_not_null(a));
 COPY x from stdin (format TEXT, force_not_null *);
+COPY x (c) from stdin (format SINGLE, force_not_null(a));
+COPY x (c) from stdin (format SINGLE, force_not_null *);
 COPY x to stdout (format CSV, force_not_null(a));
 COPY x to stdout (format CSV, force_not_null *);
 COPY x from stdin (format TEXT, force_null(a));
 COPY x from stdin (format TEXT, force_null *);
+COPY x (c) from stdin (format SINGLE, force_null(a));
+COPY x (c) from stdin (format SINGLE, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +649,9 @@ select id, text_value, ts_value from copy_default;
 
 truncate copy_default;
 
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or single mode
 copy copy_default from stdin with (format binary, default '\D');
+copy copy_default (text_value) from stdin with (format single, default '\D');
 
 -- DEFAULT cannot be new line nor carriage return
 copy copy_default from stdin with (default E'\n');
@@ -707,3 +721,6 @@ truncate copy_default;
 
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
+
+-- Test single column requirement
+copy copy_default from stdin with (format single);
-- 
2.45.1
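
For readers following the 0002 patch above, the newline auto-detection in CopyReadLineSingleText() can be sketched in isolation. This is a minimal standalone illustration, not code from the patch: next_line() and its parameters are hypothetical names, the EolType values are redeclared locally, and buffer refilling, encoding conversion and ereport() are left out (a style mismatch simply returns -1).

```c
#include <assert.h>
#include <stddef.h>

/* Redeclared here for illustration; the patch uses the backend's EolType. */
typedef enum { EOL_UNKNOWN, EOL_NL, EOL_CR, EOL_CRNL } EolType;

/*
 * Hypothetical helper: find the next line in buf[*pos .. len).
 *
 * Returns 1 and sets *start / *linelen (terminator excluded), advancing
 * *pos past the terminator.  Returns 0 at end of input, or -1 when a
 * terminator does not match the style detected on the first line.
 */
static int
next_line(const char *buf, size_t len, size_t *pos, EolType *eol,
          size_t *start, size_t *linelen)
{
    size_t      i = *pos;

    assert(*pos <= len);
    if (i >= len)
        return 0;
    *start = i;

    for (; i < len; i++)
    {
        char        c = buf[i];

        if (c == '\r')
        {
            /* Peek one byte ahead to tell \r\n from a bare \r. */
            int         has_nl = (i + 1 < len && buf[i + 1] == '\n');

            if (*eol == EOL_UNKNOWN)
                *eol = has_nl ? EOL_CRNL : EOL_CR;
            else if (*eol == EOL_NL || (*eol == EOL_CRNL && !has_nl))
                return -1;      /* mixed newline styles */

            *linelen = i - *start;
            *pos = i + ((*eol == EOL_CRNL) ? 2 : 1);
            return 1;
        }
        if (c == '\n')
        {
            if (*eol == EOL_CR || *eol == EOL_CRNL)
                return -1;      /* mixed newline styles */
            *eol = EOL_NL;
            *linelen = i - *start;
            *pos = i + 1;
            return 1;
        }
        /* every other byte is plain data in this format */
    }

    /* Last line without a trailing terminator. */
    *linelen = len - *start;
    *pos = len;
    return 1;
}
```

Calling next_line() repeatedly on "a\r\nbc\r\n" yields the lines "a" and "bc" and leaves the detected style at EOL_CRNL. On "a\nb\r\n" the second call returns -1, because a \r shows up after the first line has already established \n as the terminator.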

v18-0003-Reorganize-option-validations.patch (application/octet-stream)

From f3df63c91984ff4de3700701ddf9dacf78447ddd Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 7 Nov 2024 15:53:24 +0100
Subject: [PATCH 3/3] Reorganize option validations

---
 src/backend/commands/copy.c | 460 ++++++++++++++++++++----------------
 1 file changed, 259 insertions(+), 201 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3e5bd4513dc..78ae2d855cc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -673,44 +673,33 @@ ProcessCopyOptions(ParseState *pstate,
 					 parser_errposition(pstate, defel->location)));
 	}
 
-	/*
-	 * Check for incompatible options (must do these three before inserting
-	 * defaults)
-	 */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
-	if (opts_out->format == COPY_FORMAT_SINGLE && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in SINGLE mode", "DELIMITER")));
-
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "NULL")));
-
-	if (opts_out->format == COPY_FORMAT_SINGLE && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in SINGLE mode", "NULL")));
-
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
-	if (opts_out->format == COPY_FORMAT_SINGLE && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in SINGLE mode", "DEFAULT")));
-
+	/* --- FREEZE option --- */
+	if (opts_out->freeze)
+	{
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FREEZE",
+							"COPY TO")));
+	}
+
+	/* --- DELIMITER option --- */
 	if (opts_out->delim)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+		if (opts_out->format == COPY_FORMAT_SINGLE)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in SINGLE mode", "DELIMITER")));
+
 		/* Only single-byte delimiter strings are supported. */
 		if (strlen(opts_out->delim) != 1)
 			ereport(ERROR,
@@ -723,22 +712,53 @@ ProcessCopyOptions(ParseState *pstate,
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY delimiter cannot be newline or carriage return")));
+
+		if (opts_out->format == COPY_FORMAT_TEXT)
+		{
+			/*
+			 * Disallow unsafe delimiter characters in text mode.  We can't
+			 * allow backslash because it would be ambiguous.  We can't allow
+			 * the other cases because data characters matching the delimiter
+			 * must be backslashed, and certain backslash combinations are
+			 * interpreted non-literally by COPY IN.  Disallowing all lower
+			 * case ASCII letters is more than strictly necessary, but seems
+			 * best for consistency and future-proofing.  Likewise we disallow
+			 * all digits though only octal digits are actually dangerous.
+			 */
+			if (strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+					   opts_out->delim[0]) != NULL)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+		}
 	}
-	/* Set defaults for omitted options */
+	/* Set default delimiter */
 	else if (opts_out->format == COPY_FORMAT_CSV)
 		opts_out->delim = ",";
 	else if (opts_out->format == COPY_FORMAT_TEXT)
 		opts_out->delim = "\t";
 
+	/* --- NULL option --- */
 	if (opts_out->null_print)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+		if (opts_out->format == COPY_FORMAT_SINGLE)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in SINGLE mode", "NULL")));
+
+		/* Disallow end-of-line characters */
 		if (strchr(opts_out->null_print, '\r') != NULL ||
 			strchr(opts_out->null_print, '\n') != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY null representation cannot use newline or carriage return")));
-
 	}
+	/* Set default null_print */
 	else if (opts_out->format == COPY_FORMAT_CSV)
 		opts_out->null_print = "";
 	else if (opts_out->format == COPY_FORMAT_TEXT)
@@ -747,16 +767,23 @@ ProcessCopyOptions(ParseState *pstate,
 	if (opts_out->null_print)
 		opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->format == COPY_FORMAT_CSV)
-	{
-		if (!opts_out->quote)
-			opts_out->quote = "\"";
-		if (!opts_out->escape)
-			opts_out->escape = opts_out->quote;
-	}
-
+	/* --- DEFAULT option --- */
 	if (opts_out->default_print)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+		if (opts_out->format == COPY_FORMAT_SINGLE)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in SINGLE mode", "DEFAULT")));
+
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->null_print);
+
 		opts_out->default_print_len = strlen(opts_out->default_print);
 
 		if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -764,138 +791,7 @@ ProcessCopyOptions(ParseState *pstate,
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY default representation cannot use newline or carriage return")));
-	}
 
-	/*
-	 * Disallow unsafe delimiter characters in text mode.  We can't allow
-	 * backslash because it would be ambiguous.  We can't allow the other
-	 * cases because data characters matching the delimiter must be
-	 * backslashed, and certain backslash combinations are interpreted
-	 * non-literally by COPY IN.  Disallowing all lower case ASCII letters is
-	 * more than strictly necessary, but seems best for consistency and
-	 * future-proofing.  Likewise we disallow all digits though only octal
-	 * digits are actually dangerous.
-	 */
-	if (opts_out->format == COPY_FORMAT_TEXT &&
-		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
-			   opts_out->delim[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
-	/* Check header */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
-	/* Check quote */
-	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "QUOTE")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY quote must be a single one-byte character")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter and quote must be different")));
-
-	/* Check escape */
-	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY escape must be a single one-byte character")));
-
-	/* Check force_quote */
-	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
-												opts_out->force_quote_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
-	if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
-						"COPY FROM")));
-
-	/* Check force_notnull */
-	if (opts_out->format != COPY_FORMAT_CSV &&
-		(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
-	if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
-		!is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
-						"COPY TO")));
-
-	/* Check force_null */
-	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
-												opts_out->force_null_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
-	if ((opts_out->force_null != NIL || opts_out->force_null_all) &&
-		!is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
-						"COPY TO")));
-
-	/* Don't allow the delimiter to appear in the null string. */
-	if (opts_out->delim && opts_out->null_print &&
-		strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: %s is the name of a COPY option, e.g. NULL */
-				 errmsg("COPY delimiter character must not appear in the %s specification",
-						"NULL")));
-
-	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->format == COPY_FORMAT_CSV &&
-		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: %s is the name of a COPY option, e.g. NULL */
-				 errmsg("CSV quote character must not appear in the %s specification",
-						"NULL")));
-
-	/* Check freeze */
-	if (opts_out->freeze && !is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FREEZE",
-						"COPY TO")));
-
-	if (opts_out->default_print)
-	{
 		if (!is_from)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -905,22 +801,13 @@ ProcessCopyOptions(ParseState *pstate,
 							"COPY TO")));
 
 		/* Don't allow the delimiter to appear in the default string. */
-		if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+		if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			/*- translator: %s is the name of a COPY option, e.g. NULL */
 					 errmsg("COPY delimiter character must not appear in the %s specification",
 							"DEFAULT")));
 
-		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->format == COPY_FORMAT_CSV &&
-			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			/*- translator: %s is the name of a COPY option, e.g. NULL */
-					 errmsg("CSV quote character must not appear in the %s specification",
-							"DEFAULT")));
-
 		/* Don't allow the NULL and DEFAULT string to be the same */
 		if (opts_out->null_print_len == opts_out->default_print_len &&
 			strncmp(opts_out->null_print, opts_out->default_print,
@@ -929,20 +816,191 @@ ProcessCopyOptions(ParseState *pstate,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
-	/* Check on_error */
-	if (opts_out->format == COPY_FORMAT_BINARY &&
-		opts_out->on_error != COPY_ON_ERROR_STOP)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
-	if (opts_out->reject_limit && !opts_out->on_error)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first and second %s are the names of COPY option, e.g.
-		 * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
-				 errmsg("COPY %s requires %s to be set to %s",
-						"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+	else
+	{
+		/* No default for default_print; remains NULL */
+	}
+
+	/* --- HEADER option --- */
+	if (opts_out->header_line != COPY_HEADER_FALSE)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in BINARY mode", "HEADER")));
+	}
+	else
+	{
+		/* Default is no header; no action needed */
+	}
+
+	/* --- QUOTE option --- */
+	if (opts_out->quote)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+		if (strlen(opts_out->quote) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY quote must be a single one-byte character")));
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Set default quote */
+		opts_out->quote = "\"";
+	}
+
+	/* --- ESCAPE option --- */
+	if (opts_out->escape)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+		if (strlen(opts_out->escape) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY escape must be a single one-byte character")));
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Set default escape to quote character */
+		opts_out->escape = opts_out->quote;
+	}
+
+	/* --- FORCE_QUOTE option --- */
+	if (opts_out->force_quote != NIL || opts_out->force_quote_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+		if (is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+							"COPY FROM")));
+	}
+
+	/* --- FORCE_NOT_NULL option --- */
+	if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+							"COPY TO")));
+	}
+
+	/* --- FORCE_NULL option --- */
+	if (opts_out->force_null != NIL || opts_out->force_null_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+							"COPY TO")));
+	}
+
+	/* --- ON_ERROR option --- */
+	if (opts_out->on_error != COPY_ON_ERROR_STOP)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+	}
+
+	/* --- REJECT_LIMIT option --- */
+	if (opts_out->reject_limit)
+	{
+		if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first and second %s are the names of COPY option, e.g.
+				* ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+					 errmsg("COPY %s requires %s to be set to %s",
+							"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+	}
+
+	/*
+	 * Additional checks for interdependent options
+	 */
+
+	/* Checks specific to the CSV and TEXT formats */
+	if (opts_out->format == COPY_FORMAT_TEXT ||
+		opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->null_print);
+
+		/* Don't allow the delimiter to appear in the null string. */
+		if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("COPY delimiter character must not appear in the %s specification",
+							"NULL")));
+	}
+
+	/* Checks specific to the CSV format */
+	if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->quote);
+		Assert(opts_out->null_print);
+
+		/* Don't allow the CSV quote char to appear in the default string. */
+		if (opts_out->default_print_len > 0 &&
+			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("CSV quote character must not appear in the %s specification",
+							"DEFAULT")));
+
+		if (opts_out->delim[0] == opts_out->quote[0])
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY delimiter and quote must be different")));
+
+		/* Don't allow the CSV quote char to appear in the null string. */
+		if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("CSV quote character must not appear in the %s specification",
+							"NULL")));
+	}
 }
 
 /*
-- 
2.45.1
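
The reorganized validation in the patch above follows one pattern per option: if the option was given, check it against the format; otherwise apply the format's default. Below is a minimal standalone sketch of that pattern for QUOTE and ESCAPE only. It is not the patch's code: the `Format` enum, the `Options` struct and `check_quote_escape()` are hypothetical stand-ins, and it returns a message where PostgreSQL would call `ereport(ERROR, ...)`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's CopyFormat values. */
typedef enum { FMT_TEXT, FMT_CSV, FMT_BINARY, FMT_SINGLE } Format;

typedef struct
{
    Format      format;
    const char *quote;   /* NULL when the user did not specify QUOTE */
    const char *escape;  /* NULL when the user did not specify ESCAPE */
} Options;

/*
 * Returns NULL on success, or a static error message.
 * (PostgreSQL itself would raise an error instead of returning.)
 */
const char *
check_quote_escape(Options *o)
{
    /* --- QUOTE: validate if given, otherwise default it for CSV --- */
    if (o->quote)
    {
        if (o->format != FMT_CSV)
            return "COPY QUOTE requires CSV mode";
        if (strlen(o->quote) != 1)
            return "COPY quote must be a single one-byte character";
    }
    else if (o->format == FMT_CSV)
        o->quote = "\"";

    /* --- ESCAPE: validate if given, otherwise default to the quote --- */
    if (o->escape)
    {
        if (o->format != FMT_CSV)
            return "COPY ESCAPE requires CSV mode";
        if (strlen(o->escape) != 1)
            return "COPY escape must be a single one-byte character";
    }
    else if (o->format == FMT_CSV)
        o->escape = o->quote;

    return NULL;
}
```

Because QUOTE is resolved before ESCAPE, the ESCAPE default can rely on the quote already being set, which is why the order of the option blocks matters.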

Masahiko Sawada

sawada.mshk@gmail.com

about 1 year ago

In reply to: Joel Jacobson (#1)

Re: New "single" COPY format

Hi,

On Thu, Nov 7, 2024 at 8:16 AM Joel Jacobson <joel@compiler.org> wrote:

Hi hackers,

Thread [1] renamed, since the format name has now been changed from 'raw' to
'single', as suggested by Andrew Dunstan and Jacob Champion.

[1] /messages/by-id/c12516b1-77dc-4ad3-94a7-88527360aee0@app.fastmail.com

Recap: This is about adding support to import/export text-based formats such as
JSONL, or any unstructured text file, where wanting to import each line "as is"
into a single column, or wanting to export a single column to a text file.

Example importing the meson-logs/testlog.json file Meson generates
when building PostgreSQL, which is in JSONL format:

# create table meson_log (log_line jsonb);
# \copy meson_log from meson-logs/testlog.json (format single);
COPY 306
# select log_line->'name' name, log_line->'result' result from meson_log limit 3;
name | result
-----------------------------------------+--------
"postgresql:setup / tmp_install" | "OK"
"postgresql:setup / install_test_files" | "OK"
"postgresql:setup / initdb_cache" | "OK"
(3 rows)

Changes since v16:

* EOL handling now works the same as for 'text' and 'csv'.
In v16, we supported multi-byte delimiters to allow specifying
e.g. Windows EOL (\r\n), but this seemed unnecessary, if we just do what we do
for text/csv, that is, to auto-detect the EOL for COPY FROM, and use
the OS default EOL for COPY TO.
The DELIMITER option is therefore invalid for the 'single' format.
This is the biggest change in the code, between v16 and v18.
CopyReadLineRawText() has been renamed to CopyReadLineSingleText(),
and changed accordingly.
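
As an illustration of the EOL auto-detection described above, here is a minimal standalone sketch. It is not taken from the patch: `detect_eol()` and the enum names are hypothetical, and it only inspects the first line terminator in a buffer that is assumed to hold the whole first line.

```c
#include <stddef.h>

/* Illustrative names for the three line-ending styles. */
typedef enum { EOL_UNKNOWN, EOL_NL, EOL_CR, EOL_CRNL } EolType;

/*
 * Report which line-ending style the first terminator in buf uses.
 * A '\r' that is the very last byte is reported as EOL_CR; a real
 * reader would have to fetch more input before deciding.
 */
EolType
detect_eol(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] == '\n')
            return EOL_NL;
        if (buf[i] == '\r')
            return (i + 1 < len && buf[i + 1] == '\n') ? EOL_CRNL : EOL_CR;
    }
    return EOL_UNKNOWN;     /* no terminator seen yet */
}
```

With detection like this on input, a separate option for specifying "\r\n" becomes unnecessary, which is the argument made above for dropping DELIMITER.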

In earlier versions, we supported loading the whole file into a single
tuple. Is there any reason that it doesn't support it in v18? I think
if it's useful we can improve it in a separate patch.

* A final EOL is now emitted to the last record in COPY TO.
So now it works just like 'text' and 'csv'.

* HEADER [ boolean | MATCH ] now supported
This is now again supported, as previously suggested by Daniel Verite,
possible thanks to the EOL handling.

It makes sense to support it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Masahiko Sawada (#2)

Re: New "single" COPY format

On Fri, Nov 8, 2024, at 00:13, Masahiko Sawada wrote:

In earlier versions, we supported loading the whole file into a single
tuple. Is there any reason that it doesn't support it in v18? I think
if it's useful we can improve it in a separate patch.

Not sure how useful it is, since we already have pg_read_file().

Also, I think it's out of scope for the 'single' format, since I think it should
be about processing text line by line, in the same way 'csv' and 'text'
work.

The implementation depended on the delimiter option, where the default was
no delimiter, which then read the entire file, and since we don't have
the delimiter option anymore, that approach won't work.
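
To make the dependency on the delimiter concrete, here is a small standalone sketch of the earlier behaviour as described above: with a delimiter the input splits into several records, and with no delimiter the whole input is one record. `count_records()` is a hypothetical helper written for this illustration, not code from any version of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the records in data when split on delim.
 * A NULL or empty delimiter means "no delimiter": the whole input is
 * a single record.  A trailing delimiter does not start an extra record.
 */
size_t
count_records(const char *data, const char *delim)
{
    size_t      n = 0;
    size_t      dlen = delim ? strlen(delim) : 0;
    const char *p = data;

    if (*data == '\0')
        return 0;           /* empty input: nothing to load */
    if (dlen == 0)
        return 1;           /* no delimiter: entire input is one value */

    while (*p != '\0')
    {
        const char *hit = strstr(p, delim);

        n++;
        if (hit == NULL)
            break;          /* last record had no trailing delimiter */
        p = hit + dlen;
    }
    return n;
}
```

Once the delimiter is always an auto-detected EOL, the "no delimiter" branch has no way to be selected, which is why whole-file loading no longer falls out of the design.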

From a docs perspective, it would get quite ugly and confusing: we would
need to rephrase sentences like the ones below, since they would no longer
always be true:

"each line is treated as a single field"

"each line of the input or output is considered a
complete value without any field separation"

* A final EOL is now emitted to the last record in COPY TO.
So now it works just like 'text' and 'csv'.

+1

* HEADER [ boolean | MATCH ] now supported
This is now again supported, as previously suggested by Daniel Verite,
possible thanks to the EOL handling.

It makes sense to support it.

/Joel

David G. Johnston

david.g.johnston@gmail.com

about 1 year ago

In reply to: Joel Jacobson (#3)

Re: New "single" COPY format

On Thursday, November 7, 2024, Joel Jacobson <joel@compiler.org> wrote:

On Fri, Nov 8, 2024, at 00:13, Masahiko Sawada wrote:

In earlier versions, we supported loading the whole file into a single
tuple. Is there any reason that it doesn't support it in v18? I think
if it's useful we can improve it in a separate patch.

Not sure how useful it is, since we already have pg_read_file().

Being forced to have the file server-readable, non-stdin, destroys quite a
bit of the usefulness of pg_read_file.

If we want clients to be able to pass the effort here to the server, copy
is definitely the most useful way to do so.

I’d be concerned choosing “single” given this future possibility. I do
agree that such an enhancement would be best done in its own patch.

David J.

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: David G. Johnston (#4)

Re: New "single" COPY format

On Fri, Nov 8, 2024, at 07:14, David G. Johnston wrote:

On Thursday, November 7, 2024, Joel Jacobson <joel@compiler.org> wrote:

On Fri, Nov 8, 2024, at 00:13, Masahiko Sawada wrote:

In earlier versions, we supported loading the whole file into a single
tuple. Is there any reason that it doesn't support it in v18? I think
if it's useful we can improve it in a separate patch.

Not sure how useful it is, since we already have pg_read_file().

Being forced to have the file server-readable, non-stdin, destroys
quite a bit of the usefulness of pg_read_file.

If we want clients to be able to pass the effort here to the server,
copy is definitely the most useful way to do so.

Right, good point, I agree.

I’d be concerned choosing “single” given this future possibility. I do
agree that such an enhancement would be best done in its own patch.

OK, sounds good to do it in its own patch.

If the name "single" doesn't work for this reason, I see at least
two alternatives:

1) Keep "single" as format name, and let it only be concerned about line by line
processing, and introduce a different format for entire file processing,
in its own patch.

2) Some other format name ("raw"?) that allows such future enhancement to
be done within the same format, in its own patch.

Other ideas?

/Joel

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Joel Jacobson (#5)

Re: New "single" COPY format

On Fri, Nov 8, 2024, at 08:42, Joel Jacobson wrote:

I’d be concerned choosing “single” given this future possibility. I do
agree that such an enhancement would be best done in its own patch.

OK, sounds good to do it in its own patch.

If the name "single" doesn't work for this reason, I see at least
two alternatives:

1) Keep "single" as format name, and let it only be concerned about line by line
processing, and introduce a different format for entire file processing,
in its own patch.

2) Some other format name ("raw"?) that allows such future enhancement to
be done within the same format, in its own patch.

Other ideas?

Sorry for noise, "raw" is of course not an option given how v18 works,
due to the auto-magic EOL detection, same as in "text" and "csv",
as pointed out by others earlier in the thread.

How about "single_column"?

Then, a future patch could implement a "single_value" format,
to process an entire file or value. Such format could then also
support binary data, by detecting if the column type is "bytea".

/Joel

Aleksander Alekseev

aleksander@timescale.com

about 1 year ago

In reply to: Joel Jacobson (#1)

Re: New "single" COPY format

Hi Joel,

Recap: This is about adding support to import/export text-based formats such as
JSONL, or any unstructured text file, where wanting to import each line "as is"
into a single column, or wanting to export a single column to a text file.

Sorry for being late for the discussion.

I disagree with the idea of adding a new format name for this. Mostly
because it is *not* a new format and pretending that it is will be
just a poor and/or confusing user interface.

IMO it should be 'text' we already have with special options e.g.
DELIMITER AS NULL ESCAPE AS NULL. If there are no escape characters
and column delimiters (and no NULLs designations, and what else I
forgot) then your text file just contains one tuple per line.

--
Best regards,
Aleksander Alekseev

Aleksander Alekseev

aleksander@timescale.com

about 1 year ago

In reply to: Aleksander Alekseev (#7)

Re: New "single" COPY format

Hi,

Recap: This is about adding support to import/export text-based formats such as
JSONL, or any unstructured text file, where wanting to import each line "as is"
into a single column, or wanting to export a single column to a text file.

Sorry for being late for the discussion.

I disagree with the idea of adding a new format name for this. Mostly
because it is *not* a new format and pretending that it is will be
just a poor and/or confusing user interface.

IMO it should be 'text' we already have with special options e.g.
DELIMITER AS NULL ESCAPE AS NULL. If there are no escape characters
and column delimiters (and no NULLs designations, and what else I
forgot) then your text file just contains one tuple per line.

Personally I wouldn't mind a special syntax such as LINES AS IS or
maybe COPY AS IS for convenience. Perhaps we should discuss it
separately though as a syntax sugar for a long list of options we
already support.

--
Best regards,
Aleksander Alekseev

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Aleksander Alekseev (#8)

Re: New "single" COPY format

On Fri, Nov 8, 2024, at 12:25, Aleksander Alekseev wrote:

Sorry for being late for the discussion.

No worries, better late than never, thanks for chiming in.

I disagree with the idea of adding a new format name for this. Mostly
because it is *not* a new format and pretending that it is will be
just a poor and/or confusing user interface.

IMO it should be 'text' we already have with special options e.g.
DELIMITER AS NULL ESCAPE AS NULL. If there are no escape characters
and column delimiters (and no NULLs designations, and what else I
forgot) then your text file just contains one tuple per line.

Personally I wouldn't mind a special syntax such as LINES AS IS or
maybe COPY AS IS for convenience. Perhaps we should discuss it
separately though as a syntax sugar for a long list of options we
already support.

From an implementation perspective, I agree with you that this could
be handled by tweaking the existing code for the 'text' and 'csv' formats,
although with a performance penalty for the existing formats.

But from a user perspective, the implementation is of course irrelevant.
What I think is important is that the format has an intuitive name,
and that its default behaviour matches a typical file in the format
as closely as possible.

For this reason, 'text' is unfortunately a poor name for the format,
since it gives the impression that it's a generic format for text files,
which it's not. It's a PostgreSQL-specific format, where "\." on a
single line has special meaning, and other defaults such as \N
are also PostgreSQL-specific and need to be overridden when dealing
with a non-PostgreSQL-specific text file.
Users who fail to understand these details risk being surprised.
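
A minimal sketch of the two surprises named above, for a single-column file read with the 'text' defaults: a line consisting of `\.` ends the data, and a line consisting of `\N` becomes NULL. `classify_text_line()` is a hypothetical helper for illustration only; the real 'text' parser also interprets other backslash sequences, which this sketch ignores.

```c
#include <string.h>

typedef enum { LINE_DATA, LINE_NULL, LINE_END_OF_DATA } LineKind;

/*
 * Classify one already-split input line of a single-column file
 * under the default 'text' format settings.
 */
LineKind
classify_text_line(const char *line)
{
    if (strcmp(line, "\\.") == 0)
        return LINE_END_OF_DATA;    /* end-of-data marker */
    if (strcmp(line, "\\N") == 0)
        return LINE_NULL;           /* default NULL string */
    return LINE_DATA;
}
```

A plain log file that happens to contain either line would therefore not be imported as is unless the user knows to override the defaults.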

In contrast, the 'csv' format works much as expected.

So for this reason, I think a new format is a good idea, not only
because it makes it much clearer how to have a fast parsing path
in the implementation, but also because it will increase the chances
that users get things right when dealing with non-PostgreSQL-specific
text files, such as JSONL and log files.

Sure, from an implementation perspective, we could have separate
specialized functions, to allow for fast parsing paths, even if
just overloading the existing options, but that would be a bit awkward I think.

The "DELIMITER AS NULL ESCAPE AS NULL" idea was proposed in the old thread
"Should CSV parsing be stricter about mid-field quotes?" [1]

[1]: /messages/by-id/8aeab305-5e94-4fa5-82bf-6da6baee6e05@app.fastmail.com

However, some of us came to the conclusion that it would be
better to introduce a new format, for reasons explained below,
quoted from the old thread [1]:

On Wed, Oct 9, 2024, at 18:14, Andrew Dunstan wrote:

On 2024-10-09 We 11:58 AM, Tom Lane wrote:

"Joel Jacobson" <joel@compiler.org> writes:

I think it would be nicest to introduce a new "raw" FORMAT,
that clearly get things right already at the top-level,
instead of trying to customize any of the existing formats.

FWIW, I like this idea. It makes it much clearer how to have a
fast parsing path.

WFM, so something like FORMAT {SIMPLE, RAW, FAST, SINGLE}? We don't seem
to have an existing keyword that we could sanely reuse here.

To add to that, I think there is value in a new format from a user-friendliness
perspective: we keep the existing formats and their options intact,
and instead just add a new format, with a clear name and well-defined
semantics, explained in the docs under its own section. That avoids cluttering
the documentation further, where users would need to assemble various options,
and understand their intricate details, in order to get things right.

/Joel

#10

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Joel Jacobson (#1)

Re: New "single" COPY format

On Thu, Nov 7, 2024, at 17:15, Joel Jacobson wrote:

Attachments:
* v18-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
* v18-0002-Add-COPY-format-single.patch
* v18-0003-Reorganize-option-validations.patch

I want to bring up a potential problem with v18, which has been discussed
before:

On Tue, Oct 15, 2024, at 19:30, Jacob Champion wrote:

Hi,

Idle thoughts from a design perspective -- feel free to ignore, since
I'm not the target audience for the feature:

- If the column data stored in Postgres contains newlines, it seems
like COPY TO won't work "correctly". Is that acceptable?

Example:

CREATE TABLE log (line TEXT);
INSERT INTO log (line) VALUES (E'foo\nbar'), ('baz');
COPY log TO '/tmp/log.txt' (FORMAT 'single');
COPY 2

% cat /tmp/log.txt
foo
bar
baz

TRUNCATE log;
COPY log FROM '/tmp/log.txt' (FORMAT 'single');
SELECT * FROM log;
line
------
foo
bar
baz
(3 rows)

It would be nice if we could come up with an approach that doesn't introduce
this footgun while still being convenient for the common use cases.
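
The round-trip problem can be reproduced without a database. The sketch below writes each value followed by a newline, with no escaping, and then counts the lines that a line-by-line reader would see. `export_lines()` and `count_lines()` are hypothetical helpers written for this illustration; they only model the behaviour described above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Write each value followed by '\n' into out, with no escaping.
 * Stops early if the next value would not fit.  Returns bytes written,
 * not counting the terminating NUL.
 */
size_t
export_lines(const char *const *values, size_t nvalues, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return 0;
    for (size_t i = 0; i < nvalues; i++)
    {
        size_t len = strlen(values[i]);

        if (used + len + 2 > cap)   /* value + '\n' + NUL must fit */
            break;
        memcpy(out + used, values[i], len);
        used += len;
        out[used++] = '\n';
    }
    out[used] = '\0';
    return used;
}

/* Count newline-terminated lines, i.e. the records a reader would load. */
size_t
count_lines(const char *text)
{
    size_t n = 0;

    for (const char *p = text; *p != '\0'; p++)
        if (*p == '\n')
            n++;
    return n;
}
```

Exporting the two values `"foo\nbar"` and `"baz"` yields three lines, so importing the result cannot give back the original two rows.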

Ideas?

/Joel

#11

Daniel Verite

daniel@manitou-mail.org

about 1 year ago

In reply to: Aleksander Alekseev (#7)

Re: New "single" COPY format

Aleksander Alekseev wrote:

IMO it should be 'text' we already have with special options e.g.
DELIMITER AS NULL ESCAPE AS NULL. If there are no escape characters
and column delimiters (and no NULLs designations, and what else I
forgot) then your text file just contains one tuple per line.

+1 for the idea that accepting "no delimiter" and "no escape"
as a valid combination for the text format seems better
than adding a new format.
However inviting "NULL" into that syntax when it has nothing to do
with the SQL "NULL" does not look like a good idea.
Maybe DELIMITER '' ESCAPE '', or DELIMITER NONE ESCAPE NONE.

Besides, "single" as a format name does not sound right.
Generally the name for a text format designates a set
of characteristics meaning that certain combinations of
characters have specific behaviors.
Sometimes "plain" is used in the context of text formats
to indicate that no character is special ("plain" is also the
default subtype of "text" in MIME types).

"single" as proposed is to be understood as "single-column",
which is a consequence of the lack of a field delimiter, but
not an intrinsic characteristic of the format.
If COPY accepted fixed-length fields, it could be in a
no-delimiter no-escape mode and still handle multiple
columns, in opposition to what "single" suggests.
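
To illustrate the fixed-length case: a line with no delimiter and no escaping can still carry several columns if each column has a known byte range. COPY has no such mode today; `fixed_field()` and the sample layout in the usage note are hypothetical, invented only to show the point.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the field occupying bytes [start, start + width) of line into out
 * as a NUL-terminated string.  Returns 0 on success, or -1 if the line is
 * too short or out is too small.
 */
int
fixed_field(const char *line, size_t start, size_t width, char *out, size_t cap)
{
    size_t len = strlen(line);

    if (start > len || width > len - start || width + 1 > cap)
        return -1;
    memcpy(out, line + start, width);
    out[width] = '\0';
    return 0;
}
```

For a made-up layout of 4 + 4 + 7 bytes, the line `2024INFOstarted` splits into `2024`, `INFO` and `started`, three columns with no character in the line treated specially.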

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#12

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Daniel Verite (#11)

Re: New "single" COPY format

On Fri, Nov 8, 2024, at 20:44, Daniel Verite wrote:

Aleksander Alekseev wrote:

IMO it should be 'text' we already have with special options e.g.
DELIMITER AS NULL ESCAPE AS NULL. If there are no escape characters
and column delimiters (and no NULLs designations, and what else I
forgot) then your text file just contains one tuple per line.

+1 for the idea that accepting "no delimiter" and "no escape"
as a valid combination for the text format seems better
than adding a new format.
However inviting "NULL" into that syntax when it has nothing to do
with the SQL "NULL" does not look like a good idea.
Maybe DELIMITER '' ESCAPE '', or DELIMITER NONE ESCAPE NONE.

Okay, let's see if we can solve all problems I see with
overloading the 'text' format:

1. Text files containing \. in the middle of the file
% cat /tmp/test.txt
foo
\.
bar

How do we import such a file?
Is it not supported?
Or another option to turn off the special meaning of \.?
Both seem like bad ideas to me; maybe there is a nice idea I fail to see?

2. NULL option is \N for 'text', so to import a plain text
file safely, where \N lines should not be converted to NULL,
users would need to also specify NULL '', which seems
like a footgun to me.

3. What should happen if specifying DELIMITER NONE, and:
- specifying a column list with more than one column?
- not also specifying ESCAPE NONE?

4. What should happen if specifying ESCAPE NONE, and
- specifying a column list with more than one column?

5. What about the isomorphism violation I brought up in my
previous email, that is, the non-bijective mapping and irreversibility
for records with embedded newlines?
This is also a problem with a separate format,
but I wonder what you think about it:
whether it's acceptable or needs to be solved, and if so,
whether you see any solutions.

Besides, "single" as a format name does not sound right.
Generally the name for a text format designates a set
of characteristics meaning that certain combinations of
characters have specific behaviors.
Sometimes "plain" is used in the context of text formats
to indicate that no character is special ("plain" is also the
default subtype of "text" in MIME types).

"single" as proposed is to be understood as "single-column",
which is a consequence of the lack of a field delimiter, but
not an intrinsic characteristic of the format.
If COPY accepted fixed-length fields, it could be in a
no-delimiter no-escape mode and still handle multiple
columns, in opposition to what "single" suggests.

Good points. I agree "plain" is a better name.

/Joel

#13

David G. Johnston

david.g.johnston@gmail.com

about 1 year ago

In reply to: Joel Jacobson (#12)

Re: New "single" COPY format

On Fri, Nov 8, 2024 at 2:20 PM Joel Jacobson <joel@compiler.org> wrote:

1. Text files containing \. in the middle of the file
% cat /tmp/test.txt
foo
\.
bar

Or another option to turn off the special meaning of \.?

This does seem like an orthogonal option worth considering.

Besides, "single" as a format name does not sound right.
Generally the name for a text format designates a set
of characteristics meaning that certain combinations of
characters have specific behaviors.
Sometimes "plain" is used in the context of text formats
to indicate that no character is special ("plain" is also the
default subtype of "text" in MIME types).

"single" as proposed is to be understood as "single-column",
which is a consequence of the lack of a field delimiter, but
not an intrinsic characteristic of the format.
If COPY accepted fixed-length fields, it could be in a
no-delimiter no-escape mode and still handle multiple
columns, in opposition to what "single" suggests.

Good points. I agree "plain" is a better name.

I'm on board with a new named format that selects the desired defaults
instead of requiring the user to know and change them all manually.

This seems to me like a "list" format, implying each row is a list entry.
Since we have tables, the concept of a list would likewise reasonably imply a
single column.

Since newlines are special, i.e., record delimiters, "plain" thus would
remain misleading. It could be used for a case where the entire file is
loaded into a new row, single column.

David J.

#14

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: David G. Johnston (#13)

3 attachment(s)

Re: New "single" COPY format

On Fri, Nov 8, 2024, at 22:47, David G. Johnston wrote:

On Fri, Nov 8, 2024 at 2:20 PM Joel Jacobson <joel@compiler.org> wrote:

1. Text files containing \. in the middle of the file
% cat /tmp/test.txt
foo
\.
bar

Or another option to turn off the special meaning of \.?

This does seem like an orthogonal option worth considering.

I agree; if we want to integrate this into 'text', it's an option worth considering.
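
For reference, a rough sketch of what such an option could look like in a line reader. In the text format, a line consisting solely of \. is treated as an end-of-data marker. The honor_eod flag below is a hypothetical switch, not an existing COPY option, and count_imported_lines is an illustrative name only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the lines (terminated by '\n' or by the end of the buffer) that a
 * reader would import from buf.  If honor_eod is true, a line consisting
 * solely of "\." ends the input and nothing after it is read.
 * honor_eod is a hypothetical option used only for illustration.
 */
size_t
count_imported_lines(const char *buf, bool honor_eod)
{
    size_t      n = 0;

    assert(buf != NULL);

    while (*buf != '\0')
    {
        const char *eol = strchr(buf, '\n');
        size_t      len = eol ? (size_t) (eol - buf) : strlen(buf);

        if (honor_eod && len == 2 && buf[0] == '\\' && buf[1] == '.')
            break;              /* end-of-data marker */
        n++;
        buf += len;
        if (eol)
            buf++;              /* skip the newline itself */
    }
    return n;
}
```

For the three-line test file quoted above (foo, \., bar), the function returns 1 when honor_eod is true and 3 when it is false.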

I'm on board with a new named format that selects the desired defaults
instead of requiring the user to know and change them all manually.

This seems to me like a "list" format, implying each row is a list
entry. Since we have tables, the concept of a list would likewise
reasonably imply a single column.

Since newlines are special, i.e., record delimiters, "plain" thus would
remain misleading. It could be used for a case where the entire file
is loaded into a new row, single column.

I think 'list' is the best proposal I've heard so far.
New patch attached; the only change since v18 is the renaming.

There is one remaining important issue though:

Fields that contain newline characters cause an irreversibility problem:
a value exported with an embedded newline is read back as more than one row.

It feels wrong to leave this as a potential pitfall for users.
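
The pitfall can be reproduced outside PostgreSQL with a small C sketch. It assumes an export that writes each value followed by '\n' with no quoting or escaping, and a reader that treats every '\n' as a record boundary. The names export_lines and count_records are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write each value followed by '\n', with no quoting or escaping.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t) -1 if out is too small.
 */
size_t
export_lines(const char *const *values, size_t nvalues,
             char *out, size_t outsize)
{
    size_t      used = 0;

    assert(values != NULL && out != NULL);
    if (outsize == 0)
        return (size_t) -1;

    for (size_t i = 0; i < nvalues; i++)
    {
        size_t      len = strlen(values[i]);

        /* need room for the value, its newline and the final NUL */
        if (used + len + 2 > outsize)
            return (size_t) -1;
        memcpy(out + used, values[i], len);
        used += len;
        out[used++] = '\n';
    }
    out[used] = '\0';
    return used;
}

/* Count the records a line-based reader would see in buf. */
size_t
count_records(const char *buf)
{
    size_t      n = 0;

    assert(buf != NULL);
    for (; *buf != '\0'; buf++)
        if (*buf == '\n')
            n++;
    return n;
}
```

Exporting the two values "a\nb" and "c" produces a buffer in which count_records finds three records, so the round trip does not restore the original two rows.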

Here's a draft of an idea I'm considering (not yet implemented):

- Fast path for newline-free types:
For the list of built-in types where we know the ::text representation cannot
contain newlines, we take the fast path in NextCopyFromRawFields(),
pointing cstate->raw_fields[0] directly to cstate->line_buf.data.

- Handling newlines for other types:
For any other types, we would need to scan the string for newline characters.
If a newline is encountered, it would, by default, result in an error when
using the list format, unless:

- Optional quoting mechanism:
If the QUOTE option is specified, we can allow newlines within fields by
quoting the entire line. Any quote characters within the field would be
handled by doubling them, similar to CSV escaping rules.
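
The output side of this draft could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the attached patches: emit_list_field is a hypothetical name, a quote value of '\0' stands for "QUOTE not specified", the sketch always scans the value (it omits the fast path for newline-free types), and it assumes every line is quoted once QUOTE is given, which the draft above leaves open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append one field to out as a single record terminated by '\n'.
 *
 * quote == '\0': no QUOTE option; a field containing a line break is
 *                rejected because it cannot be represented.
 * otherwise:     the whole line is wrapped in the quote character and
 *                embedded quote characters are doubled, as in CSV.
 *
 * Returns the number of bytes written (excluding the NUL), or -1 on error.
 */
long
emit_list_field(const char *field, char quote, char *out, size_t outsize)
{
    size_t      used = 0;

    assert(field != NULL && out != NULL);

    if (quote == '\0')
    {
        size_t      len = strlen(field);

        if (strpbrk(field, "\r\n") != NULL)
            return -1;          /* would split into several records */
        if (len + 2 > outsize)
            return -1;
        memcpy(out, field, len);
        out[len] = '\n';
        out[len + 1] = '\0';
        return (long) (len + 1);
    }

    /* worst case: every byte doubled, two quotes, newline and NUL */
    if (2 * strlen(field) + 4 > outsize)
        return -1;

    out[used++] = quote;
    for (const char *p = field; *p != '\0'; p++)
    {
        if (*p == quote)
            out[used++] = quote;    /* double embedded quote characters */
        out[used++] = *p;
    }
    out[used++] = quote;
    out[used++] = '\n';
    out[used] = '\0';
    return (long) used;
}
```

A matching reader would have to undo the doubling and treat newlines inside quotes as data, which is the part of the CSV rules this idea would borrow.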

/Joel

Attachments:

v19-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch

From 13b67cee37c737fc556c3dcf533895a698916926 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 08:24:13 +0300
Subject: [PATCH 1/3] Introduce CopyFormat and replace csv_mode and binary
 fields with it.

---
 src/backend/commands/copy.c          | 50 +++++++++++++++-------------
 src/backend/commands/copyfrom.c      | 10 +++---
 src/backend/commands/copyfromparse.c | 34 +++++++++----------
 src/backend/commands/copyto.c        | 20 +++++------
 src/include/commands/copy.h          | 13 ++++++--
 src/tools/pgindent/typedefs.list     |  1 +
 6 files changed, 70 insertions(+), 58 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663f..b7e819de408 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+												opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV &&
+		(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,8 +822,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -846,7 +847,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -882,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -899,7 +900,8 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY &&
+		opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b8..f350a4ff976 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 	cstate->raw_reached_eof = false;
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		/*
 		 * If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
 			continue;
 
 		/* Fetch the input function and typioparam info */
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryInputInfo(att->atttypid,
 								   &in_func_oid, &typioparams[attnum - 1]);
 		else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
 
 	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
 
 	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d83..51eb14d7432 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -162,7 +162,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -748,7 +748,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	bool		done;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(cstate->opts.format != COPY_FORMAT_BINARY);
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -765,7 +765,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		{
 			int			fldnum;
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				fldct = CopyReadAttributesCSV(cstate);
 			else
 				fldct = CopyReadAttributesText(cstate);
@@ -820,7 +820,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		return false;
 
 	/* Parse the line into de-escaped field values */
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 		fldct = CopyReadAttributesCSV(cstate);
 	else
 		fldct = CopyReadAttributesText(cstate);
@@ -864,7 +864,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
 	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		char	  **field_strings;
 		ListCell   *cur;
@@ -905,7 +905,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 				continue;
 			}
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 			{
 				if (string == NULL &&
 					cstate->opts.force_notnull_flags[m])
@@ -1178,7 +1178,7 @@ CopyReadLineText(CopyFromState cstate)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		quotec = cstate->opts.quote[0];
 		escapec = cstate->opts.escape[0];
@@ -1255,7 +1255,7 @@ CopyReadLineText(CopyFromState cstate)
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
 
-		if (cstate->opts.csv_mode)
+		if (cstate->opts.format == COPY_FORMAT_CSV)
 		{
 			/*
 			 * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1294,7 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \r */
-		if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1322,10 @@ CopyReadLineText(CopyFromState cstate)
 					if (cstate->eol_type == EOL_CRNL)
 						ereport(ERROR,
 								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errmsg("literal carriage return found in data") :
 								 errmsg("unquoted carriage return found in data"),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errhint("Use \"\\r\" to represent carriage return.") :
 								 errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1339,10 @@ CopyReadLineText(CopyFromState cstate)
 			else if (cstate->eol_type == EOL_NL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal carriage return found in data") :
 						 errmsg("unquoted carriage return found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\r\" to represent carriage return.") :
 						 errhint("Use quoted CSV field to represent carriage return.")));
 			/* If reach here, we have found the line terminator */
@@ -1350,15 +1350,15 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \n */
-		if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal newline found in data") :
 						 errmsg("unquoted newline found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\n\" to represent newline.") :
 						 errhint("Use quoted CSV field to represent newline.")));
 			cstate->eol_type = EOL_NL;	/* in case not set yet */
@@ -1370,7 +1370,7 @@ CopyReadLineText(CopyFromState cstate)
 		 * Process backslash, except in CSV mode where backslash is a normal
 		 * character.
 		 */
-		if (c == '\\' && !cstate->opts.csv_mode)
+		if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
 		{
 			char		c2;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d96751..03c9d71d34a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
 	switch (cstate->copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 			{
 				/* Default line termination depends on platform */
 #ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
 			break;
 		case COPY_FRONTEND:
 			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 				CopySendChar(cstate, '\n');
 
 			/* Dump the accumulated row as one CopyData message */
@@ -775,7 +775,7 @@ DoCopyTo(CopyToState cstate)
 		bool		isvarlena;
 		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryOutputInfo(attr->atttypid,
 									&out_func_oid,
 									&isvarlena);
@@ -796,7 +796,7 @@ DoCopyTo(CopyToState cstate)
 											   "COPY TO",
 											   ALLOCSET_DEFAULT_SIZES);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate header for a binary copy */
 		int32		tmp;
@@ -837,7 +837,7 @@ DoCopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, colname, false);
 				else
 					CopyAttributeOutText(cstate, colname);
@@ -884,7 +884,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate trailer for a binary copy */
 		CopySendInt16(cstate, -1);
@@ -912,7 +912,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Binary per-tuple header */
 		CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -921,7 +921,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		bool		need_delim = false;
 
@@ -941,7 +941,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			{
 				string = OutputFunctionCall(&out_functions[attnum - 1],
 											value);
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, string,
 										cstate->opts.force_quote_flags[attnum - 1]);
 				else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..c3d1df267f0 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1847bbfa95c..d9ebfe6cb71 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromState
 CopyFromStateData
-- 
2.45.1
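
A short sketch of why a single enum is convenient compared with two booleans: the two flags could in principle both be set, while an enum value names exactly one format and can be dispatched on with a switch. The enum below copies the definition from the patch above; default_delimiter is a hypothetical helper that does not exist in the patch, and returning NULL for binary is an assumption made for the example.

```c
#include <assert.h>
#include <string.h>

/* Same definition as in the patch above. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: default field delimiter for a format,
 * or NULL if the format has no delimiter.
 */
const char *
default_delimiter(CopyFormat format)
{
    switch (format)
    {
        case COPY_FORMAT_TEXT:
            return "\t";
        case COPY_FORMAT_CSV:
            return ",";
        case COPY_FORMAT_BINARY:
            return NULL;
    }
    return NULL;                /* not reached for valid enum values */
}
```

Adding a further format then means adding one enum member and one case, and many compilers can warn about a switch that does not handle every member.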

v19-0002-Add-COPY-format-list.patch

From d3be52fe965bb5cdee20f3839d93e108c7a16003 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 7 Nov 2024 14:35:40 +0100
Subject: [PATCH 2/3] Add COPY format 'list'

---
 doc/src/sgml/ref/copy.sgml           |  57 ++++++++-
 src/backend/commands/copy.c          |  86 +++++++++-----
 src/backend/commands/copyfrom.c      |   7 ++
 src/backend/commands/copyfromparse.c | 172 +++++++++++++++++++++++++--
 src/backend/commands/copyto.c        |  62 +++++++++-
 src/bin/psql/tab-complete.in.c       |   2 +-
 src/include/commands/copy.h          |   1 +
 src/test/regress/expected/copy.out   |  33 +++++
 src/test/regress/expected/copy2.out  |  33 ++++-
 src/test/regress/sql/copy.sql        |  18 +++
 src/test/regress/sql/copy2.sql       |  19 ++-
 11 files changed, 442 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f096..f71119ba6f5 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Selects the data format to be read or written:
       <literal>text</literal>,
-      <literal>csv</literal> (Comma Separated Values),
-      or <literal>binary</literal>.
+      <literal>CSV</literal> (Comma Separated Values),
+      <literal>binary</literal>,
+      or <literal>list</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
@@ -257,7 +258,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is allowed only when using <literal>text</literal> or
+      <literal>CSV</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -271,7 +273,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is allowed only when using <literal>text</literal> or
+      <literal>CSV</literal> format.
      </para>
 
      <note>
@@ -294,7 +297,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      using <literal>text</literal> or <literal>CSV</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -400,7 +403,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
-      when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+      when the <literal>FORMAT</literal> is <literal>text</literal>,
+      <literal>CSV</literal> or <literal>list</literal>.
      </para>
      <para>
       A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +897,47 @@ COPY <replaceable class="parameter">count</replaceable>
 
   </refsect2>
 
+  <refsect2 id="sql-copy-list-format" xreflabel="List Format">
+   <title>List Format</title>
+
+   <para>
+    This format option is used for importing and exporting files containing
+    unstructured text, where each line is treated as a single field. It is
+    useful for data that does not conform to a structured, tabular format and
+    lacks delimiters.
+   </para>
+
+   <para>
+    In the <literal>list</literal> format, each line of the input or output is
+    considered a complete value without any field separation. There are no
+    field delimiters, and all characters are taken literally. There is no
+    special handling for quotes, backslashes, or escape sequences. All
+    characters, including whitespace and special characters, are preserved
+    exactly as they appear in the file. However, it's important to note that
+    the text is still interpreted according to the specified <literal>ENCODING</literal>
+    option or the current client encoding for input, and encoded using the
+    specified <literal>ENCODING</literal> or the current client encoding for output.
+   </para>
+
+   <para>
+    When using this format, the <command>COPY</command> command must specify
+    exactly one column. Specifying multiple columns will result in an error.
+    If the table has multiple columns and no column list is provided, an error
+    will occur.
+   </para>
+
+   <para>
+    The <literal>list</literal> format does not distinguish a <literal>NULL</literal>
+    value from an empty string. Empty lines are imported as empty strings, not
+    as <literal>NULL</literal> values.
+   </para>
+
+   <para>
+    Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+   </para>
+
+  </refsect2>
+
   <refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
    <title>Binary Format</title>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b7e819de408..3b98a8e7db1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
 				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "list") == 0)
+				opts_out->format = COPY_FORMAT_LIST;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -681,23 +683,69 @@ ProcessCopyOptions(ParseState *pstate,
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
+	if (opts_out->format == COPY_FORMAT_LIST && opts_out->delim)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+				 errmsg("cannot specify %s in LIST mode", "DELIMITER")));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
+	if (opts_out->format == COPY_FORMAT_LIST && opts_out->null_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify %s in LIST mode", "NULL")));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
+	if (opts_out->format == COPY_FORMAT_LIST && opts_out->default_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify %s in LIST mode", "DEFAULT")));
+
+	if (opts_out->delim)
+	{
+		/* Only single-byte delimiter strings are supported. */
+		if (strlen(opts_out->delim) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY delimiter must be a single one-byte character")));
+
+		/* Disallow end-of-line characters */
+		if (strchr(opts_out->delim, '\r') != NULL ||
+			strchr(opts_out->delim, '\n') != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY delimiter cannot be newline or carriage return")));
+	}
 	/* Set defaults for omitted options */
-	if (!opts_out->delim)
-		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+	else if (opts_out->format == COPY_FORMAT_CSV)
+		opts_out->delim = ",";
+	else if (opts_out->format == COPY_FORMAT_TEXT)
+		opts_out->delim = "\t";
 
-	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
-	opts_out->null_print_len = strlen(opts_out->null_print);
+	if (opts_out->null_print)
+	{
+		if (strchr(opts_out->null_print, '\r') != NULL ||
+			strchr(opts_out->null_print, '\n') != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY null representation cannot use newline or carriage return")));
+
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+		opts_out->null_print = "";
+	else if (opts_out->format == COPY_FORMAT_TEXT)
+		opts_out->null_print = "\\N";
+
+	if (opts_out->null_print)
+		opts_out->null_print_len = strlen(opts_out->null_print);
 
 	if (opts_out->format == COPY_FORMAT_CSV)
 	{
@@ -707,25 +755,6 @@ ProcessCopyOptions(ParseState *pstate,
 			opts_out->escape = opts_out->quote;
 	}
 
-	/* Only single-byte delimiter strings are supported. */
-	if (strlen(opts_out->delim) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY delimiter must be a single one-byte character")));
-
-	/* Disallow end-of-line characters */
-	if (strchr(opts_out->delim, '\r') != NULL ||
-		strchr(opts_out->delim, '\n') != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter cannot be newline or carriage return")));
-
-	if (strchr(opts_out->null_print, '\r') != NULL ||
-		strchr(opts_out->null_print, '\n') != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY null representation cannot use newline or carriage return")));
-
 	if (opts_out->default_print)
 	{
 		opts_out->default_print_len = strlen(opts_out->default_print);
@@ -738,7 +767,7 @@ ProcessCopyOptions(ParseState *pstate,
 	}
 
 	/*
-	 * Disallow unsafe delimiter characters in non-CSV mode.  We can't allow
+	 * Disallow unsafe delimiter characters in text mode.  We can't allow
 	 * backslash because it would be ambiguous.  We can't allow the other
 	 * cases because data characters matching the delimiter must be
 	 * backslashed, and certain backslash combinations are interpreted
@@ -747,7 +776,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (opts_out->format != COPY_FORMAT_CSV &&
+	if (opts_out->format == COPY_FORMAT_TEXT &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -839,7 +868,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Don't allow the delimiter to appear in the null string. */
-	if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+	if (opts_out->delim && opts_out->null_print &&
+		strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: %s is the name of a COPY option, e.g. NULL */
@@ -875,7 +905,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"COPY TO")));
 
 		/* Don't allow the delimiter to appear in the default string. */
-		if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+		if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			/*- translator: %s is the name of a COPY option, e.g. NULL */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff976..af2b3f3d11f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
 	/* Generate or convert list of attributes to process */
 	cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
 
+	/* Enforce single column requirement for 'list' format */
+	if (cstate->opts.format == COPY_FORMAT_LIST &&
+		list_length(cstate->attnumlist) != 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY with format 'list' must specify exactly one column")));
+
 	num_phys_attrs = tupDesc->natts;
 
 	/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 51eb14d7432..f82fd4c1ed4 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
  * formats.  The main entry point is NextCopyFrom(), which parses the
  * next input line and returns it as Datums.
  *
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/list mode, the parsing happens in multiple stages:
  *
  * [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
  *                1.          2.            3.           4.
@@ -28,7 +28,10 @@
  * 4. CopyReadAttributesText/CSV() function takes the input line from
  *    'line_buf', and splits it into fields, unescaping the data as required.
  *    The fields are stored in 'attribute_buf', and 'raw_fields' array holds
- *    pointers to each field.
+ *    pointers to each field. (text/csv modes only)
+ *
+ * In list mode, the fourth stage is skipped because the entire line is
+ * treated as a single field, making field splitting unnecessary.
  *
  * If encoding conversion is not required, a shortcut is taken in step 2 to
  * avoid copying the data unnecessarily.  The 'input_buf' pointer is set to
@@ -142,6 +145,7 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate);
 static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineList(CopyFromState cstate);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -731,7 +735,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
 }
 
 /*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or list mode.
  * Return false if no more lines.
  *
  * An internal temporary buffer is returned via 'fields'. It is valid until
@@ -747,7 +751,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	int			fldct;
 	bool		done;
 
-	/* only available for text or csv input */
+	/* only available for text, csv, or list input */
 	Assert(cstate->opts.format != COPY_FORMAT_BINARY);
 
 	/* on input check that the header line is correct if needed */
@@ -767,8 +771,16 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 
 			if (cstate->opts.format == COPY_FORMAT_CSV)
 				fldct = CopyReadAttributesCSV(cstate);
-			else
+			else if (cstate->opts.format == COPY_FORMAT_TEXT)
 				fldct = CopyReadAttributesText(cstate);
+			else
+			{
+				Assert(cstate->opts.format == COPY_FORMAT_LIST);
+				Assert(cstate->max_fields == 1);
+				/* Point raw_fields directly to line_buf data */
+				cstate->raw_fields[0] = cstate->line_buf.data;
+				fldct = 1;
+			}
 
 			if (fldct != list_length(cstate->attnumlist))
 				ereport(ERROR,
@@ -822,8 +834,16 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	/* Parse the line into de-escaped field values */
 	if (cstate->opts.format == COPY_FORMAT_CSV)
 		fldct = CopyReadAttributesCSV(cstate);
-	else
+	else if (cstate->opts.format == COPY_FORMAT_TEXT)
 		fldct = CopyReadAttributesText(cstate);
+	else
+	{
+		Assert(cstate->opts.format == COPY_FORMAT_LIST);
+		Assert(cstate->max_fields == 1);
+		/* Point raw_fields directly to line_buf data */
+		cstate->raw_fields[0] = cstate->line_buf.data;
+		fldct = 1;
+	}
 
 	*fields = cstate->raw_fields;
 	*nfields = fldct;
@@ -1095,7 +1115,10 @@ CopyReadLine(CopyFromState cstate)
 	cstate->line_buf_valid = false;
 
 	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate);
+	if (cstate->opts.format == COPY_FORMAT_LIST)
+		result = CopyReadLineList(cstate);
+	else
+		result = CopyReadLineText(cstate);
 
 	if (result)
 	{
@@ -1461,6 +1484,140 @@ CopyReadLineText(CopyFromState cstate)
 	return result;
 }
 
+/*
+ * CopyReadLineList - inner loop of CopyReadLine for list mode
+ */
+static bool
+CopyReadLineList(CopyFromState cstate)
+{
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		need_data = false;
+	bool		hit_eof = false;
+	bool		result = false;
+
+	/*
+	 * The objective of this loop is to transfer the entire next input line
+	 * into line_buf. We only care for detecting newlines (\r and/or \n). All
+	 * other characters are treated as regular data.
+	 *
+	 * For speed, we try to move data from input_buf to line_buf in chunks
+	 * rather than one character at a time.  input_buf_ptr points to the next
+	 * character to examine; any characters from input_buf_index to
+	 * input_buf_ptr have been determined to be part of the line, but not yet
+	 * transferred to line_buf.
+	 *
+	 * For a little extra speed within the loop, we copy input_buf and
+	 * input_buf_len into local variables.
+	 */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	for (;;)
+	{
+		int			prev_raw_ptr;
+		char		c;
+
+		/*
+		 * Load more data if needed.
+		 */
+		if (input_buf_ptr >= copy_buf_len || need_data)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+			need_data = false;
+		}
+
+		/* OK to fetch a character */
+		prev_raw_ptr = input_buf_ptr;
+		c = copy_input_buf[input_buf_ptr++];
+
+		/* Process \r */
+		if (c == '\r')
+		{
+			/* Check for \r\n on first line, _and_ handle \r\n. */
+			if (cstate->eol_type == EOL_UNKNOWN ||
+				cstate->eol_type == EOL_CRNL)
+			{
+				/*
+				 * If need more data, go back to loop top to load it.
+				 *
+				 * Note that if we are at EOF, c will wind up as '\0' because
+				 * of the guaranteed pad of input_buf.
+				 */
+				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+				/* get next char */
+				c = copy_input_buf[input_buf_ptr];
+
+				if (c == '\n')
+				{
+					input_buf_ptr++;	/* eat newline */
+					cstate->eol_type = EOL_CRNL;	/* in case not set yet */
+				}
+				else
+				{
+					if (cstate->eol_type == EOL_CRNL)
+						ereport(ERROR,
+								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+								 errmsg("end-of-copy marker does not match previous newline style")));
+
+					/*
+					 * if we got here, it is the first line and we didn't find
+					 * \n, so don't consume the peeked character
+					 */
+					cstate->eol_type = EOL_CR;
+				}
+			}
+			else if (cstate->eol_type == EOL_NL)
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("end-of-copy marker does not match previous newline style")));
+			/* If reach here, we have found the line terminator */
+			break;
+		}
+
+		/* Process \n */
+		if (c == '\n')
+		{
+			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("end-of-copy marker does not match previous newline style")));
+			cstate->eol_type = EOL_NL;	/* in case not set yet */
+			/* If reach here, we have found the line terminator */
+			break;
+		}
+
+		/* All other characters are treated as regular data */
+	}							/* end of outer loop */
+
+	/*
+	 * Transfer any still-uncopied data to line_buf.
+	 */
+	REFILL_LINEBUF;
+
+	return result;
+}
+
+
 /*
  *	Return decimal value for a hexadecimal digit
  */
@@ -1937,7 +2094,6 @@ endfield:
 	return fieldno;
 }
 
-
 /*
  * Read a binary attribute
  */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 03c9d71d34a..6779e7b1394 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 								bool use_quote);
+static void CopyAttributeOutList(CopyToState cstate, const char *string);
 
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
@@ -574,6 +575,13 @@ BeginCopyTo(ParseState *pstate,
 	/* Generate or convert list of attributes to process */
 	cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
 
+	/* Enforce single column requirement for 'list' format */
+	if (cstate->opts.format == COPY_FORMAT_LIST &&
+		list_length(cstate->attnumlist) != 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY with format 'list' must specify exactly one column")));
+
 	num_phys_attrs = tupDesc->natts;
 
 	/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -839,8 +847,10 @@ DoCopyTo(CopyToState cstate)
 
 				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, colname, false);
-				else
+				else if (cstate->opts.format == COPY_FORMAT_TEXT)
 					CopyAttributeOutText(cstate, colname);
+				else if (cstate->opts.format == COPY_FORMAT_LIST)
+					CopyAttributeOutList(cstate, colname);
 			}
 
 			CopySendEndOfRow(cstate);
@@ -921,7 +931,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (cstate->opts.format != COPY_FORMAT_BINARY)
+	if (cstate->opts.format == COPY_FORMAT_TEXT ||
+		cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		bool		need_delim = false;
 
@@ -949,7 +960,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
-	else
+	else if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		foreach_int(attnum, cstate->attnumlist)
 		{
@@ -969,6 +980,35 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
+	else if (cstate->opts.format == COPY_FORMAT_LIST)
+	{
+		int			attnum;
+		Datum		value;
+		bool		isnull;
+
+		/* Assert only one column is being copied */
+		Assert(list_length(cstate->attnumlist) == 1);
+
+		attnum = linitial_int(cstate->attnumlist);
+		value = slot->tts_values[attnum - 1];
+		isnull = slot->tts_isnull[attnum - 1];
+
+		if (!isnull)
+		{
+			char	   *string = OutputFunctionCall(&out_functions[attnum - 1],
+													value);
+
+			CopyAttributeOutList(cstate, string);
+		}
+		/* For 'list' format, we don't send anything for NULL values */
+	}
+	else
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("unsupported COPY format")));
+	}
+
 
 	CopySendEndOfRow(cstate);
 
@@ -1223,6 +1263,22 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	}
 }
 
+/*
+ * Send text representation of one attribute for 'list' format.
+ */
+static void
+CopyAttributeOutList(CopyToState cstate, const char *string)
+{
+	const char *ptr;
+
+	if (cstate->need_transcoding)
+		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	else
+		ptr = string;
+
+	CopySendString(cstate, ptr);
+}
+
 /*
  * copy_dest_startup --- executor startup
  */
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index fad2277991d..75f312a9ac5 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
-		COMPLETE_WITH("binary", "csv", "text");
+		COMPLETE_WITH("binary", "csv", "text", "list");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f0..44e9934d630 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
 	COPY_FORMAT_TEXT = 0,
 	COPY_FORMAT_BINARY,
 	COPY_FORMAT_CSV,
+	COPY_FORMAT_LIST,
 } CopyFormat;
 
 /*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84c..f92775dd573 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,36 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
 (2 rows)
 
 DROP TABLE parted_si;
+-- Test 'list' format
+\set filename :abs_srcdir '/data/emp.data'
+create temp table single_copytest (col text);
+copy single_copytest from :'filename' (format list);
+select col from single_copytest order by col collate "C";
+                  col                   
+----------------------------------------
+ bill    20      (11,10) 1000    sharon
+ sam     30      (10,5)  2000    bill
+ sharon  25      (15,12) 1000    sam
+(3 rows)
+
+copy single_copytest to stdout (format list);
+sharon	25	(15,12)	1000	sam
+sam	30	(10,5)	2000	bill
+bill	20	(11,10)	1000	sharon
+truncate single_copytest;
+copy single_copytest (col) from stdin (format list, header match);
+select col from single_copytest order by col collate "C";
+  col   
+--------
+ "def",
+ abc\.
+ ghi
+(3 rows)
+
+copy single_copytest (col) to stdout (format list, header);
+col
+abc\.
+"def",
+ghi
+truncate single_copytest;
+drop table single_copytest;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..fde63fe4eac 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -90,6 +90,20 @@ COPY x from stdin (format BINARY, delimiter ',');
 ERROR:  cannot specify DELIMITER in BINARY mode
 COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
+COPY x (c) from stdin (format LIST, null 'x');
+ERROR:  cannot specify NULL in LIST mode
+COPY x from stdin (format TEXT, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x from stdin (format BINARY, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x (c) from stdin (format LIST, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x from stdin (format TEXT, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
+COPY x from stdin (format BINARY, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
+COPY x (c) from stdin (format LIST, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
 COPY x from stdin (on_error unsupported);
@@ -100,6 +114,10 @@ COPY x from stdin (format TEXT, force_quote(a));
 ERROR:  COPY FORCE_QUOTE requires CSV mode
 COPY x from stdin (format TEXT, force_quote *);
 ERROR:  COPY FORCE_QUOTE requires CSV mode
+COPY x (c) from stdin (format LIST, force_quote(a));
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+COPY x (c) from stdin (format LIST, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
 COPY x from stdin (format CSV, force_quote(a));
 ERROR:  COPY FORCE_QUOTE cannot be used with COPY FROM
 COPY x from stdin (format CSV, force_quote *);
@@ -108,6 +126,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
 ERROR:  COPY FORCE_NOT_NULL requires CSV mode
 COPY x from stdin (format TEXT, force_not_null *);
 ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_not_null(a));
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
 COPY x to stdout (format CSV, force_not_null(a));
 ERROR:  COPY FORCE_NOT_NULL cannot be used with COPY TO
 COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +138,10 @@ COPY x from stdin (format TEXT, force_null(a));
 ERROR:  COPY FORCE_NULL requires CSV mode
 COPY x from stdin (format TEXT, force_null *);
 ERROR:  COPY FORCE_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_null(a));
+ERROR:  COPY FORCE_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
 COPY x to stdout (format CSV, force_null(a));
 ERROR:  COPY FORCE_NULL cannot be used with COPY TO
 COPY x to stdout (format CSV, force_null *);
@@ -858,9 +884,11 @@ select id, text_value, ts_value from copy_default;
 (2 rows)
 
 truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or list mode
 copy copy_default from stdin with (format binary, default '\D');
 ERROR:  cannot specify DEFAULT in BINARY mode
+copy copy_default (text_value) from stdin with (format list, default '\D');
+ERROR:  cannot specify DEFAULT in LIST mode
 -- DEFAULT cannot be new line nor carriage return
 copy copy_default from stdin with (default E'\n');
 ERROR:  COPY default representation cannot use newline or carriage return
@@ -929,3 +957,6 @@ truncate copy_default;
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
 ERROR:  COPY DEFAULT cannot be used with COPY TO
+-- Test list column requirement
+copy copy_default from stdin with (format list);
+ERROR:  COPY with format 'list' must specify exactly one column
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b04..4e40c974f29 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,21 @@ COPY parted_si(id, data) FROM :'filename';
 SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
 
 DROP TABLE parted_si;
+
+-- Test 'list' format
+\set filename :abs_srcdir '/data/emp.data'
+create temp table single_copytest (col text);
+copy single_copytest from :'filename' (format list);
+select col from single_copytest order by col collate "C";
+copy single_copytest to stdout (format list);
+truncate single_copytest;
+copy single_copytest (col) from stdin (format list, header match);
+col
+abc\.
+"def",
+ghi
+\.
+select col from single_copytest order by col collate "C";
+copy single_copytest (col) to stdout (format list, header);
+truncate single_copytest;
+drop table single_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..c7d2ba78565 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -72,18 +72,31 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
+COPY x (c) from stdin (format LIST, null 'x');
+COPY x from stdin (format TEXT, escape 'x');
+COPY x from stdin (format BINARY, escape 'x');
+COPY x (c) from stdin (format LIST, escape 'x');
+COPY x from stdin (format TEXT, quote 'x');
+COPY x from stdin (format BINARY, quote 'x');
+COPY x (c) from stdin (format LIST, quote 'x');
 COPY x from stdin (format BINARY, on_error ignore);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
+COPY x (c) from stdin (format LIST, force_quote(a));
+COPY x (c) from stdin (format LIST, force_quote *);
 COPY x from stdin (format CSV, force_quote(a));
 COPY x from stdin (format CSV, force_quote *);
 COPY x from stdin (format TEXT, force_not_null(a));
 COPY x from stdin (format TEXT, force_not_null *);
+COPY x (c) from stdin (format LIST, force_not_null(a));
+COPY x (c) from stdin (format LIST, force_not_null *);
 COPY x to stdout (format CSV, force_not_null(a));
 COPY x to stdout (format CSV, force_not_null *);
 COPY x from stdin (format TEXT, force_null(a));
 COPY x from stdin (format TEXT, force_null *);
+COPY x (c) from stdin (format LIST, force_null(a));
+COPY x (c) from stdin (format LIST, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +649,9 @@ select id, text_value, ts_value from copy_default;
 
 truncate copy_default;
 
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or list mode
 copy copy_default from stdin with (format binary, default '\D');
+copy copy_default (text_value) from stdin with (format list, default '\D');
 
 -- DEFAULT cannot be new line nor carriage return
 copy copy_default from stdin with (default E'\n');
@@ -707,3 +721,6 @@ truncate copy_default;
 
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
+
+-- Test list column requirement
+copy copy_default from stdin with (format list);
-- 
2.45.1
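
A note for readers on the EOL handling in CopyReadLineList() above. The following is not code from the patch, only a simplified standalone sketch of the rule it appears to apply: the newline style (\n, \r, or \r\n) is detected on the first line, and any later terminator of a different style is rejected. The names next_line and EolType are hypothetical, and the sketch assumes the whole input is already in one in-memory buffer, so none of the refill logic (REFILL_LINEBUF, CopyLoadInputBuf) is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's cstate->eol_type values. */
typedef enum { EOL_UNKNOWN, EOL_NL, EOL_CR, EOL_CRNL } EolType;

/*
 * Find the next line in buf[*pos .. len).
 *
 * Returns 1 and sets *start / *linelen (terminator excluded) when a line
 * was found, 0 when the input is exhausted, and -1 when a terminator does
 * not match the style detected earlier.  *pos is advanced past the
 * terminator on success.
 */
static int
next_line(const char *buf, size_t len, size_t *pos, EolType *eol,
          size_t *start, size_t *linelen)
{
    size_t      i;

    assert(buf != NULL && *pos <= len);

    if (*pos >= len)
        return 0;               /* no more data */

    *start = *pos;
    for (i = *pos; i < len; i++)
    {
        char        c = buf[i];

        if (c == '\r')
        {
            int         followed_by_nl = (i + 1 < len && buf[i + 1] == '\n');

            if (*eol == EOL_NL)
                return -1;      /* \r after \n style was established */
            if (*eol == EOL_UNKNOWN)
                *eol = followed_by_nl ? EOL_CRNL : EOL_CR;
            else if (*eol == EOL_CRNL && !followed_by_nl)
                return -1;      /* bare \r in a \r\n file */

            *linelen = i - *start;
            *pos = i + (*eol == EOL_CRNL ? 2 : 1);
            return 1;
        }
        if (c == '\n')
        {
            if (*eol == EOL_CR || *eol == EOL_CRNL)
                return -1;      /* bare \n in a \r or \r\n file */
            *eol = EOL_NL;
            *linelen = i - *start;
            *pos = i + 1;
            return 1;
        }
        /* every other byte is ordinary data */
    }

    /* final line without a terminator */
    *linelen = len - *start;
    *pos = len;
    return 1;
}
```

A caller would start with pos = 0 and eol = EOL_UNKNOWN and call next_line() until it returns 0. No byte other than \r and \n gets special treatment, which is why a line such as abc\. in the regression test comes through unchanged.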

Attachment: v19-0003-Reorganize-option-validations.patch (application/octet-stream)

From c1965a0a53382a7e3932a246b20c52d11f91fc65 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 7 Nov 2024 15:53:24 +0100
Subject: [PATCH 3/3] Reorganize option validations

---
 src/backend/commands/copy.c | 460 ++++++++++++++++++++----------------
 1 file changed, 259 insertions(+), 201 deletions(-)
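
Reader's note on the reorganization: the diff below groups all checks for one option into a single block, so that "validate if specified, otherwise apply the format's default" reads top to bottom. The following standalone sketch illustrates that shape for DELIMITER only. It is an assumption-laden simplification, not the patch's code: check_delimiter and the Format enum are hypothetical names, errors are returned as strings instead of being raised with ereport(), and the last message is abbreviated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the patch's CopyFormat enum. */
typedef enum { FMT_TEXT, FMT_BINARY, FMT_CSV, FMT_LIST } Format;

/*
 * Validate a user-supplied delimiter (*delim != NULL) or install the
 * format's default (*delim == NULL on entry).  Returns NULL on success,
 * otherwise an error message.  For binary and list formats *delim stays
 * NULL, since those formats have no delimiter.
 */
static const char *
check_delimiter(Format fmt, const char **delim)
{
    assert(delim != NULL);

    if (*delim != NULL)
    {
        /* format compatibility first */
        if (fmt == FMT_BINARY)
            return "cannot specify DELIMITER in BINARY mode";
        if (fmt == FMT_LIST)
            return "cannot specify DELIMITER in LIST mode";

        /* then the value itself */
        if (strlen(*delim) != 1)
            return "COPY delimiter must be a single one-byte character";
        if (strchr(*delim, '\r') != NULL || strchr(*delim, '\n') != NULL)
            return "COPY delimiter cannot be newline or carriage return";

        /* text mode additionally rejects characters that clash with escapes */
        if (fmt == FMT_TEXT &&
            strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
                   (*delim)[0]) != NULL)
            return "COPY delimiter cannot be this character in text mode";
    }
    /* defaults for omitted option */
    else if (fmt == FMT_CSV)
        *delim = ",";
    else if (fmt == FMT_TEXT)
        *delim = "\t";

    return NULL;
}
```

One consequence of leaving the delimiter unset for the list format is that later code has to test for a NULL delimiter before dereferencing it, which is what the added opts_out->delim checks in the 0002 patch do.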

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3b98a8e7db1..2de9bc0be8e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -673,44 +673,33 @@ ProcessCopyOptions(ParseState *pstate,
 					 parser_errposition(pstate, defel->location)));
 	}
 
-	/*
-	 * Check for incompatible options (must do these three before inserting
-	 * defaults)
-	 */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
-	if (opts_out->format == COPY_FORMAT_LIST && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in LIST mode", "DELIMITER")));
-
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "NULL")));
-
-	if (opts_out->format == COPY_FORMAT_LIST && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in LIST mode", "NULL")));
-
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
-	if (opts_out->format == COPY_FORMAT_LIST && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in LIST mode", "DEFAULT")));
-
+	/* --- FREEZE option --- */
+	if (opts_out->freeze)
+	{
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FREEZE",
+							"COPY TO")));
+	}
+
+	/* --- DELIMITER option --- */
 	if (opts_out->delim)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+		if (opts_out->format == COPY_FORMAT_LIST)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in LIST mode", "DELIMITER")));
+
 		/* Only single-byte delimiter strings are supported. */
 		if (strlen(opts_out->delim) != 1)
 			ereport(ERROR,
@@ -723,22 +712,53 @@ ProcessCopyOptions(ParseState *pstate,
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY delimiter cannot be newline or carriage return")));
+
+		if (opts_out->format == COPY_FORMAT_TEXT)
+		{
+			/*
+			 * Disallow unsafe delimiter characters in text mode.  We can't
+			 * allow backslash because it would be ambiguous.  We can't allow
+			 * the other cases because data characters matching the delimiter
+			 * must be backslashed, and certain backslash combinations are
+			 * interpreted non-literally by COPY IN.  Disallowing all lower
+			 * case ASCII letters is more than strictly necessary, but seems
+			 * best for consistency and future-proofing.  Likewise we disallow
+			 * all digits though only octal digits are actually dangerous.
+			 */
+			if (strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+					   opts_out->delim[0]) != NULL)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+		}
 	}
-	/* Set defaults for omitted options */
+	/* Set default delimiter */
 	else if (opts_out->format == COPY_FORMAT_CSV)
 		opts_out->delim = ",";
 	else if (opts_out->format == COPY_FORMAT_TEXT)
 		opts_out->delim = "\t";
 
+	/* --- NULL option --- */
 	if (opts_out->null_print)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+		if (opts_out->format == COPY_FORMAT_LIST)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in LIST mode", "NULL")));
+
+		/* Disallow end-of-line characters */
 		if (strchr(opts_out->null_print, '\r') != NULL ||
 			strchr(opts_out->null_print, '\n') != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY null representation cannot use newline or carriage return")));
-
 	}
+	/* Set default null_print */
 	else if (opts_out->format == COPY_FORMAT_CSV)
 		opts_out->null_print = "";
 	else if (opts_out->format == COPY_FORMAT_TEXT)
@@ -747,16 +767,23 @@ ProcessCopyOptions(ParseState *pstate,
 	if (opts_out->null_print)
 		opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->format == COPY_FORMAT_CSV)
-	{
-		if (!opts_out->quote)
-			opts_out->quote = "\"";
-		if (!opts_out->escape)
-			opts_out->escape = opts_out->quote;
-	}
-
+	/* --- DEFAULT option --- */
 	if (opts_out->default_print)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+		if (opts_out->format == COPY_FORMAT_LIST)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in LIST mode", "DEFAULT")));
+
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->null_print);
+
 		opts_out->default_print_len = strlen(opts_out->default_print);
 
 		if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -764,138 +791,7 @@ ProcessCopyOptions(ParseState *pstate,
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY default representation cannot use newline or carriage return")));
-	}
 
-	/*
-	 * Disallow unsafe delimiter characters in text mode.  We can't allow
-	 * backslash because it would be ambiguous.  We can't allow the other
-	 * cases because data characters matching the delimiter must be
-	 * backslashed, and certain backslash combinations are interpreted
-	 * non-literally by COPY IN.  Disallowing all lower case ASCII letters is
-	 * more than strictly necessary, but seems best for consistency and
-	 * future-proofing.  Likewise we disallow all digits though only octal
-	 * digits are actually dangerous.
-	 */
-	if (opts_out->format == COPY_FORMAT_TEXT &&
-		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
-			   opts_out->delim[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
-	/* Check header */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
-	/* Check quote */
-	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "QUOTE")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY quote must be a single one-byte character")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter and quote must be different")));
-
-	/* Check escape */
-	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY escape must be a single one-byte character")));
-
-	/* Check force_quote */
-	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
-												opts_out->force_quote_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
-	if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
-						"COPY FROM")));
-
-	/* Check force_notnull */
-	if (opts_out->format != COPY_FORMAT_CSV &&
-		(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
-	if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
-		!is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
-						"COPY TO")));
-
-	/* Check force_null */
-	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
-												opts_out->force_null_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
-	if ((opts_out->force_null != NIL || opts_out->force_null_all) &&
-		!is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
-						"COPY TO")));
-
-	/* Don't allow the delimiter to appear in the null string. */
-	if (opts_out->delim && opts_out->null_print &&
-		strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: %s is the name of a COPY option, e.g. NULL */
-				 errmsg("COPY delimiter character must not appear in the %s specification",
-						"NULL")));
-
-	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->format == COPY_FORMAT_CSV &&
-		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: %s is the name of a COPY option, e.g. NULL */
-				 errmsg("CSV quote character must not appear in the %s specification",
-						"NULL")));
-
-	/* Check freeze */
-	if (opts_out->freeze && !is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FREEZE",
-						"COPY TO")));
-
-	if (opts_out->default_print)
-	{
 		if (!is_from)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -905,22 +801,13 @@ ProcessCopyOptions(ParseState *pstate,
 							"COPY TO")));
 
 		/* Don't allow the delimiter to appear in the default string. */
-		if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+		if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			/*- translator: %s is the name of a COPY option, e.g. NULL */
 					 errmsg("COPY delimiter character must not appear in the %s specification",
 							"DEFAULT")));
 
-		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->format == COPY_FORMAT_CSV &&
-			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			/*- translator: %s is the name of a COPY option, e.g. NULL */
-					 errmsg("CSV quote character must not appear in the %s specification",
-							"DEFAULT")));
-
 		/* Don't allow the NULL and DEFAULT string to be the same */
 		if (opts_out->null_print_len == opts_out->default_print_len &&
 			strncmp(opts_out->null_print, opts_out->default_print,
@@ -929,20 +816,191 @@ ProcessCopyOptions(ParseState *pstate,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
-	/* Check on_error */
-	if (opts_out->format == COPY_FORMAT_BINARY &&
-		opts_out->on_error != COPY_ON_ERROR_STOP)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
-	if (opts_out->reject_limit && !opts_out->on_error)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first and second %s are the names of COPY option, e.g.
-		 * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
-				 errmsg("COPY %s requires %s to be set to %s",
-						"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+	else
+	{
+		/* No default for default_print; remains NULL */
+	}
+
+	/* --- HEADER option --- */
+	if (opts_out->header_line != COPY_HEADER_FALSE)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in BINARY mode", "HEADER")));
+	}
+	else
+	{
+		/* Default is no header; no action needed */
+	}
+
+	/* --- QUOTE option --- */
+	if (opts_out->quote)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+		if (strlen(opts_out->quote) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY quote must be a single one-byte character")));
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Set default quote */
+		opts_out->quote = "\"";
+	}
+
+	/* --- ESCAPE option --- */
+	if (opts_out->escape)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+		if (strlen(opts_out->escape) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY escape must be a single one-byte character")));
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Set default escape to quote character */
+		opts_out->escape = opts_out->quote;
+	}
+
+	/* --- FORCE_QUOTE option --- */
+	if (opts_out->force_quote != NIL || opts_out->force_quote_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+		if (is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+							"COPY FROM")));
+	}
+
+	/* --- FORCE_NOT_NULL option --- */
+	if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+							"COPY TO")));
+	}
+
+	/* --- FORCE_NULL option --- */
+	if (opts_out->force_null != NIL || opts_out->force_null_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+							"COPY TO")));
+	}
+
+	/* --- ON_ERROR option --- */
+	if (opts_out->on_error != COPY_ON_ERROR_STOP)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+	}
+
+	/* --- REJECT_LIMIT option --- */
+	if (opts_out->reject_limit)
+	{
+		if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first and second %s are the names of COPY option, e.g.
+				* ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+					 errmsg("COPY %s requires %s to be set to %s",
+							"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+	}
+
+	/*
+	 * Additional checks for interdependent options
+	 */
+
+	/* Checks specific to the CSV and TEXT formats */
+	if (opts_out->format == COPY_FORMAT_TEXT ||
+		opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->null_print);
+
+		/* Don't allow the delimiter to appear in the null string. */
+		if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("COPY delimiter character must not appear in the %s specification",
+							"NULL")));
+	}
+
+	/* Checks specific to the CSV format */
+	if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->quote);
+		Assert(opts_out->null_print);
+
+		/* Don't allow the CSV quote char to appear in the default string. */
+		if (opts_out->default_print_len > 0 &&
+			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("CSV quote character must not appear in the %s specification",
+							"DEFAULT")));
+
+		if (opts_out->delim[0] == opts_out->quote[0])
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY delimiter and quote must be different")));
+
+		/* Don't allow the CSV quote char to appear in the null string. */
+		if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("CSV quote character must not appear in the %s specification",
+							"NULL")));
+	}
 }
 
 /*
-- 
2.45.1

#15

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Joel Jacobson (#14)

Re: New "single" COPY format

On Sat, Nov 9, 2024, at 08:07, Joel Jacobson wrote:

Here's a draft of an idea I'm considering (not yet implemented):

I realize the last part about optional quoting is unnecessary,
since if quoting is desired, users could just use the 'csv' format.

Revised draft of the idea (not yet implemented):

- Handling newlines for other types:
For any other types, we would need to scan the string for newline characters.
If a newline is encountered, it results in an error.
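
A minimal sketch of the scan described above, written as a standalone helper rather than actual PostgreSQL code. The name first_newline_offset is hypothetical, and treating both '\n' and '\r' as newline characters is an assumption; the thread does not say which bytes would be rejected.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of any patch in this thread): return the
 * byte offset of the first '\n' or '\r' within the first len bytes of s,
 * or -1 if there is none.  A caller implementing the proposed 'list'
 * format would raise an error on any result other than -1.
 */
long
first_newline_offset(const char *s, size_t len)
{
	size_t		i;

	assert(s != NULL);

	for (i = 0; i < len; i++)
	{
		if (s[i] == '\n' || s[i] == '\r')
			return (long) i;
	}
	return -1;
}
```

The cost is one pass over each output value, which is what the fast path for newline-free types is meant to avoid.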

This brings up the question of what to offer users wanting to export text
values containing newlines.

I think addressing that need is out of scope for the 'list' format,
and that it is better handled by a separate 'value' format:
- Such a format would be specialized for exporting a value "as is" to a file,
or importing an entire file as a single value.
- Such a value could be a physical single-column single-row,
but could also be constructed using e.g. string_agg().
- The 'value' format could also easily support importing/exporting
binary data (bytea), to e.g. allow importing/exporting images, etc.

Dimensionality perspective on formats:

2D formats: 'text', 'csv', 'binary' (tabular formats)
1D format: 'list' (single-column)
0D format: 'value' (single-column, single-row)

/Joel

#16

David G. Johnston

david.g.johnston@gmail.com

about 1 year ago

In reply to: Joel Jacobson (#15)

Re: New "single" COPY format

On Saturday, November 9, 2024, Joel Jacobson <joel@compiler.org> wrote:

On Sat, Nov 9, 2024, at 08:07, Joel Jacobson wrote:

Here's a draft of an idea I'm considering (not yet implemented):

I realize the last part about optional quoting is unnecessary,
since if quoting is desired, users could just use the 'csv' format.

Revised draft of the idea (not yet implemented):

- Fast path for newline-free types:
For the list of built-in types where we know the ::text representation cannot
contain newlines, we take the fast path in NextCopyFromRawFields(),
pointing cstate->raw_fields[0] directly to cstate->line_buf.data.

- Handling newlines for other types:
For any other types, we would need to scan the string for newline characters.
If a newline is encountered, it results in an error.

Makes sense to me.

David J.

#17

David G. Johnston

david.g.johnston@gmail.com

about 1 year ago

In reply to: Joel Jacobson (#14)

Re: New "single" COPY format

On Saturday, November 9, 2024, Joel Jacobson <joel@compiler.org> wrote:

On Fri, Nov 8, 2024, at 22:47, David G. Johnston wrote:

On Fri, Nov 8, 2024 at 2:20 PM Joel Jacobson <joel@compiler.org> wrote:

1. Text files containing \. in the middle of the file
% cat /tmp/test.txt
foo
\.
bar

Or another option to turn off the special meaning of \.?

This does seem like an orthogonal option worth considering.

I agree; if we want to integrate this into 'text', it's an option worth
considering.

PostgreSQL cannot store the NUL byte. Would that be an option for the
record separator? Default to newline, but accept NUL if one needs to
input/output lists containing newlines, or whatever character the user
believes is not part of their data, tab probably being a popular option.
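
The record-separator idea can be sketched as below. This is an illustration under assumptions, not code from any patch in this thread; next_record and its parameters are hypothetical names. Because the separator may be NUL, the buffer is handled by explicit length with memchr() rather than as a C string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: locate the record starting at buf[*pos] in a buffer
 * of len bytes, where records are terminated by the single byte sep (which
 * may be '\0').  On success, store the record's start and length, advance
 * *pos past the separator, and return 1.  Return 0 once the buffer is
 * exhausted.  A final record without a trailing separator is still
 * returned.
 */
int
next_record(const char *buf, size_t len, char sep, size_t *pos,
			const char **rec, size_t *rec_len)
{
	const char *start;
	const char *end;

	assert(buf != NULL && pos != NULL && rec != NULL && rec_len != NULL);

	if (*pos >= len)
		return 0;

	start = buf + *pos;
	end = memchr(start, sep, len - *pos);

	if (end == NULL)
	{
		/* last record, no trailing separator */
		*rec_len = len - *pos;
		*pos = len;
	}
	else
	{
		*rec_len = (size_t) (end - start);
		*pos += *rec_len + 1;	/* skip the separator */
	}
	*rec = start;
	return 1;
}
```

With sep set to '\0', a record such as "foo\n1" passes through intact, newline included; with sep set to '\n' the same bytes split into two records.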

David J.

#18

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: David G. Johnston (#16)

3 attachment(s)

Re: New "single" COPY format

On Sat, Nov 9, 2024, at 15:13, David G. Johnston wrote:

On Saturday, November 9, 2024, Joel Jacobson <joel@compiler.org> wrote:

On Sat, Nov 9, 2024, at 08:07, Joel Jacobson wrote:

Here's a draft of an idea I'm considering (not yet implemented):

I realize the last part about optional quoting is unnecessary,
since if quoting is desired, users could just use the 'csv' format.

Revised draft of the idea (not yet implemented):

- Fast path for newline-free types:
For the list of built-in types where we know the ::text representation cannot
contain newlines, we take the fast path in NextCopyFromRawFields(),
pointing cstate->raw_fields[0] directly to cstate->line_buf.data.

Oops, the above should of course have said:
"we take the fast path in CopyAttributeOutList()".

- Handling newlines for other types:
For any other types, we would need to scan the string for newline characters.
If a newline is encountered, it results in an error.

Makes sense to me.

Cool. I've drafted a new patch on this approach.
The list of newline-free built-in types is not yet exhaustive.
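
One way to picture the fast-path decision is a per-type lookup made once per column, sketched below. This is an illustration under assumptions: the SketchType enum and type_is_newline_free are hypothetical stand-ins and do not reflect how the v20 patch identifies types or which types it lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a handful of built-in types. */
typedef enum SketchType
{
	SKETCH_BOOL,
	SKETCH_INT4,
	SKETCH_INT8,
	SKETCH_UUID,
	SKETCH_TEXT,
	SKETCH_JSON
} SketchType;

/*
 * Return true if the text representation of the type can never contain a
 * newline, so output can skip the per-value scan.  Types not listed fall
 * through to false, i.e. the safe slow path that scans each value.
 */
bool
type_is_newline_free(SketchType t)
{
	/* guard against values outside the sketch enum */
	assert(t <= SKETCH_JSON);

	switch (t)
	{
		case SKETCH_BOOL:
		case SKETCH_INT4:
		case SKETCH_INT8:
		case SKETCH_UUID:
			return true;
		default:
			return false;
	}
}
```

Defaulting to false means an incomplete list only costs speed: an unlisted type is still scanned, so a newline in its output would be caught.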

/Joel

Attachments:

v20-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch (application/octet-stream)

From 13b67cee37c737fc556c3dcf533895a698916926 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 08:24:13 +0300
Subject: [PATCH 1/3] Introduce CopyFormat and replace csv_mode and binary
 fields with it.

---
 src/backend/commands/copy.c          | 50 +++++++++++++++-------------
 src/backend/commands/copyfrom.c      | 10 +++---
 src/backend/commands/copyfromparse.c | 34 +++++++++----------
 src/backend/commands/copyto.c        | 20 +++++------
 src/include/commands/copy.h          | 13 ++++++--
 src/tools/pgindent/typedefs.list     |  1 +
 6 files changed, 70 insertions(+), 58 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663f..b7e819de408 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+												opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV &&
+		(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,8 +822,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -846,7 +847,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -882,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -899,7 +900,8 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY &&
+		opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b8..f350a4ff976 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 	cstate->raw_reached_eof = false;
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		/*
 		 * If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
 			continue;
 
 		/* Fetch the input function and typioparam info */
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryInputInfo(att->atttypid,
 								   &in_func_oid, &typioparams[attnum - 1]);
 		else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
 
 	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
 
 	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d83..51eb14d7432 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -162,7 +162,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -748,7 +748,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	bool		done;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(cstate->opts.format != COPY_FORMAT_BINARY);
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -765,7 +765,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		{
 			int			fldnum;
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				fldct = CopyReadAttributesCSV(cstate);
 			else
 				fldct = CopyReadAttributesText(cstate);
@@ -820,7 +820,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		return false;
 
 	/* Parse the line into de-escaped field values */
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 		fldct = CopyReadAttributesCSV(cstate);
 	else
 		fldct = CopyReadAttributesText(cstate);
@@ -864,7 +864,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
 	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		char	  **field_strings;
 		ListCell   *cur;
@@ -905,7 +905,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 				continue;
 			}
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 			{
 				if (string == NULL &&
 					cstate->opts.force_notnull_flags[m])
@@ -1178,7 +1178,7 @@ CopyReadLineText(CopyFromState cstate)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		quotec = cstate->opts.quote[0];
 		escapec = cstate->opts.escape[0];
@@ -1255,7 +1255,7 @@ CopyReadLineText(CopyFromState cstate)
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
 
-		if (cstate->opts.csv_mode)
+		if (cstate->opts.format == COPY_FORMAT_CSV)
 		{
 			/*
 			 * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1294,7 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \r */
-		if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1322,10 @@ CopyReadLineText(CopyFromState cstate)
 					if (cstate->eol_type == EOL_CRNL)
 						ereport(ERROR,
 								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errmsg("literal carriage return found in data") :
 								 errmsg("unquoted carriage return found in data"),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errhint("Use \"\\r\" to represent carriage return.") :
 								 errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1339,10 @@ CopyReadLineText(CopyFromState cstate)
 			else if (cstate->eol_type == EOL_NL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal carriage return found in data") :
 						 errmsg("unquoted carriage return found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\r\" to represent carriage return.") :
 						 errhint("Use quoted CSV field to represent carriage return.")));
 			/* If reach here, we have found the line terminator */
@@ -1350,15 +1350,15 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \n */
-		if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal newline found in data") :
 						 errmsg("unquoted newline found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\n\" to represent newline.") :
 						 errhint("Use quoted CSV field to represent newline.")));
 			cstate->eol_type = EOL_NL;	/* in case not set yet */
@@ -1370,7 +1370,7 @@ CopyReadLineText(CopyFromState cstate)
 		 * Process backslash, except in CSV mode where backslash is a normal
 		 * character.
 		 */
-		if (c == '\\' && !cstate->opts.csv_mode)
+		if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
 		{
 			char		c2;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d96751..03c9d71d34a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
 	switch (cstate->copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 			{
 				/* Default line termination depends on platform */
 #ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
 			break;
 		case COPY_FRONTEND:
 			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 				CopySendChar(cstate, '\n');
 
 			/* Dump the accumulated row as one CopyData message */
@@ -775,7 +775,7 @@ DoCopyTo(CopyToState cstate)
 		bool		isvarlena;
 		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryOutputInfo(attr->atttypid,
 									&out_func_oid,
 									&isvarlena);
@@ -796,7 +796,7 @@ DoCopyTo(CopyToState cstate)
 											   "COPY TO",
 											   ALLOCSET_DEFAULT_SIZES);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate header for a binary copy */
 		int32		tmp;
@@ -837,7 +837,7 @@ DoCopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, colname, false);
 				else
 					CopyAttributeOutText(cstate, colname);
@@ -884,7 +884,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate trailer for a binary copy */
 		CopySendInt16(cstate, -1);
@@ -912,7 +912,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Binary per-tuple header */
 		CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -921,7 +921,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		bool		need_delim = false;
 
@@ -941,7 +941,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			{
 				string = OutputFunctionCall(&out_functions[attnum - 1],
 											value);
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, string,
 										cstate->opts.force_quote_flags[attnum - 1]);
 				else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..c3d1df267f0 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1847bbfa95c..d9ebfe6cb71 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromState
 CopyFromStateData
-- 
2.45.1
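
The refactoring in the patch above replaces the two boolean fields "binary" and "csv_mode" with a single CopyFormat enum. Below is a minimal standalone sketch of that idea, not code from the patch. The enum mirrors the one added to copy.h, but the three helper functions are hypothetical names made up for illustration, and the assumption that the old flags were never both set is my reading of ProcessCopyOptions().

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the CopyFormat enum that the patch adds to copy.h. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: map the old pair of flags onto the enum.
 * Assumption: the old flags were never both true, since only one FORMAT
 * can be given. The enum makes that constraint structural.
 */
static CopyFormat
copy_format_from_flags(bool binary, bool csv_mode)
{
	assert(!(binary && csv_mode));

	if (binary)
		return COPY_FORMAT_BINARY;
	if (csv_mode)
		return COPY_FORMAT_CSV;
	return COPY_FORMAT_TEXT;
}

/*
 * Hypothetical helper: the overall format code computed in SendCopyBegin(),
 * formerly "opts.binary ? 1 : 0".
 */
static int
wire_format_code(CopyFormat format)
{
	return format == COPY_FORMAT_BINARY ? 1 : 0;
}

/* Hypothetical helper: the equivalent of the old "!opts.binary" test. */
static bool
copy_format_is_textual(CopyFormat format)
{
	return format != COPY_FORMAT_BINARY;
}
```

With the enum in place, adding a fourth format becomes one new enum member plus explicit branches, instead of a third boolean whose combinations would have to be checked by hand.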

v20-0002-Add-COPY-format-list.patch

From fb4d58ab2ce34ae37f4deb0349d940d4285a6c28 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 7 Nov 2024 14:35:40 +0100
Subject: [PATCH 2/3] Add COPY format 'list'

---
 doc/src/sgml/ref/copy.sgml           |  63 +++++++++-
 src/backend/commands/copy.c          |  86 +++++++++-----
 src/backend/commands/copyfrom.c      |   7 ++
 src/backend/commands/copyfromparse.c | 172 +++++++++++++++++++++++++--
 src/backend/commands/copyto.c        | 119 +++++++++++++++++-
 src/bin/psql/tab-complete.in.c       |   2 +-
 src/include/commands/copy.h          |   1 +
 src/test/regress/expected/copy.out   |  37 ++++++
 src/test/regress/expected/copy2.out  |  39 +++++-
 src/test/regress/sql/copy.sql        |  21 ++++
 src/test/regress/sql/copy2.sql       |  24 +++-
 11 files changed, 522 insertions(+), 49 deletions(-)
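
For readers skimming the diff below, here is a rough standalone sketch of the COPY TO rule that the 'list' format follows: a non-NULL value is written as one line and rejected if it contains a newline or carriage return, and a NULL value produces an empty line. Both function names are hypothetical and do not appear in the patch. The sketch always terminates the row with '\n' (the patch uses the platform default for files) and ignores the multibyte-encoding handling that CopyAttributeOutList() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the scan in CopyAttributeOutList(): true if the
 * value contains neither '\n' nor '\r'.  Assumes a single-byte or
 * ASCII-safe encoding such as UTF-8.
 */
static bool
list_value_is_single_line(const char *value)
{
	return strpbrk(value, "\r\n") == NULL;
}

/*
 * Hypothetical helper: render one row into 'out' as a NUL-terminated string.
 * A NULL value becomes an empty line, so NULL and '' are indistinguishable
 * in the output.  Returns false if the value contains a line terminator or
 * does not fit into the buffer.
 */
static bool
list_format_row(const char *value, char *out, size_t outlen)
{
	size_t		len;

	assert(out != NULL && outlen > 0);

	if (value == NULL)
		value = "";				/* nothing is sent for NULL */

	if (!list_value_is_single_line(value))
		return false;			/* the patch raises an ERROR here */

	len = strlen(value);
	if (len + 2 > outlen)
		return false;			/* no room for value + '\n' + '\0' */

	memcpy(out, value, len);
	out[len] = '\n';			/* sketch only: always uses \n */
	out[len + 1] = '\0';
	return true;
}
```

Rejecting embedded line terminators rather than escaping them is what keeps the format round-trippable: every output line maps back to exactly one input value on COPY FROM.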

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f096..9327ec133bb 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Selects the data format to be read or written:
       <literal>text</literal>,
-      <literal>csv</literal> (Comma Separated Values),
-      or <literal>binary</literal>.
+      <literal>CSV</literal> (Comma Separated Values),
+      <literal>binary</literal>,
+      or <literal>list</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
@@ -257,7 +258,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is allowed only when using <literal>text</literal> or
+      <literal>CSV</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -271,7 +273,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is allowed only when using <literal>text</literal> or
+      <literal>CSV</literal> format.
      </para>
 
      <note>
@@ -294,7 +297,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      using <literal>text</literal> or <literal>CSV</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -400,7 +403,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
-      when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+      when the <literal>FORMAT</literal> is <literal>text</literal>,
+      <literal>CSV</literal> or <literal>list</literal>.
      </para>
      <para>
       A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +897,53 @@ COPY <replaceable class="parameter">count</replaceable>
 
   </refsect2>
 
+  <refsect2 id="sql-copy-list-format" xreflabel="List Format">
+   <title>List Format</title>
+
+   <para>
+    This format option is used for importing and exporting files containing
+    unstructured text, where each line is treated as a single field. It is
+    useful for data that does not conform to a structured, tabular format and
+    lacks delimiters.
+   </para>
+
+   <para>
+    In the <literal>list</literal> format, each line of the input or output is
+    considered a complete value without any field separation. There are no
+    field delimiters, and all characters are taken literally. There is no
+    special handling for quotes, backslashes, or escape sequences. All
+    characters, including whitespace and special characters, are preserved
+    exactly as they appear in the file. However, it's important to note that
+    the text is still interpreted according to the specified <literal>ENCODING</literal>
+    option or the current client encoding for input, and encoded using the
+    specified <literal>ENCODING</literal> or the current client encoding for output.
+   </para>
+
+   <para>
+    In <command>COPY TO</command>, the data must not contain any newlines or
+    carriage returns, as these characters are used to separate records. If such
+    characters are encountered in the data, an error is raised.
+   </para>
+
+   <para>
+    When using this format, the <command>COPY</command> command must specify
+    exactly one column. Specifying multiple columns will result in an error.
+    If the table has multiple columns and no column list is provided, an error
+    will occur.
+   </para>
+
+   <para>
+    The <literal>list</literal> format does not distinguish a <literal>NULL</literal>
+    value from an empty string. Empty lines are imported as empty strings, not
+    as <literal>NULL</literal> values.
+   </para>
+
+   <para>
+    Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+   </para>
+
+  </refsect2>
+
   <refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
    <title>Binary Format</title>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b7e819de408..3b98a8e7db1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
 				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "list") == 0)
+				opts_out->format = COPY_FORMAT_LIST;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -681,23 +683,69 @@ ProcessCopyOptions(ParseState *pstate,
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
+	if (opts_out->format == COPY_FORMAT_LIST && opts_out->delim)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+				 errmsg("cannot specify %s in LIST mode", "DELIMITER")));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
+	if (opts_out->format == COPY_FORMAT_LIST && opts_out->null_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify %s in LIST mode", "NULL")));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
+	if (opts_out->format == COPY_FORMAT_LIST && opts_out->default_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify %s in LIST mode", "DEFAULT")));
+
+	if (opts_out->delim)
+	{
+		/* Only single-byte delimiter strings are supported. */
+		if (strlen(opts_out->delim) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY delimiter must be a single one-byte character")));
+
+		/* Disallow end-of-line characters */
+		if (strchr(opts_out->delim, '\r') != NULL ||
+			strchr(opts_out->delim, '\n') != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY delimiter cannot be newline or carriage return")));
+	}
 	/* Set defaults for omitted options */
-	if (!opts_out->delim)
-		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+	else if (opts_out->format == COPY_FORMAT_CSV)
+		opts_out->delim = ",";
+	else if (opts_out->format == COPY_FORMAT_TEXT)
+		opts_out->delim = "\t";
 
-	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
-	opts_out->null_print_len = strlen(opts_out->null_print);
+	if (opts_out->null_print)
+	{
+		if (strchr(opts_out->null_print, '\r') != NULL ||
+			strchr(opts_out->null_print, '\n') != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY null representation cannot use newline or carriage return")));
+
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+		opts_out->null_print = "";
+	else if (opts_out->format == COPY_FORMAT_TEXT)
+		opts_out->null_print = "\\N";
+
+	if (opts_out->null_print)
+		opts_out->null_print_len = strlen(opts_out->null_print);
 
 	if (opts_out->format == COPY_FORMAT_CSV)
 	{
@@ -707,25 +755,6 @@ ProcessCopyOptions(ParseState *pstate,
 			opts_out->escape = opts_out->quote;
 	}
 
-	/* Only single-byte delimiter strings are supported. */
-	if (strlen(opts_out->delim) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY delimiter must be a single one-byte character")));
-
-	/* Disallow end-of-line characters */
-	if (strchr(opts_out->delim, '\r') != NULL ||
-		strchr(opts_out->delim, '\n') != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter cannot be newline or carriage return")));
-
-	if (strchr(opts_out->null_print, '\r') != NULL ||
-		strchr(opts_out->null_print, '\n') != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY null representation cannot use newline or carriage return")));
-
 	if (opts_out->default_print)
 	{
 		opts_out->default_print_len = strlen(opts_out->default_print);
@@ -738,7 +767,7 @@ ProcessCopyOptions(ParseState *pstate,
 	}
 
 	/*
-	 * Disallow unsafe delimiter characters in non-CSV mode.  We can't allow
+	 * Disallow unsafe delimiter characters in text mode.  We can't allow
 	 * backslash because it would be ambiguous.  We can't allow the other
 	 * cases because data characters matching the delimiter must be
 	 * backslashed, and certain backslash combinations are interpreted
@@ -747,7 +776,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (opts_out->format != COPY_FORMAT_CSV &&
+	if (opts_out->format == COPY_FORMAT_TEXT &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -839,7 +868,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Don't allow the delimiter to appear in the null string. */
-	if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+	if (opts_out->delim && opts_out->null_print &&
+		strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: %s is the name of a COPY option, e.g. NULL */
@@ -875,7 +905,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"COPY TO")));
 
 		/* Don't allow the delimiter to appear in the default string. */
-		if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+		if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			/*- translator: %s is the name of a COPY option, e.g. NULL */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff976..af2b3f3d11f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
 	/* Generate or convert list of attributes to process */
 	cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
 
+	/* Enforce single column requirement for 'list' format */
+	if (cstate->opts.format == COPY_FORMAT_LIST &&
+		list_length(cstate->attnumlist) != 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY with format 'list' must specify exactly one column")));
+
 	num_phys_attrs = tupDesc->natts;
 
 	/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 51eb14d7432..f82fd4c1ed4 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
  * formats.  The main entry point is NextCopyFrom(), which parses the
  * next input line and returns it as Datums.
  *
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/list mode, the parsing happens in multiple stages:
  *
  * [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
  *                1.          2.            3.           4.
@@ -28,7 +28,10 @@
  * 4. CopyReadAttributesText/CSV() function takes the input line from
  *    'line_buf', and splits it into fields, unescaping the data as required.
  *    The fields are stored in 'attribute_buf', and 'raw_fields' array holds
- *    pointers to each field.
+ *    pointers to each field. (text/csv modes only)
+ *
+ * In list mode, the fourth stage is skipped because the entire line is
+ * treated as a single field, making field splitting unnecessary.
  *
  * If encoding conversion is not required, a shortcut is taken in step 2 to
  * avoid copying the data unnecessarily.  The 'input_buf' pointer is set to
@@ -142,6 +145,7 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate);
 static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineList(CopyFromState cstate);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -731,7 +735,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
 }
 
 /*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or list mode.
  * Return false if no more lines.
  *
  * An internal temporary buffer is returned via 'fields'. It is valid until
@@ -747,7 +751,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	int			fldct;
 	bool		done;
 
-	/* only available for text or csv input */
+	/* only available for text, csv, or list input */
 	Assert(cstate->opts.format != COPY_FORMAT_BINARY);
 
 	/* on input check that the header line is correct if needed */
@@ -767,8 +771,16 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 
 			if (cstate->opts.format == COPY_FORMAT_CSV)
 				fldct = CopyReadAttributesCSV(cstate);
-			else
+			else if (cstate->opts.format == COPY_FORMAT_TEXT)
 				fldct = CopyReadAttributesText(cstate);
+			else
+			{
+				Assert(cstate->opts.format == COPY_FORMAT_LIST);
+				Assert(cstate->max_fields == 1);
+				/* Point raw_fields directly to line_buf data */
+				cstate->raw_fields[0] = cstate->line_buf.data;
+				fldct = 1;
+			}
 
 			if (fldct != list_length(cstate->attnumlist))
 				ereport(ERROR,
@@ -822,8 +834,16 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	/* Parse the line into de-escaped field values */
 	if (cstate->opts.format == COPY_FORMAT_CSV)
 		fldct = CopyReadAttributesCSV(cstate);
-	else
+	else if (cstate->opts.format == COPY_FORMAT_TEXT)
 		fldct = CopyReadAttributesText(cstate);
+	else
+	{
+		Assert(cstate->opts.format == COPY_FORMAT_LIST);
+		Assert(cstate->max_fields == 1);
+		/* Point raw_fields directly to line_buf data */
+		cstate->raw_fields[0] = cstate->line_buf.data;
+		fldct = 1;
+	}
 
 	*fields = cstate->raw_fields;
 	*nfields = fldct;
@@ -1095,7 +1115,10 @@ CopyReadLine(CopyFromState cstate)
 	cstate->line_buf_valid = false;
 
 	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate);
+	if (cstate->opts.format == COPY_FORMAT_LIST)
+		result = CopyReadLineList(cstate);
+	else
+		result = CopyReadLineText(cstate);
 
 	if (result)
 	{
@@ -1461,6 +1484,140 @@ CopyReadLineText(CopyFromState cstate)
 	return result;
 }
 
+/*
+ * CopyReadLineList - inner loop of CopyReadLine for list mode
+ */
+static bool
+CopyReadLineList(CopyFromState cstate)
+{
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		need_data = false;
+	bool		hit_eof = false;
+	bool		result = false;
+
+	/*
+	 * The objective of this loop is to transfer the entire next input line
+	 * into line_buf. We only care for detecting newlines (\r and/or \n). All
+	 * other characters are treated as regular data.
+	 *
+	 * For speed, we try to move data from input_buf to line_buf in chunks
+	 * rather than one character at a time.  input_buf_ptr points to the next
+	 * character to examine; any characters from input_buf_index to
+	 * input_buf_ptr have been determined to be part of the line, but not yet
+	 * transferred to line_buf.
+	 *
+	 * For a little extra speed within the loop, we copy input_buf and
+	 * input_buf_len into local variables.
+	 */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	for (;;)
+	{
+		int			prev_raw_ptr;
+		char		c;
+
+		/*
+		 * Load more data if needed.
+		 */
+		if (input_buf_ptr >= copy_buf_len || need_data)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+			need_data = false;
+		}
+
+		/* OK to fetch a character */
+		prev_raw_ptr = input_buf_ptr;
+		c = copy_input_buf[input_buf_ptr++];
+
+		/* Process \r */
+		if (c == '\r')
+		{
+			/* Check for \r\n on first line, _and_ handle \r\n. */
+			if (cstate->eol_type == EOL_UNKNOWN ||
+				cstate->eol_type == EOL_CRNL)
+			{
+				/*
+				 * If need more data, go back to loop top to load it.
+				 *
+				 * Note that if we are at EOF, c will wind up as '\0' because
+				 * of the guaranteed pad of input_buf.
+				 */
+				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+				/* get next char */
+				c = copy_input_buf[input_buf_ptr];
+
+				if (c == '\n')
+				{
+					input_buf_ptr++;	/* eat newline */
+					cstate->eol_type = EOL_CRNL;	/* in case not set yet */
+				}
+				else
+				{
+					if (cstate->eol_type == EOL_CRNL)
+						ereport(ERROR,
+								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+								 errmsg("end-of-copy marker does not match previous newline style")));
+
+					/*
+					 * if we got here, it is the first line and we didn't find
+					 * \n, so don't consume the peeked character
+					 */
+					cstate->eol_type = EOL_CR;
+				}
+			}
+			else if (cstate->eol_type == EOL_NL)
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("end-of-copy marker does not match previous newline style")));
+			/* If reach here, we have found the line terminator */
+			break;
+		}
+
+		/* Process \n */
+		if (c == '\n')
+		{
+			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("end-of-copy marker does not match previous newline style")));
+			cstate->eol_type = EOL_NL;	/* in case not set yet */
+			/* If reach here, we have found the line terminator */
+			break;
+		}
+
+		/* All other characters are treated as regular data */
+	}							/* end of outer loop */
+
+	/*
+	 * Transfer any still-uncopied data to line_buf.
+	 */
+	REFILL_LINEBUF;
+
+	return result;
+}
+
+
 /*
  *	Return decimal value for a hexadecimal digit
  */
@@ -1937,7 +2094,6 @@ endfield:
 	return fieldno;
 }
 
-
 /*
  * Read a binary attribute
  */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 03c9d71d34a..843fc03a44f 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 								bool use_quote);
+static void CopyAttributeOutList(CopyToState cstate, const char *string, Oid typid);
 
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
@@ -574,6 +575,13 @@ BeginCopyTo(ParseState *pstate,
 	/* Generate or convert list of attributes to process */
 	cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
 
+	/* Enforce single column requirement for 'list' format */
+	if (cstate->opts.format == COPY_FORMAT_LIST &&
+		list_length(cstate->attnumlist) != 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY with format 'list' must specify exactly one column")));
+
 	num_phys_attrs = tupDesc->natts;
 
 	/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -830,17 +838,20 @@ DoCopyTo(CopyToState cstate)
 			{
 				int			attnum = lfirst_int(cur);
 				char	   *colname;
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
 				if (hdr_delim)
 					CopySendChar(cstate, cstate->opts.delim[0]);
 				hdr_delim = true;
 
-				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+				colname = NameStr(attr->attname);
 
 				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, colname, false);
-				else
+				else if (cstate->opts.format == COPY_FORMAT_TEXT)
 					CopyAttributeOutText(cstate, colname);
+				else if (cstate->opts.format == COPY_FORMAT_LIST)
+					CopyAttributeOutList(cstate, colname, attr->atttypid);
 			}
 
 			CopySendEndOfRow(cstate);
@@ -921,7 +932,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (cstate->opts.format != COPY_FORMAT_BINARY)
+	if (cstate->opts.format == COPY_FORMAT_TEXT ||
+		cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		bool		need_delim = false;
 
@@ -949,7 +961,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
-	else
+	else if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		foreach_int(attnum, cstate->attnumlist)
 		{
@@ -969,6 +981,37 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
+	else if (cstate->opts.format == COPY_FORMAT_LIST)
+	{
+		int			attnum;
+		Datum		value;
+		bool		isnull;
+		Oid			typid;
+
+		/* Assert only one column is being copied */
+		Assert(list_length(cstate->attnumlist) == 1);
+
+		attnum = linitial_int(cstate->attnumlist);
+		value = slot->tts_values[attnum - 1];
+		isnull = slot->tts_isnull[attnum - 1];
+		typid = TupleDescAttr(slot->tts_tupleDescriptor, attnum - 1)->atttypid;
+
+		if (!isnull)
+		{
+			char	   *string = OutputFunctionCall(&out_functions[attnum - 1],
+													value);
+
+			CopyAttributeOutList(cstate, string, typid);
+		}
+		/* For 'list' format, we don't send anything for NULL values */
+	}
+	else
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("unsupported COPY format")));
+	}
+
 
 	CopySendEndOfRow(cstate);
 
@@ -1223,6 +1266,74 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	}
 }
 
+/*
+ * Send text representation of the attribute for 'list' format.
+ * Scan for and error on newlines unless the type is known to be newline-free.
+ */
+static void
+CopyAttributeOutList(CopyToState cstate, const char *string, Oid typid)
+{
+	const char *ptr;
+	const char *start;
+	char		c;
+
+	if (cstate->need_transcoding)
+		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	else
+		ptr = string;
+
+	/* Fast path for some types that cannot contain newlines */
+	switch (typid)
+	{
+			/* Numeric types */
+		case INT2OID:
+		case INT4OID:
+		case INT8OID:
+		case FLOAT4OID:
+		case FLOAT8OID:
+		case NUMERICOID:
+		case OIDOID:
+			/* Date/time types */
+		case DATEOID:
+		case TIMEOID:
+		case TIMESTAMPOID:
+		case TIMESTAMPTZOID:
+		case INTERVALOID:
+			/* Network types */
+		case INETOID:
+		case CIDROID:
+		case MACADDROID:
+		case MACADDR8OID:
+			/* Other types */
+		case BOOLOID:
+		case UUIDOID:
+		case JSONBOID:
+			CopySendString(cstate, ptr);
+			return;
+	}
+
+	/*
+	 * Scan the string for newlines, and error if any are found.
+	 */
+	start = ptr;
+	while ((c = *ptr) != '\0')
+	{
+		if (c == '\n' || c == '\r')
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("list format doesn't support newlines in field values"),
+					 errhint("Consider using csv or text format for data containing newlines.")));
+
+		if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
+			ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
+		else
+			ptr++;
+	}
+
+	/* If we got here, there were no newlines, so send the string */
+	CopySendString(cstate, start);
+}
+
 /*
  * copy_dest_startup --- executor startup
  */
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index fad2277991d..75f312a9ac5 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
-		COMPLETE_WITH("binary", "csv", "text");
+		COMPLETE_WITH("binary", "csv", "text", "list");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f0..44e9934d630 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
 	COPY_FORMAT_TEXT = 0,
 	COPY_FORMAT_BINARY,
 	COPY_FORMAT_CSV,
+	COPY_FORMAT_LIST,
 } CopyFormat;
 
 /*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84c..bff331792bc 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,40 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
 (2 rows)
 
 DROP TABLE parted_si;
+-- Test 'list' format
+\set filename :abs_srcdir '/data/emp.data'
+create temp table single_copytest (col text);
+copy single_copytest from :'filename' (format list);
+select col from single_copytest order by col collate "C";
+                  col                   
+----------------------------------------
+ bill    20      (11,10) 1000    sharon
+ sam     30      (10,5)  2000    bill
+ sharon  25      (15,12) 1000    sam
+(3 rows)
+
+copy single_copytest to stdout (format list);
+sharon	25	(15,12)	1000	sam
+sam	30	(10,5)	2000	bill
+bill	20	(11,10)	1000	sharon
+truncate single_copytest;
+copy single_copytest (col) from stdin (format list, header match);
+select col from single_copytest order by col collate "C";
+  col   
+--------
+ "def",
+ abc\.
+ ghi
+(3 rows)
+
+copy single_copytest (col) to stdout (format list, header);
+col
+abc\.
+"def",
+ghi
+truncate single_copytest;
+alter table single_copytest add column json_line jsonb;
+insert into single_copytest (json_line) values ('{"a": "b"}');
+copy single_copytest (json_line) to stdout (format list);
+{"a": "b"}
+drop table single_copytest;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..23a930d1495 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -90,6 +90,20 @@ COPY x from stdin (format BINARY, delimiter ',');
 ERROR:  cannot specify DELIMITER in BINARY mode
 COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
+COPY x (c) from stdin (format LIST, null 'x');
+ERROR:  cannot specify NULL in LIST mode
+COPY x from stdin (format TEXT, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x from stdin (format BINARY, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x (c) from stdin (format LIST, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x from stdin (format TEXT, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
+COPY x from stdin (format BINARY, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
+COPY x (c) from stdin (format LIST, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
 COPY x from stdin (on_error unsupported);
@@ -100,6 +114,10 @@ COPY x from stdin (format TEXT, force_quote(a));
 ERROR:  COPY FORCE_QUOTE requires CSV mode
 COPY x from stdin (format TEXT, force_quote *);
 ERROR:  COPY FORCE_QUOTE requires CSV mode
+COPY x (c) from stdin (format LIST, force_quote(a));
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+COPY x (c) from stdin (format LIST, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
 COPY x from stdin (format CSV, force_quote(a));
 ERROR:  COPY FORCE_QUOTE cannot be used with COPY FROM
 COPY x from stdin (format CSV, force_quote *);
@@ -108,6 +126,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
 ERROR:  COPY FORCE_NOT_NULL requires CSV mode
 COPY x from stdin (format TEXT, force_not_null *);
 ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_not_null(a));
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
 COPY x to stdout (format CSV, force_not_null(a));
 ERROR:  COPY FORCE_NOT_NULL cannot be used with COPY TO
 COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +138,10 @@ COPY x from stdin (format TEXT, force_null(a));
 ERROR:  COPY FORCE_NULL requires CSV mode
 COPY x from stdin (format TEXT, force_null *);
 ERROR:  COPY FORCE_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_null(a));
+ERROR:  COPY FORCE_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
 COPY x to stdout (format CSV, force_null(a));
 ERROR:  COPY FORCE_NULL cannot be used with COPY TO
 COPY x to stdout (format CSV, force_null *);
@@ -858,9 +884,11 @@ select id, text_value, ts_value from copy_default;
 (2 rows)
 
 truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or list mode
 copy copy_default from stdin with (format binary, default '\D');
 ERROR:  cannot specify DEFAULT in BINARY mode
+copy copy_default (text_value) from stdin with (format list, default '\D');
+ERROR:  cannot specify DEFAULT in LIST mode
 -- DEFAULT cannot be new line nor carriage return
 copy copy_default from stdin with (default E'\n');
 ERROR:  COPY default representation cannot use newline or carriage return
@@ -929,3 +957,12 @@ truncate copy_default;
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
 ERROR:  COPY DEFAULT cannot be used with COPY TO
+-- Test list column requirement
+copy copy_default from stdin with (format list);
+ERROR:  COPY with format 'list' must specify exactly one column
+-- Test error on newlines in list format
+create table copy_list_test (line text);
+insert into copy_list_test values (E'a\nb');
+copy copy_list_test to stdout with (format list);
+ERROR:  list format doesn't support newlines in field values
+HINT:  Consider using csv or text format for data containing newlines.
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b04..40394bca680 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,24 @@ COPY parted_si(id, data) FROM :'filename';
 SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
 
 DROP TABLE parted_si;
+
+-- Test 'list' format
+\set filename :abs_srcdir '/data/emp.data'
+create temp table single_copytest (col text);
+copy single_copytest from :'filename' (format list);
+select col from single_copytest order by col collate "C";
+copy single_copytest to stdout (format list);
+truncate single_copytest;
+copy single_copytest (col) from stdin (format list, header match);
+col
+abc\.
+"def",
+ghi
+\.
+select col from single_copytest order by col collate "C";
+copy single_copytest (col) to stdout (format list, header);
+truncate single_copytest;
+alter table single_copytest add column json_line jsonb;
+insert into single_copytest (json_line) values ('{"a": "b"}');
+copy single_copytest (json_line) to stdout (format list);
+drop table single_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..b0aeb370163 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -72,18 +72,31 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
+COPY x (c) from stdin (format LIST, null 'x');
+COPY x from stdin (format TEXT, escape 'x');
+COPY x from stdin (format BINARY, escape 'x');
+COPY x (c) from stdin (format LIST, escape 'x');
+COPY x from stdin (format TEXT, quote 'x');
+COPY x from stdin (format BINARY, quote 'x');
+COPY x (c) from stdin (format LIST, quote 'x');
 COPY x from stdin (format BINARY, on_error ignore);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
+COPY x (c) from stdin (format LIST, force_quote(a));
+COPY x (c) from stdin (format LIST, force_quote *);
 COPY x from stdin (format CSV, force_quote(a));
 COPY x from stdin (format CSV, force_quote *);
 COPY x from stdin (format TEXT, force_not_null(a));
 COPY x from stdin (format TEXT, force_not_null *);
+COPY x (c) from stdin (format LIST, force_not_null(a));
+COPY x (c) from stdin (format LIST, force_not_null *);
 COPY x to stdout (format CSV, force_not_null(a));
 COPY x to stdout (format CSV, force_not_null *);
 COPY x from stdin (format TEXT, force_null(a));
 COPY x from stdin (format TEXT, force_null *);
+COPY x (c) from stdin (format LIST, force_null(a));
+COPY x (c) from stdin (format LIST, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +649,9 @@ select id, text_value, ts_value from copy_default;
 
 truncate copy_default;
 
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or list mode
 copy copy_default from stdin with (format binary, default '\D');
+copy copy_default (text_value) from stdin with (format list, default '\D');
 
 -- DEFAULT cannot be new line nor carriage return
 copy copy_default from stdin with (default E'\n');
@@ -707,3 +721,11 @@ truncate copy_default;
 
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
+
+-- Test list column requirement
+copy copy_default from stdin with (format list);
+
+-- Test error on newlines in list format
+create table copy_list_test (line text);
+insert into copy_list_test values (E'a\nb');
+copy copy_list_test to stdout with (format list);
-- 
2.45.1

v20-0003-Reorganize-option-validations.patch

From 8c671e5eeadf21c6bac4378f31f167c69aab0877 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 7 Nov 2024 15:53:24 +0100
Subject: [PATCH 3/3] Reorganize option validations

---
 src/backend/commands/copy.c | 460 ++++++++++++++++++++----------------
 1 file changed, 259 insertions(+), 201 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3b98a8e7db1..2de9bc0be8e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -673,44 +673,33 @@ ProcessCopyOptions(ParseState *pstate,
 					 parser_errposition(pstate, defel->location)));
 	}
 
-	/*
-	 * Check for incompatible options (must do these three before inserting
-	 * defaults)
-	 */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
-	if (opts_out->format == COPY_FORMAT_LIST && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in LIST mode", "DELIMITER")));
-
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "NULL")));
-
-	if (opts_out->format == COPY_FORMAT_LIST && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in LIST mode", "NULL")));
-
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
-	if (opts_out->format == COPY_FORMAT_LIST && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in LIST mode", "DEFAULT")));
-
+	/* --- FREEZE option --- */
+	if (opts_out->freeze)
+	{
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FREEZE",
+							"COPY TO")));
+	}
+
+	/* --- DELIMITER option --- */
 	if (opts_out->delim)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+		if (opts_out->format == COPY_FORMAT_LIST)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in LIST mode", "DELIMITER")));
+
 		/* Only single-byte delimiter strings are supported. */
 		if (strlen(opts_out->delim) != 1)
 			ereport(ERROR,
@@ -723,22 +712,53 @@ ProcessCopyOptions(ParseState *pstate,
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY delimiter cannot be newline or carriage return")));
+
+		if (opts_out->format == COPY_FORMAT_TEXT)
+		{
+			/*
+			 * Disallow unsafe delimiter characters in text mode.  We can't
+			 * allow backslash because it would be ambiguous.  We can't allow
+			 * the other cases because data characters matching the delimiter
+			 * must be backslashed, and certain backslash combinations are
+			 * interpreted non-literally by COPY IN.  Disallowing all lower
+			 * case ASCII letters is more than strictly necessary, but seems
+			 * best for consistency and future-proofing.  Likewise we disallow
+			 * all digits though only octal digits are actually dangerous.
+			 */
+			if (strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+					   opts_out->delim[0]) != NULL)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+		}
 	}
-	/* Set defaults for omitted options */
+	/* Set default delimiter */
 	else if (opts_out->format == COPY_FORMAT_CSV)
 		opts_out->delim = ",";
 	else if (opts_out->format == COPY_FORMAT_TEXT)
 		opts_out->delim = "\t";
 
+	/* --- NULL option --- */
 	if (opts_out->null_print)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+		if (opts_out->format == COPY_FORMAT_LIST)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in LIST mode", "NULL")));
+
+		/* Disallow end-of-line characters */
 		if (strchr(opts_out->null_print, '\r') != NULL ||
 			strchr(opts_out->null_print, '\n') != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY null representation cannot use newline or carriage return")));
-
 	}
+	/* Set default null_print */
 	else if (opts_out->format == COPY_FORMAT_CSV)
 		opts_out->null_print = "";
 	else if (opts_out->format == COPY_FORMAT_TEXT)
@@ -747,16 +767,23 @@ ProcessCopyOptions(ParseState *pstate,
 	if (opts_out->null_print)
 		opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->format == COPY_FORMAT_CSV)
-	{
-		if (!opts_out->quote)
-			opts_out->quote = "\"";
-		if (!opts_out->escape)
-			opts_out->escape = opts_out->quote;
-	}
-
+	/* --- DEFAULT option --- */
 	if (opts_out->default_print)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+		if (opts_out->format == COPY_FORMAT_LIST)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in LIST mode", "DEFAULT")));
+
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->null_print);
+
 		opts_out->default_print_len = strlen(opts_out->default_print);
 
 		if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -764,138 +791,7 @@ ProcessCopyOptions(ParseState *pstate,
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY default representation cannot use newline or carriage return")));
-	}
 
-	/*
-	 * Disallow unsafe delimiter characters in text mode.  We can't allow
-	 * backslash because it would be ambiguous.  We can't allow the other
-	 * cases because data characters matching the delimiter must be
-	 * backslashed, and certain backslash combinations are interpreted
-	 * non-literally by COPY IN.  Disallowing all lower case ASCII letters is
-	 * more than strictly necessary, but seems best for consistency and
-	 * future-proofing.  Likewise we disallow all digits though only octal
-	 * digits are actually dangerous.
-	 */
-	if (opts_out->format == COPY_FORMAT_TEXT &&
-		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
-			   opts_out->delim[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
-	/* Check header */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
-	/* Check quote */
-	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "QUOTE")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY quote must be a single one-byte character")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter and quote must be different")));
-
-	/* Check escape */
-	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY escape must be a single one-byte character")));
-
-	/* Check force_quote */
-	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
-												opts_out->force_quote_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
-	if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
-						"COPY FROM")));
-
-	/* Check force_notnull */
-	if (opts_out->format != COPY_FORMAT_CSV &&
-		(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
-	if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
-		!is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
-						"COPY TO")));
-
-	/* Check force_null */
-	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
-												opts_out->force_null_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
-	if ((opts_out->force_null != NIL || opts_out->force_null_all) &&
-		!is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
-						"COPY TO")));
-
-	/* Don't allow the delimiter to appear in the null string. */
-	if (opts_out->delim && opts_out->null_print &&
-		strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: %s is the name of a COPY option, e.g. NULL */
-				 errmsg("COPY delimiter character must not appear in the %s specification",
-						"NULL")));
-
-	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->format == COPY_FORMAT_CSV &&
-		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: %s is the name of a COPY option, e.g. NULL */
-				 errmsg("CSV quote character must not appear in the %s specification",
-						"NULL")));
-
-	/* Check freeze */
-	if (opts_out->freeze && !is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FREEZE",
-						"COPY TO")));
-
-	if (opts_out->default_print)
-	{
 		if (!is_from)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -905,22 +801,13 @@ ProcessCopyOptions(ParseState *pstate,
 							"COPY TO")));
 
 		/* Don't allow the delimiter to appear in the default string. */
-		if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+		if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			/*- translator: %s is the name of a COPY option, e.g. NULL */
 					 errmsg("COPY delimiter character must not appear in the %s specification",
 							"DEFAULT")));
 
-		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->format == COPY_FORMAT_CSV &&
-			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			/*- translator: %s is the name of a COPY option, e.g. NULL */
-					 errmsg("CSV quote character must not appear in the %s specification",
-							"DEFAULT")));
-
 		/* Don't allow the NULL and DEFAULT string to be the same */
 		if (opts_out->null_print_len == opts_out->default_print_len &&
 			strncmp(opts_out->null_print, opts_out->default_print,
@@ -929,20 +816,191 @@ ProcessCopyOptions(ParseState *pstate,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
-	/* Check on_error */
-	if (opts_out->format == COPY_FORMAT_BINARY &&
-		opts_out->on_error != COPY_ON_ERROR_STOP)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
-	if (opts_out->reject_limit && !opts_out->on_error)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first and second %s are the names of COPY option, e.g.
-		 * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
-				 errmsg("COPY %s requires %s to be set to %s",
-						"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+	else
+	{
+		/* No default for default_print; remains NULL */
+	}
+
+	/* --- HEADER option --- */
+	if (opts_out->header_line != COPY_HEADER_FALSE)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in BINARY mode", "HEADER")));
+	}
+	else
+	{
+		/* Default is no header; no action needed */
+	}
+
+	/* --- QUOTE option --- */
+	if (opts_out->quote)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+		if (strlen(opts_out->quote) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY quote must be a single one-byte character")));
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Set default quote */
+		opts_out->quote = "\"";
+	}
+
+	/* --- ESCAPE option --- */
+	if (opts_out->escape)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+		if (strlen(opts_out->escape) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY escape must be a single one-byte character")));
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Set default escape to quote character */
+		opts_out->escape = opts_out->quote;
+	}
+
+	/* --- FORCE_QUOTE option --- */
+	if (opts_out->force_quote != NIL || opts_out->force_quote_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+		if (is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+							"COPY FROM")));
+	}
+
+	/* --- FORCE_NOT_NULL option --- */
+	if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+							"COPY TO")));
+	}
+
+	/* --- FORCE_NULL option --- */
+	if (opts_out->force_null != NIL || opts_out->force_null_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+							"COPY TO")));
+	}
+
+	/* --- ON_ERROR option --- */
+	if (opts_out->on_error != COPY_ON_ERROR_STOP)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+	}
+
+	/* --- REJECT_LIMIT option --- */
+	if (opts_out->reject_limit)
+	{
+		if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first and second %s are the names of COPY option, e.g.
+				* ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+					 errmsg("COPY %s requires %s to be set to %s",
+							"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+	}
+
+	/*
+	 * Additional checks for interdependent options
+	 */
+
+	/* Checks specific to the CSV and TEXT formats */
+	if (opts_out->format == COPY_FORMAT_TEXT ||
+		opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->null_print);
+
+		/* Don't allow the delimiter to appear in the null string. */
+		if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("COPY delimiter character must not appear in the %s specification",
+							"NULL")));
+	}
+
+	/* Checks specific to the CSV format */
+	if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->quote);
+		Assert(opts_out->null_print);
+
+		/* Don't allow the CSV quote char to appear in the default string. */
+		if (opts_out->default_print_len > 0 &&
+			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("CSV quote character must not appear in the %s specification",
+							"DEFAULT")));
+
+		if (opts_out->delim[0] == opts_out->quote[0])
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY delimiter and quote must be different")));
+
+		/* Don't allow the CSV quote char to appear in the null string. */
+		if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("CSV quote character must not appear in the %s specification",
+							"NULL")));
+	}
 }
 
 /*
-- 
2.45.1

#19

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: David G. Johnston (#17)

Re: New "single" COPY format

On Sat, Nov 9, 2024, at 15:28, David G. Johnston wrote:

PostgreSQL cannot store the NUL byte. Would that be an option for the
record separator? Default to new line but accept NUL if one needs to
input/output lists containing newlines. Or whatever character the user
believes is not part of their data - tab probably being a popular
option.

Clever idea, could work, but using NUL bytes in text files feels a bit
unorthodox, and I can imagine surprising results in other systems having to deal
with such files.

I have no idea how useful such a file format would be, but some googling suggests
it's a trick that's used out there, so I won't exclude the idea entirely. It just
feels like a type of hack where it's difficult to foresee the consequences
of allowing it.
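
For illustration only, here is a minimal sketch of the separator idea, outside PostgreSQL. count_records is a hypothetical helper, not something in the patch. It shows that a value containing a newline is split into two records when the separator is a newline, but stays intact when the separator is a NUL byte.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the records in buf[0..len), where each record is terminated by
 * the byte sep.  A final record without a terminator still counts.
 * Hypothetical helper for illustration; not part of the patch.
 */
static size_t
count_records(const char *buf, size_t len, char sep)
{
	size_t		nrecords = 0;
	size_t		start = 0;	/* offset where the current record begins */

	assert(buf != NULL);

	for (size_t i = 0; i < len; i++)
	{
		if (buf[i] == sep)
		{
			nrecords++;
			start = i + 1;
		}
	}

	/* bytes left after the last separator form an unterminated record */
	if (start < len)
		nrecords++;

	return nrecords;
}
```

With the two values "a\nb" and "c", a newline-separated file holds the six bytes a, LF, b, LF, c, LF and is read back as three records, while the NUL-separated file holds a, LF, b, NUL, c, NUL and is read back as two.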

/Joel

#20

David G. Johnston

david.g.johnston@gmail.com

about 1 year ago

In reply to: Joel Jacobson (#19)

Re: New "single" COPY format

On Sat, Nov 9, 2024 at 1:48 PM Joel Jacobson <joel@compiler.org> wrote:

On Sat, Nov 9, 2024, at 15:28, David G. Johnston wrote:

PostgreSQL cannot store the NUL byte. Would that be an option for the
record separator? Default to new line but accept NUL if one needs to
input/output lists containing newlines. Or whatever character the user
believes is not part of their data - tab probably being a popular
option.

Clever idea, could work, but using NUL bytes in text files feels a bit
unorthodox, and I can imagine surprising results in other systems having
to deal with such files.

Yeah. I was inspired by xargs and find but for a permanent file it is a
bit different.

David J.

#21

jian he

jian.universality@gmail.com

about 1 year ago

In reply to: Joel Jacobson (#18)

Re: New "single" COPY format

On Sun, Nov 10, 2024 at 3:29 AM Joel Jacobson <joel@compiler.org> wrote:

Cool. I've drafted a new patch on this approach.
The list of newline-free built-in types is not exhaustive, yet.

Do we care that COPY back and forth always works?
The docs don't mention it, but it seems to be an implicit idea.

copy the_table to '/tmp/3.txt' with (format whatever_format);
truncate the_table;
copy the_table from '/tmp/3.txt' with (format whatever_format);

But v20 will not work for a non-text column with SQL NULL data in it.

example:
drop table if exists x1;
create table x1(a int);
insert into x1 select null;
copy x1 to '/tmp/3.txt' with (format list);
copy x1 from '/tmp/3.txt' with (format list);
ERROR: invalid input syntax for type integer: ""
CONTEXT: COPY x1, line 1, column a: ""
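
The failure above can be reproduced in miniature outside PostgreSQL. This is only a rough sketch, assuming that a NULL is exported as an empty line and using strtol as a stand-in for integer input parsing (the real integer input function is different). export_field and import_int_field are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * What a list-style export writes for one value.  Assumption for this
 * sketch: a NULL (represented here as a NULL pointer) becomes an empty line.
 */
static const char *
export_field(const char *value)
{
	return value ? value : "";
}

/*
 * Parse one imported line as an integer.  Returns false when the line is
 * not a complete, in-range integer; an empty line has no digits and fails.
 */
static bool
import_int_field(const char *line, long *out)
{
	char	   *end;
	long		v;

	assert(line != NULL && out != NULL);

	errno = 0;
	v = strtol(line, &end, 10);
	if (end == line || *end != '\0' || errno == ERANGE)
		return false;

	*out = v;
	return true;
}
```

Round-tripping "42" succeeds, but round-tripping a NULL produces an empty line that the import side rejects, which mirrors the "invalid input syntax" error shown above.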

<para>
The <literal>list</literal> format does not distinguish a
<literal>NULL</literal>
value from an empty string. Empty lines are imported as empty strings, not
as <literal>NULL</literal> values.
</para>
We only mention import, not how export (COPY TO) deals with
NULL values.

+ if (c == '\n' || c == '\r')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("list format doesn't support newlines in field values"),
+ errhint("Consider using csv or text format for data containing newlines.")));

"list format doesn't support newlines in field values"
Does the word "list" need single or double quotes?
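
As a standalone illustration of the check quoted above (a sketch only; the patch does this per character inside its COPY TO output path and raises an error via ereport), value_has_eol is a hypothetical helper that reports whether a field value contains an end-of-line character:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if value[0..len) contains a newline or carriage return,
 * i.e. a byte that would be read back as a record boundary.
 * Hypothetical helper for illustration; not the patch's code.
 */
static bool
value_has_eol(const char *value, size_t len)
{
	assert(value != NULL);

	for (size_t i = 0; i < len; i++)
	{
		char		c = value[i];

		if (c == '\n' || c == '\r')
			return true;
	}
	return false;
}
```

A caller would refuse to export any value for which this returns true, since writing it verbatim would turn one row into several lines.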

ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("Unsupported COPY format")));
should be "unsupported" (lowercase first letter) per
https://www.postgresql.org/docs/current/error-style-guide.html#ERROR-STYLE-GUIDE-CASE

#22

David G. Johnston

david.g.johnston@gmail.com

about 1 year ago

In reply to: jian he (#21)

Re: New "single" COPY format

On Saturday, November 9, 2024, jian he <jian.universality@gmail.com> wrote:

<para>
The <literal>list</literal> format does not distinguish a
<literal>NULL</literal>
value from an empty string. Empty lines are imported as empty strings, not
as <literal>NULL</literal> values.
</para>
We only mention import, not how export (COPY TO) deals with
NULL values.

Yeah, while not being able to distinguish between the two is consistent
with the list format’s premise/design, the choice would need to resolve to
the null value in order to remain data-type agnostic. We’d simply
have to note for the text types that empty strings in lists are not
supported and, if encountered, will be resolved to a null value.
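
One way to read this suggestion, as a rough C sketch rather than the patch's code: an empty line always resolves to NULL on import, independent of the column type. The ListValue struct and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field representation: text plus a null flag. */
typedef struct ListValue
{
	const char *text;
	bool		isnull;
} ListValue;

/* Export: SQL NULL becomes an empty line. */
static const char *
list_value_to_line(ListValue v)
{
	return v.isnull ? "" : v.text;
}

/* Import: an empty line always resolves to NULL, whatever the column type. */
static ListValue
list_line_to_value(const char *line)
{
	ListValue	v;

	v.isnull = (line[0] == '\0');
	v.text = v.isnull ? NULL : line;
	return v;
}
```

Under this rule a NULL survives the round trip for any type, while an empty text string is exported as an empty line and comes back as NULL, which is the limitation that would have to be documented for the text types.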

David J.

#23

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: jian he (#21)

3 attachment(s)

Re: New "single" COPY format

On Sun, Nov 10, 2024, at 05:26, jian he wrote:

On Sun, Nov 10, 2024 at 3:29 AM Joel Jacobson <joel@compiler.org> wrote:

Cool. I've drafted a new patch on this approach.
The list of newline-free built-in types is not exhaustive, yet.

Do we care that COPY back and forth always works?

Yes, I think that's an important design goal.

The docs don't mention it, but it seems to be an implicit idea.

True, docs should be clear on this. Will update the docs
when we've decided what to do, see below.

copy the_table to '/tmp/3.txt' with (format whatever_format);
truncate the_table;
copy the_table from '/tmp/3.txt' with (format whatever_format);

But with v20, this will not work for a non-text column with SQL NULL data in it.

example:
drop table if exists x1;
create table x1(a int);
insert into x1 select null;
copy x1 to '/tmp/3.txt' with (format list);
copy x1 from '/tmp/3.txt' with (format list);
ERROR: invalid input syntax for type integer: ""
CONTEXT: COPY x1, line 1, column a: ""

<para>
The <literal>list</literal> format does not distinguish a
<literal>NULL</literal>
value from an empty string. Empty lines are imported as empty strings, not
as <literal>NULL</literal> values.
</para>
We only mention import here, not how export (COPY TO) deals with
NULL values.

Nice catch.
Will respond to this in the later message in the thread from David.

+ if (c == '\n' || c == '\r')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("list format doesn't support newlines in field values"),
+ errhint("Consider using csv or text format for data containing newlines.")));

"list format doesn't support newlines in field values"
Does the word list need single or double quotes?

Fixed, to match other code.

I also changed to \"list\" instead of 'list' everywhere in error messages,
since that seems much more common in existing PostgreSQL code.

Also changed wording to match the other error messages better:
-                    errmsg("list format doesn't support newlines in field values"),
+                    errmsg("COPY with format \"list\" doesn't support newlines in field values")));

Also removed this errhint since it seemed unnecessary.
- errhint("Consider using csv or text format for data containing newlines.")));
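
For readers following along, the condition behind this error can be sketched in standalone C. The helper name is hypothetical and this is not the patch's code, which checks character by character and raises the error through ereport.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the value can be written as a
 * single list-format line, i.e. it contains neither '\n' nor '\r'.
 */
static bool
list_value_fits_on_one_line(const char *value)
{
	return strpbrk(value, "\r\n") == NULL;
}
```

A value for which this returns false is one that COPY TO with format "list" would reject with the error message above.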

ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("Unsupported COPY format")));
should be "unsupported" per
https://www.postgresql.org/docs/current/error-style-guide.html#ERROR-STYLE-GUIDE-CASE

Fixed.

/Joel

Attachments:

v21-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch

From 13b67cee37c737fc556c3dcf533895a698916926 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 08:24:13 +0300
Subject: [PATCH 1/3] Introduce CopyFormat and replace csv_mode and binary
 fields with it.

---
 src/backend/commands/copy.c          | 50 +++++++++++++++-------------
 src/backend/commands/copyfrom.c      | 10 +++---
 src/backend/commands/copyfromparse.c | 34 +++++++++----------
 src/backend/commands/copyto.c        | 20 +++++------
 src/include/commands/copy.h          | 13 ++++++--
 src/tools/pgindent/typedefs.list     |  1 +
 6 files changed, 70 insertions(+), 58 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663f..b7e819de408 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+												opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV &&
+		(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,8 +822,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -846,7 +847,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -882,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -899,7 +900,8 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY &&
+		opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b8..f350a4ff976 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 	cstate->raw_reached_eof = false;
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		/*
 		 * If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
 			continue;
 
 		/* Fetch the input function and typioparam info */
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryInputInfo(att->atttypid,
 								   &in_func_oid, &typioparams[attnum - 1]);
 		else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
 
 	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
 
 	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d83..51eb14d7432 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -162,7 +162,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -748,7 +748,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	bool		done;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(cstate->opts.format != COPY_FORMAT_BINARY);
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -765,7 +765,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		{
 			int			fldnum;
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				fldct = CopyReadAttributesCSV(cstate);
 			else
 				fldct = CopyReadAttributesText(cstate);
@@ -820,7 +820,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		return false;
 
 	/* Parse the line into de-escaped field values */
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 		fldct = CopyReadAttributesCSV(cstate);
 	else
 		fldct = CopyReadAttributesText(cstate);
@@ -864,7 +864,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
 	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		char	  **field_strings;
 		ListCell   *cur;
@@ -905,7 +905,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 				continue;
 			}
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 			{
 				if (string == NULL &&
 					cstate->opts.force_notnull_flags[m])
@@ -1178,7 +1178,7 @@ CopyReadLineText(CopyFromState cstate)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		quotec = cstate->opts.quote[0];
 		escapec = cstate->opts.escape[0];
@@ -1255,7 +1255,7 @@ CopyReadLineText(CopyFromState cstate)
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
 
-		if (cstate->opts.csv_mode)
+		if (cstate->opts.format == COPY_FORMAT_CSV)
 		{
 			/*
 			 * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1294,7 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \r */
-		if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1322,10 @@ CopyReadLineText(CopyFromState cstate)
 					if (cstate->eol_type == EOL_CRNL)
 						ereport(ERROR,
 								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errmsg("literal carriage return found in data") :
 								 errmsg("unquoted carriage return found in data"),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errhint("Use \"\\r\" to represent carriage return.") :
 								 errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1339,10 @@ CopyReadLineText(CopyFromState cstate)
 			else if (cstate->eol_type == EOL_NL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal carriage return found in data") :
 						 errmsg("unquoted carriage return found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\r\" to represent carriage return.") :
 						 errhint("Use quoted CSV field to represent carriage return.")));
 			/* If reach here, we have found the line terminator */
@@ -1350,15 +1350,15 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \n */
-		if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal newline found in data") :
 						 errmsg("unquoted newline found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\n\" to represent newline.") :
 						 errhint("Use quoted CSV field to represent newline.")));
 			cstate->eol_type = EOL_NL;	/* in case not set yet */
@@ -1370,7 +1370,7 @@ CopyReadLineText(CopyFromState cstate)
 		 * Process backslash, except in CSV mode where backslash is a normal
 		 * character.
 		 */
-		if (c == '\\' && !cstate->opts.csv_mode)
+		if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
 		{
 			char		c2;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d96751..03c9d71d34a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
 	switch (cstate->copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 			{
 				/* Default line termination depends on platform */
 #ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
 			break;
 		case COPY_FRONTEND:
 			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 				CopySendChar(cstate, '\n');
 
 			/* Dump the accumulated row as one CopyData message */
@@ -775,7 +775,7 @@ DoCopyTo(CopyToState cstate)
 		bool		isvarlena;
 		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryOutputInfo(attr->atttypid,
 									&out_func_oid,
 									&isvarlena);
@@ -796,7 +796,7 @@ DoCopyTo(CopyToState cstate)
 											   "COPY TO",
 											   ALLOCSET_DEFAULT_SIZES);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate header for a binary copy */
 		int32		tmp;
@@ -837,7 +837,7 @@ DoCopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, colname, false);
 				else
 					CopyAttributeOutText(cstate, colname);
@@ -884,7 +884,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate trailer for a binary copy */
 		CopySendInt16(cstate, -1);
@@ -912,7 +912,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Binary per-tuple header */
 		CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -921,7 +921,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		bool		need_delim = false;
 
@@ -941,7 +941,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			{
 				string = OutputFunctionCall(&out_functions[attnum - 1],
 											value);
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, string,
 										cstate->opts.force_quote_flags[attnum - 1]);
 				else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..c3d1df267f0 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1847bbfa95c..d9ebfe6cb71 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromState
 CopyFromStateData
-- 
2.45.1

v21-0002-Add-COPY-format-list.patch

From 671075a67339653ebee584f6f7675a22fb609e48 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 7 Nov 2024 14:35:40 +0100
Subject: [PATCH 2/3] Add COPY format 'list'

---
 doc/src/sgml/ref/copy.sgml           |  63 +++++++++-
 src/backend/commands/copy.c          |  86 +++++++++-----
 src/backend/commands/copyfrom.c      |   7 ++
 src/backend/commands/copyfromparse.c | 172 +++++++++++++++++++++++++--
 src/backend/commands/copyto.c        | 118 +++++++++++++++++-
 src/bin/psql/tab-complete.in.c       |   2 +-
 src/include/commands/copy.h          |   1 +
 src/test/regress/expected/copy.out   |  37 ++++++
 src/test/regress/expected/copy2.out  |  38 +++++-
 src/test/regress/sql/copy.sql        |  21 ++++
 src/test/regress/sql/copy2.sql       |  24 +++-
 11 files changed, 520 insertions(+), 49 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f096..9327ec133bb 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,8 +218,9 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      <para>
       Selects the data format to be read or written:
       <literal>text</literal>,
-      <literal>csv</literal> (Comma Separated Values),
-      or <literal>binary</literal>.
+      <literal>CSV</literal> (Comma Separated Values),
+      <literal>binary</literal>,
+      or <literal>list</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
@@ -257,7 +258,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is allowed only when using <literal>text</literal> or
+      <literal>CSV</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -271,7 +273,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is allowed only when using <literal>text</literal> or
+      <literal>CSV</literal> format.
      </para>
 
      <note>
@@ -294,7 +297,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      using <literal>text</literal> or <literal>CSV</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -400,7 +403,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
-      when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+      when the <literal>FORMAT</literal> is <literal>text</literal>,
+      <literal>CSV</literal> or <literal>list</literal>.
      </para>
      <para>
       A <literal>NOTICE</literal> message containing the ignored row count is
@@ -893,6 +897,53 @@ COPY <replaceable class="parameter">count</replaceable>
 
   </refsect2>
 
+  <refsect2 id="sql-copy-list-format" xreflabel="List Format">
+   <title>List Format</title>
+
+   <para>
+    This format option is used for importing and exporting files containing
+    unstructured text, where each line is treated as a single field. It is
+    useful for data that does not conform to a structured, tabular format and
+    lacks delimiters.
+   </para>
+
+   <para>
+    In the <literal>list</literal> format, each line of the input or output is
+    considered a complete value without any field separation. There are no
+    field delimiters, and all characters are taken literally. There is no
+    special handling for quotes, backslashes, or escape sequences. All
+    characters, including whitespace and special characters, are preserved
+    exactly as they appear in the file. However, it's important to note that
+    the text is still interpreted according to the specified <literal>ENCODING</literal>
+    option or the current client encoding for input, and encoded using the
+    specified <literal>ENCODING</literal> or the current client encoding for output.
+   </para>
+
+   <para>
+    In <command>COPY TO</command>, the data must not contain any newlines or
+    carriage returns, as these characters are used to separate records. If such
+    characters are encountered in the data, an error will be thrown.
+   </para>
+
+   <para>
+    When using this format, the <command>COPY</command> command must specify
+    exactly one column. Specifying multiple columns will result in an error.
+    If the table has multiple columns and no column list is provided, an error
+    will occur.
+   </para>
+
+   <para>
+    The <literal>list</literal> format does not distinguish a <literal>NULL</literal>
+    value from an empty string. Empty lines are imported as empty strings, not
+    as <literal>NULL</literal> values.
+   </para>
+
+   <para>
+    Encoding works the same as in the <literal>text</literal> and <literal>CSV</literal> formats.
+   </para>
+
+  </refsect2>
+
   <refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
    <title>Binary Format</title>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b7e819de408..3b98a8e7db1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
 				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "list") == 0)
+				opts_out->format = COPY_FORMAT_LIST;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -681,23 +683,69 @@ ProcessCopyOptions(ParseState *pstate,
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
+	if (opts_out->format == COPY_FORMAT_LIST && opts_out->delim)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+				 errmsg("cannot specify %s in LIST mode", "DELIMITER")));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
+	if (opts_out->format == COPY_FORMAT_LIST && opts_out->null_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify %s in LIST mode", "NULL")));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
+	if (opts_out->format == COPY_FORMAT_LIST && opts_out->default_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify %s in LIST mode", "DEFAULT")));
+
+	if (opts_out->delim)
+	{
+		/* Only single-byte delimiter strings are supported. */
+		if (strlen(opts_out->delim) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY delimiter must be a single one-byte character")));
+
+		/* Disallow end-of-line characters */
+		if (strchr(opts_out->delim, '\r') != NULL ||
+			strchr(opts_out->delim, '\n') != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY delimiter cannot be newline or carriage return")));
+	}
 	/* Set defaults for omitted options */
-	if (!opts_out->delim)
-		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
+	else if (opts_out->format == COPY_FORMAT_CSV)
+		opts_out->delim = ",";
+	else if (opts_out->format == COPY_FORMAT_TEXT)
+		opts_out->delim = "\t";
 
-	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
-	opts_out->null_print_len = strlen(opts_out->null_print);
+	if (opts_out->null_print)
+	{
+		if (strchr(opts_out->null_print, '\r') != NULL ||
+			strchr(opts_out->null_print, '\n') != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY null representation cannot use newline or carriage return")));
+
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+		opts_out->null_print = "";
+	else if (opts_out->format == COPY_FORMAT_TEXT)
+		opts_out->null_print = "\\N";
+
+	if (opts_out->null_print)
+		opts_out->null_print_len = strlen(opts_out->null_print);
 
 	if (opts_out->format == COPY_FORMAT_CSV)
 	{
@@ -707,25 +755,6 @@ ProcessCopyOptions(ParseState *pstate,
 			opts_out->escape = opts_out->quote;
 	}
 
-	/* Only single-byte delimiter strings are supported. */
-	if (strlen(opts_out->delim) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY delimiter must be a single one-byte character")));
-
-	/* Disallow end-of-line characters */
-	if (strchr(opts_out->delim, '\r') != NULL ||
-		strchr(opts_out->delim, '\n') != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter cannot be newline or carriage return")));
-
-	if (strchr(opts_out->null_print, '\r') != NULL ||
-		strchr(opts_out->null_print, '\n') != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY null representation cannot use newline or carriage return")));
-
 	if (opts_out->default_print)
 	{
 		opts_out->default_print_len = strlen(opts_out->default_print);
@@ -738,7 +767,7 @@ ProcessCopyOptions(ParseState *pstate,
 	}
 
 	/*
-	 * Disallow unsafe delimiter characters in non-CSV mode.  We can't allow
+	 * Disallow unsafe delimiter characters in text mode.  We can't allow
 	 * backslash because it would be ambiguous.  We can't allow the other
 	 * cases because data characters matching the delimiter must be
 	 * backslashed, and certain backslash combinations are interpreted
@@ -747,7 +776,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (opts_out->format != COPY_FORMAT_CSV &&
+	if (opts_out->format == COPY_FORMAT_TEXT &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -839,7 +868,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Don't allow the delimiter to appear in the null string. */
-	if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+	if (opts_out->delim && opts_out->null_print &&
+		strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: %s is the name of a COPY option, e.g. NULL */
@@ -875,7 +905,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"COPY TO")));
 
 		/* Don't allow the delimiter to appear in the default string. */
-		if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+		if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			/*- translator: %s is the name of a COPY option, e.g. NULL */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f350a4ff976..be0c54a86f7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1438,6 +1438,13 @@ BeginCopyFrom(ParseState *pstate,
 	/* Generate or convert list of attributes to process */
 	cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
 
+	/* Enforce single column requirement for "list" format */
+	if (cstate->opts.format == COPY_FORMAT_LIST &&
+		list_length(cstate->attnumlist) != 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY with format \"list\" must specify exactly one column")));
+
 	num_phys_attrs = tupDesc->natts;
 
 	/* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 51eb14d7432..f82fd4c1ed4 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -7,7 +7,7 @@
  * formats.  The main entry point is NextCopyFrom(), which parses the
  * next input line and returns it as Datums.
  *
- * In text/CSV mode, the parsing happens in multiple stages:
+ * In text/CSV/list mode, the parsing happens in multiple stages:
  *
  * [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
  *                1.          2.            3.           4.
@@ -28,7 +28,10 @@
  * 4. CopyReadAttributesText/CSV() function takes the input line from
  *    'line_buf', and splits it into fields, unescaping the data as required.
  *    The fields are stored in 'attribute_buf', and 'raw_fields' array holds
- *    pointers to each field.
+ *    pointers to each field. (text/csv modes only)
+ *
+ * In list mode, the fourth stage is skipped because the entire line is
+ * treated as a single field, making field splitting unnecessary.
  *
  * If encoding conversion is not required, a shortcut is taken in step 2 to
  * avoid copying the data unnecessarily.  The 'input_buf' pointer is set to
@@ -142,6 +145,7 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate);
 static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLineList(CopyFromState cstate);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -731,7 +735,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
 }
 
 /*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text, csv, or list mode.
  * Return false if no more lines.
  *
  * An internal temporary buffer is returned via 'fields'. It is valid until
@@ -747,7 +751,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	int			fldct;
 	bool		done;
 
-	/* only available for text or csv input */
+	/* only available for text, csv, or list input */
 	Assert(cstate->opts.format != COPY_FORMAT_BINARY);
 
 	/* on input check that the header line is correct if needed */
@@ -767,8 +771,16 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 
 			if (cstate->opts.format == COPY_FORMAT_CSV)
 				fldct = CopyReadAttributesCSV(cstate);
-			else
+			else if (cstate->opts.format == COPY_FORMAT_TEXT)
 				fldct = CopyReadAttributesText(cstate);
+			else
+			{
+				Assert(cstate->opts.format == COPY_FORMAT_LIST);
+				Assert(cstate->max_fields == 1);
+				/* Point raw_fields directly to line_buf data */
+				cstate->raw_fields[0] = cstate->line_buf.data;
+				fldct = 1;
+			}
 
 			if (fldct != list_length(cstate->attnumlist))
 				ereport(ERROR,
@@ -822,8 +834,16 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	/* Parse the line into de-escaped field values */
 	if (cstate->opts.format == COPY_FORMAT_CSV)
 		fldct = CopyReadAttributesCSV(cstate);
-	else
+	else if (cstate->opts.format == COPY_FORMAT_TEXT)
 		fldct = CopyReadAttributesText(cstate);
+	else
+	{
+		Assert(cstate->opts.format == COPY_FORMAT_LIST);
+		Assert(cstate->max_fields == 1);
+		/* Point raw_fields directly to line_buf data */
+		cstate->raw_fields[0] = cstate->line_buf.data;
+		fldct = 1;
+	}
 
 	*fields = cstate->raw_fields;
 	*nfields = fldct;
@@ -1095,7 +1115,10 @@ CopyReadLine(CopyFromState cstate)
 	cstate->line_buf_valid = false;
 
 	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate);
+	if (cstate->opts.format == COPY_FORMAT_LIST)
+		result = CopyReadLineList(cstate);
+	else
+		result = CopyReadLineText(cstate);
 
 	if (result)
 	{
@@ -1461,6 +1484,140 @@ CopyReadLineText(CopyFromState cstate)
 	return result;
 }
 
+/*
+ * CopyReadLineList - inner loop of CopyReadLine for list mode
+ */
+static bool
+CopyReadLineList(CopyFromState cstate)
+{
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		need_data = false;
+	bool		hit_eof = false;
+	bool		result = false;
+
+	/*
+	 * The objective of this loop is to transfer the entire next input line
+	 * into line_buf. We only care for detecting newlines (\r and/or \n). All
+	 * other characters are treated as regular data.
+	 *
+	 * For speed, we try to move data from input_buf to line_buf in chunks
+	 * rather than one character at a time.  input_buf_ptr points to the next
+	 * character to examine; any characters from input_buf_index to
+	 * input_buf_ptr have been determined to be part of the line, but not yet
+	 * transferred to line_buf.
+	 *
+	 * For a little extra speed within the loop, we copy input_buf and
+	 * input_buf_len into local variables.
+	 */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	for (;;)
+	{
+		int			prev_raw_ptr;
+		char		c;
+
+		/*
+		 * Load more data if needed.
+		 */
+		if (input_buf_ptr >= copy_buf_len || need_data)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+			need_data = false;
+		}
+
+		/* OK to fetch a character */
+		prev_raw_ptr = input_buf_ptr;
+		c = copy_input_buf[input_buf_ptr++];
+
+		/* Process \r */
+		if (c == '\r')
+		{
+			/* Check for \r\n on first line, _and_ handle \r\n. */
+			if (cstate->eol_type == EOL_UNKNOWN ||
+				cstate->eol_type == EOL_CRNL)
+			{
+				/*
+				 * If need more data, go back to loop top to load it.
+				 *
+				 * Note that if we are at EOF, c will wind up as '\0' because
+				 * of the guaranteed pad of input_buf.
+				 */
+				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
+
+				/* get next char */
+				c = copy_input_buf[input_buf_ptr];
+
+				if (c == '\n')
+				{
+					input_buf_ptr++;	/* eat newline */
+					cstate->eol_type = EOL_CRNL;	/* in case not set yet */
+				}
+				else
+				{
+					if (cstate->eol_type == EOL_CRNL)
+						ereport(ERROR,
+								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+								 errmsg("literal carriage return found in data")));
+
+					/*
+					 * if we got here, it is the first line and we didn't find
+					 * \n, so don't consume the peeked character
+					 */
+					cstate->eol_type = EOL_CR;
+				}
+			}
+			else if (cstate->eol_type == EOL_NL)
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("literal carriage return found in data")));
+			/* If reach here, we have found the line terminator */
+			break;
+		}
+
+		/* Process \n */
+		if (c == '\n')
+		{
+			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("literal newline found in data")));
+			cstate->eol_type = EOL_NL;	/* in case not set yet */
+			/* If reach here, we have found the line terminator */
+			break;
+		}
+
+		/* All other characters are treated as regular data */
+	}							/* end of outer loop */
+
+	/*
+	 * Transfer any still-uncopied data to line_buf.
+	 */
+	REFILL_LINEBUF;
+
+	return result;
+}
+
+
 /*
  *	Return decimal value for a hexadecimal digit
  */
@@ -1937,7 +2094,6 @@ endfield:
 	return fieldno;
 }
 
-
 /*
  * Read a binary attribute
  */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 03c9d71d34a..b0eed64e840 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -113,6 +113,7 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 								bool use_quote);
+static void CopyAttributeOutList(CopyToState cstate, const char *string, Oid typid);
 
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
@@ -574,6 +575,13 @@ BeginCopyTo(ParseState *pstate,
 	/* Generate or convert list of attributes to process */
 	cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
 
+	/* Enforce single column requirement for "list" format */
+	if (cstate->opts.format == COPY_FORMAT_LIST &&
+		list_length(cstate->attnumlist) != 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY with format \"list\" must specify exactly one column")));
+
 	num_phys_attrs = tupDesc->natts;
 
 	/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -830,17 +838,20 @@ DoCopyTo(CopyToState cstate)
 			{
 				int			attnum = lfirst_int(cur);
 				char	   *colname;
+				Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
 				if (hdr_delim)
 					CopySendChar(cstate, cstate->opts.delim[0]);
 				hdr_delim = true;
 
-				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+				colname = NameStr(attr->attname);
 
 				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, colname, false);
-				else
+				else if (cstate->opts.format == COPY_FORMAT_TEXT)
 					CopyAttributeOutText(cstate, colname);
+				else if (cstate->opts.format == COPY_FORMAT_LIST)
+					CopyAttributeOutList(cstate, colname, attr->atttypid);
 			}
 
 			CopySendEndOfRow(cstate);
@@ -921,7 +932,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (cstate->opts.format != COPY_FORMAT_BINARY)
+	if (cstate->opts.format == COPY_FORMAT_TEXT ||
+		cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		bool		need_delim = false;
 
@@ -949,7 +961,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
-	else
+	else if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		foreach_int(attnum, cstate->attnumlist)
 		{
@@ -969,6 +981,37 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
+	else if (cstate->opts.format == COPY_FORMAT_LIST)
+	{
+		int			attnum;
+		Datum		value;
+		bool		isnull;
+		Oid			typid;
+
+		/* Assert only one column is being copied */
+		Assert(list_length(cstate->attnumlist) == 1);
+
+		attnum = linitial_int(cstate->attnumlist);
+		value = slot->tts_values[attnum - 1];
+		isnull = slot->tts_isnull[attnum - 1];
+		typid = TupleDescAttr(slot->tts_tupleDescriptor, attnum - 1)->atttypid;
+
+		if (!isnull)
+		{
+			char	   *string = OutputFunctionCall(&out_functions[attnum - 1],
+													value);
+
+			CopyAttributeOutList(cstate, string, typid);
+		}
+		/* For "list" format, we don't send anything for NULL values */
+	}
+	else
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("unsupported COPY format")));
+	}
+
 
 	CopySendEndOfRow(cstate);
 
@@ -1223,6 +1266,73 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	}
 }
 
+/*
+ * Send text representation of the attribute for "list" format.
+ * Scan for and error on newlines unless the type is known to be newline-free.
+ */
+static void
+CopyAttributeOutList(CopyToState cstate, const char *string, Oid typid)
+{
+	const char *ptr;
+	const char *start;
+	char		c;
+
+	if (cstate->need_transcoding)
+		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	else
+		ptr = string;
+
+	/* Fast path for some types that cannot contain newlines */
+	switch (typid)
+	{
+			/* Numeric types */
+		case INT2OID:
+		case INT4OID:
+		case INT8OID:
+		case FLOAT4OID:
+		case FLOAT8OID:
+		case NUMERICOID:
+		case OIDOID:
+			/* Date/time types */
+		case DATEOID:
+		case TIMEOID:
+		case TIMESTAMPOID:
+		case TIMESTAMPTZOID:
+		case INTERVALOID:
+			/* Network types */
+		case INETOID:
+		case CIDROID:
+		case MACADDROID:
+		case MACADDR8OID:
+			/* Other types */
+		case BOOLOID:
+		case UUIDOID:
+		case JSONBOID:
+			CopySendString(cstate, ptr);
+			return;
+	}
+
+	/*
+	 * Scan the string for newlines, and error if any are found.
+	 */
+	start = ptr;
+	while ((c = *ptr) != '\0')
+	{
+		if (c == '\n' || c == '\r')
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY with format \"list\" doesn't support newlines in field values")));
+
+		if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
+			ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
+		else
+			ptr++;
+	}
+
+	/* If we got here, there were no newlines, so send the string */
+	CopySendString(cstate, start);
+}
+
 /*
  * copy_dest_startup --- executor startup
  */
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index fad2277991d..75f312a9ac5 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
-		COMPLETE_WITH("binary", "csv", "text");
+		COMPLETE_WITH("binary", "csv", "text", "list");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f0..44e9934d630 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
 	COPY_FORMAT_TEXT = 0,
 	COPY_FORMAT_BINARY,
 	COPY_FORMAT_CSV,
+	COPY_FORMAT_LIST,
 } CopyFormat;
 
 /*
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84c..bff331792bc 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -325,3 +325,40 @@ SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY
 (2 rows)
 
 DROP TABLE parted_si;
+-- Test 'list' format
+\set filename :abs_srcdir '/data/emp.data'
+create temp table single_copytest (col text);
+copy single_copytest from :'filename' (format list);
+select col from single_copytest order by col collate "C";
+                  col                   
+----------------------------------------
+ bill    20      (11,10) 1000    sharon
+ sam     30      (10,5)  2000    bill
+ sharon  25      (15,12) 1000    sam
+(3 rows)
+
+copy single_copytest to stdout (format list);
+sharon	25	(15,12)	1000	sam
+sam	30	(10,5)	2000	bill
+bill	20	(11,10)	1000	sharon
+truncate single_copytest;
+copy single_copytest (col) from stdin (format list, header match);
+select col from single_copytest order by col collate "C";
+  col   
+--------
+ "def",
+ abc\.
+ ghi
+(3 rows)
+
+copy single_copytest (col) to stdout (format list, header);
+col
+abc\.
+"def",
+ghi
+truncate single_copytest;
+alter table single_copytest add column json_line jsonb;
+insert into single_copytest (json_line) values ('{"a": "b"}');
+copy single_copytest (json_line) to stdout (format list);
+{"a": "b"}
+drop table single_copytest;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..bad84d2ed5c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -90,6 +90,20 @@ COPY x from stdin (format BINARY, delimiter ',');
 ERROR:  cannot specify DELIMITER in BINARY mode
 COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
+COPY x (c) from stdin (format LIST, null 'x');
+ERROR:  cannot specify NULL in LIST mode
+COPY x from stdin (format TEXT, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x from stdin (format BINARY, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x (c) from stdin (format LIST, escape 'x');
+ERROR:  COPY ESCAPE requires CSV mode
+COPY x from stdin (format TEXT, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
+COPY x from stdin (format BINARY, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
+COPY x (c) from stdin (format LIST, quote 'x');
+ERROR:  COPY QUOTE requires CSV mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
 COPY x from stdin (on_error unsupported);
@@ -100,6 +114,10 @@ COPY x from stdin (format TEXT, force_quote(a));
 ERROR:  COPY FORCE_QUOTE requires CSV mode
 COPY x from stdin (format TEXT, force_quote *);
 ERROR:  COPY FORCE_QUOTE requires CSV mode
+COPY x (c) from stdin (format LIST, force_quote(a));
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+COPY x (c) from stdin (format LIST, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
 COPY x from stdin (format CSV, force_quote(a));
 ERROR:  COPY FORCE_QUOTE cannot be used with COPY FROM
 COPY x from stdin (format CSV, force_quote *);
@@ -108,6 +126,10 @@ COPY x from stdin (format TEXT, force_not_null(a));
 ERROR:  COPY FORCE_NOT_NULL requires CSV mode
 COPY x from stdin (format TEXT, force_not_null *);
 ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_not_null(a));
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
 COPY x to stdout (format CSV, force_not_null(a));
 ERROR:  COPY FORCE_NOT_NULL cannot be used with COPY TO
 COPY x to stdout (format CSV, force_not_null *);
@@ -116,6 +138,10 @@ COPY x from stdin (format TEXT, force_null(a));
 ERROR:  COPY FORCE_NULL requires CSV mode
 COPY x from stdin (format TEXT, force_null *);
 ERROR:  COPY FORCE_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_null(a));
+ERROR:  COPY FORCE_NULL requires CSV mode
+COPY x (c) from stdin (format LIST, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
 COPY x to stdout (format CSV, force_null(a));
 ERROR:  COPY FORCE_NULL cannot be used with COPY TO
 COPY x to stdout (format CSV, force_null *);
@@ -858,9 +884,11 @@ select id, text_value, ts_value from copy_default;
 (2 rows)
 
 truncate copy_default;
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or list mode
 copy copy_default from stdin with (format binary, default '\D');
 ERROR:  cannot specify DEFAULT in BINARY mode
+copy copy_default (text_value) from stdin with (format list, default '\D');
+ERROR:  cannot specify DEFAULT in LIST mode
 -- DEFAULT cannot be new line nor carriage return
 copy copy_default from stdin with (default E'\n');
 ERROR:  COPY default representation cannot use newline or carriage return
@@ -929,3 +957,11 @@ truncate copy_default;
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
 ERROR:  COPY DEFAULT cannot be used with COPY TO
+-- Test list column requirement
+copy copy_default from stdin with (format list);
+ERROR:  COPY with format "list" must specify exactly one column
+-- Test error on newlines in list format
+create table copy_list_test (line text);
+insert into copy_list_test values (E'a\nb');
+copy copy_list_test to stdout with (format list);
+ERROR:  COPY with format "list" doesn't support newlines in field values
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b04..40394bca680 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -348,3 +348,24 @@ COPY parted_si(id, data) FROM :'filename';
 SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
 
 DROP TABLE parted_si;
+
+-- Test 'list' format
+\set filename :abs_srcdir '/data/emp.data'
+create temp table single_copytest (col text);
+copy single_copytest from :'filename' (format list);
+select col from single_copytest order by col collate "C";
+copy single_copytest to stdout (format list);
+truncate single_copytest;
+copy single_copytest (col) from stdin (format list, header match);
+col
+abc\.
+"def",
+ghi
+\.
+select col from single_copytest order by col collate "C";
+copy single_copytest (col) to stdout (format list, header);
+truncate single_copytest;
+alter table single_copytest add column json_line jsonb;
+insert into single_copytest (json_line) values ('{"a": "b"}');
+copy single_copytest (json_line) to stdout (format list);
+drop table single_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..b0aeb370163 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -72,18 +72,31 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
+COPY x (c) from stdin (format LIST, null 'x');
+COPY x from stdin (format TEXT, escape 'x');
+COPY x from stdin (format BINARY, escape 'x');
+COPY x (c) from stdin (format LIST, escape 'x');
+COPY x from stdin (format TEXT, quote 'x');
+COPY x from stdin (format BINARY, quote 'x');
+COPY x (c) from stdin (format LIST, quote 'x');
 COPY x from stdin (format BINARY, on_error ignore);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
+COPY x (c) from stdin (format LIST, force_quote(a));
+COPY x (c) from stdin (format LIST, force_quote *);
 COPY x from stdin (format CSV, force_quote(a));
 COPY x from stdin (format CSV, force_quote *);
 COPY x from stdin (format TEXT, force_not_null(a));
 COPY x from stdin (format TEXT, force_not_null *);
+COPY x (c) from stdin (format LIST, force_not_null(a));
+COPY x (c) from stdin (format LIST, force_not_null *);
 COPY x to stdout (format CSV, force_not_null(a));
 COPY x to stdout (format CSV, force_not_null *);
 COPY x from stdin (format TEXT, force_null(a));
 COPY x from stdin (format TEXT, force_null *);
+COPY x (c) from stdin (format LIST, force_null(a));
+COPY x (c) from stdin (format LIST, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
@@ -636,8 +649,9 @@ select id, text_value, ts_value from copy_default;
 
 truncate copy_default;
 
--- DEFAULT cannot be used in binary mode
+-- DEFAULT cannot be used in binary or list mode
 copy copy_default from stdin with (format binary, default '\D');
+copy copy_default (text_value) from stdin with (format list, default '\D');
 
 -- DEFAULT cannot be new line nor carriage return
 copy copy_default from stdin with (default E'\n');
@@ -707,3 +721,11 @@ truncate copy_default;
 
 -- DEFAULT cannot be used in COPY TO
 copy (select 1 as test) TO stdout with (default '\D');
+
+-- Test list column requirement
+copy copy_default from stdin with (format list);
+
+-- Test error on newlines in list format
+create table copy_list_test (line text);
+insert into copy_list_test values (E'a\nb');
+copy copy_list_test to stdout with (format list);
-- 
2.45.1
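
For readers following the EOL handling in CopyReadLineList(), here is a minimal standalone sketch of the same auto-detection rule: the first terminator seen fixes the newline style, and a later terminator of a different style is an error. It is not the patch's code. It works on an in-memory buffer instead of input_buf and has no refill logic. The names next_line, LINE_EOF and LINE_MIXED_EOL are hypothetical. A final line without a terminator is simply returned as a line, which simplifies how the real loop reports EOF.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the EolType values the patch tests against. */
typedef enum { EOL_UNKNOWN, EOL_NL, EOL_CR, EOL_CRNL } EolType;

/* Hypothetical return codes, used only by this sketch. */
#define LINE_EOF        (-1L)	/* no data left */
#define LINE_MIXED_EOL  (-2L)	/* terminator conflicts with detected style */

/*
 * Scan one line starting at buf[*pos].  On success, return the line length
 * (terminator excluded) and advance *pos past the terminator.  *eol carries
 * the detected newline style from one call to the next.
 */
static long
next_line(const char *buf, size_t len, size_t *pos, EolType *eol)
{
	size_t		start;
	size_t		i;

	assert(buf != NULL && pos != NULL && eol != NULL);

	start = *pos;
	if (start >= len)
		return LINE_EOF;

	for (i = start; i < len; i++)
	{
		char		c = buf[i];

		if (c == '\r')
		{
			if (*eol == EOL_UNKNOWN || *eol == EOL_CRNL)
			{
				/* Peek at the next byte to tell \r\n from a bare \r. */
				if (i + 1 < len && buf[i + 1] == '\n')
				{
					*eol = EOL_CRNL;
					*pos = i + 2;	/* eat both bytes */
					return (long) (i - start);
				}
				if (*eol == EOL_CRNL)
					return LINE_MIXED_EOL;
				*eol = EOL_CR;	/* first line ended in a bare \r */
			}
			else if (*eol == EOL_NL)
				return LINE_MIXED_EOL;

			*pos = i + 1;
			return (long) (i - start);
		}

		if (c == '\n')
		{
			if (*eol == EOL_CR || *eol == EOL_CRNL)
				return LINE_MIXED_EOL;
			*eol = EOL_NL;
			*pos = i + 1;
			return (long) (i - start);
		}

		/* All other bytes, including backslashes, are plain data. */
	}

	/* Last line had no terminator; hand back what is left. */
	*pos = len;
	return (long) (len - start);
}
```

A caller would start with *pos = 0 and *eol = EOL_UNKNOWN, then call next_line() repeatedly until it returns LINE_EOF or LINE_MIXED_EOL.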

v21-0003-Reorganize-option-validations.patch

From 7986bc8b65f743dd172e81b3f6216414079d8350 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 7 Nov 2024 15:53:24 +0100
Subject: [PATCH 3/3] Reorganize option validations

---
 src/backend/commands/copy.c | 460 ++++++++++++++++++++----------------
 1 file changed, 259 insertions(+), 201 deletions(-)
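
To make the per-option grouping in this patch easier to follow, here is a rough standalone model of the DELIMITER block as it reads after the reorganization: format restrictions first, then value checks, then the default when the option is omitted. It is only a sketch. The names Format, DelimCheck and check_delimiter are hypothetical, and it returns a code where ProcessCopyOptions() calls ereport().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the CopyFormat values. */
typedef enum { FMT_TEXT, FMT_BINARY, FMT_CSV, FMT_LIST } Format;

/* Hypothetical result codes; the real code raises an error instead. */
typedef enum
{
	DELIM_OK,
	DELIM_NOT_ALLOWED_FOR_FORMAT,	/* BINARY or LIST with DELIMITER */
	DELIM_NOT_SINGLE_BYTE,
	DELIM_IS_EOL,
	DELIM_UNSAFE_IN_TEXT
} DelimCheck;

/*
 * Validate *delim for the given format.  When *delim is NULL (option
 * omitted), fill in the format's default, or leave it NULL for formats
 * that have no delimiter.
 */
static DelimCheck
check_delimiter(Format fmt, const char **delim)
{
	assert(delim != NULL);

	if (*delim != NULL)
	{
		/* Formats that do not split fields reject the option outright. */
		if (fmt == FMT_BINARY || fmt == FMT_LIST)
			return DELIM_NOT_ALLOWED_FOR_FORMAT;

		/* Only single-byte delimiter strings are supported. */
		if (strlen(*delim) != 1)
			return DELIM_NOT_SINGLE_BYTE;

		/* Disallow end-of-line characters. */
		if (strchr(*delim, '\r') != NULL || strchr(*delim, '\n') != NULL)
			return DELIM_IS_EOL;

		/* Text mode also rejects backslash, dot, lower-case letters, digits. */
		if (fmt == FMT_TEXT &&
			strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
				   (*delim)[0]) != NULL)
			return DELIM_UNSAFE_IN_TEXT;
	}
	else if (fmt == FMT_CSV)
		*delim = ",";
	else if (fmt == FMT_TEXT)
		*delim = "\t";

	return DELIM_OK;
}
```

Since binary and list never get a default, the delimiter stays NULL for them. That is why the later checks against null_print and default_print test the delimiter pointer before dereferencing it.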

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3b98a8e7db1..2de9bc0be8e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -673,44 +673,33 @@ ProcessCopyOptions(ParseState *pstate,
 					 parser_errposition(pstate, defel->location)));
 	}
 
-	/*
-	 * Check for incompatible options (must do these three before inserting
-	 * defaults)
-	 */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
-
-	if (opts_out->format == COPY_FORMAT_LIST && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in LIST mode", "DELIMITER")));
-
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "NULL")));
-
-	if (opts_out->format == COPY_FORMAT_LIST && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in LIST mode", "NULL")));
-
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
-
-	if (opts_out->format == COPY_FORMAT_LIST && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in LIST mode", "DEFAULT")));
-
+	/* --- FREEZE option --- */
+	if (opts_out->freeze)
+	{
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FREEZE",
+							"COPY TO")));
+	}
+
+	/* --- DELIMITER option --- */
 	if (opts_out->delim)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+
+		if (opts_out->format == COPY_FORMAT_LIST)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in LIST mode", "DELIMITER")));
+
 		/* Only single-byte delimiter strings are supported. */
 		if (strlen(opts_out->delim) != 1)
 			ereport(ERROR,
@@ -723,22 +712,53 @@ ProcessCopyOptions(ParseState *pstate,
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY delimiter cannot be newline or carriage return")));
+
+		if (opts_out->format == COPY_FORMAT_TEXT)
+		{
+			/*
+			 * Disallow unsafe delimiter characters in text mode.  We can't
+			 * allow backslash because it would be ambiguous.  We can't allow
+			 * the other cases because data characters matching the delimiter
+			 * must be backslashed, and certain backslash combinations are
+			 * interpreted non-literally by COPY IN.  Disallowing all lower
+			 * case ASCII letters is more than strictly necessary, but seems
+			 * best for consistency and future-proofing.  Likewise we disallow
+			 * all digits though only octal digits are actually dangerous.
+			 */
+			if (strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
+					   opts_out->delim[0]) != NULL)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
+		}
 	}
-	/* Set defaults for omitted options */
+	/* Set default delimiter */
 	else if (opts_out->format == COPY_FORMAT_CSV)
 		opts_out->delim = ",";
 	else if (opts_out->format == COPY_FORMAT_TEXT)
 		opts_out->delim = "\t";
 
+	/* --- NULL option --- */
 	if (opts_out->null_print)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in BINARY mode", "NULL")));
+
+		if (opts_out->format == COPY_FORMAT_LIST)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in LIST mode", "NULL")));
+
+		/* Disallow end-of-line characters */
 		if (strchr(opts_out->null_print, '\r') != NULL ||
 			strchr(opts_out->null_print, '\n') != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY null representation cannot use newline or carriage return")));
-
 	}
+	/* Set default null_print */
 	else if (opts_out->format == COPY_FORMAT_CSV)
 		opts_out->null_print = "";
 	else if (opts_out->format == COPY_FORMAT_TEXT)
@@ -747,16 +767,23 @@ ProcessCopyOptions(ParseState *pstate,
 	if (opts_out->null_print)
 		opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->format == COPY_FORMAT_CSV)
-	{
-		if (!opts_out->quote)
-			opts_out->quote = "\"";
-		if (!opts_out->escape)
-			opts_out->escape = opts_out->quote;
-	}
-
+	/* --- DEFAULT option --- */
 	if (opts_out->default_print)
 	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+
+		if (opts_out->format == COPY_FORMAT_LIST)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify %s in LIST mode", "DEFAULT")));
+
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->null_print);
+
 		opts_out->default_print_len = strlen(opts_out->default_print);
 
 		if (strchr(opts_out->default_print, '\r') != NULL ||
@@ -764,138 +791,7 @@ ProcessCopyOptions(ParseState *pstate,
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("COPY default representation cannot use newline or carriage return")));
-	}
 
-	/*
-	 * Disallow unsafe delimiter characters in text mode.  We can't allow
-	 * backslash because it would be ambiguous.  We can't allow the other
-	 * cases because data characters matching the delimiter must be
-	 * backslashed, and certain backslash combinations are interpreted
-	 * non-literally by COPY IN.  Disallowing all lower case ASCII letters is
-	 * more than strictly necessary, but seems best for consistency and
-	 * future-proofing.  Likewise we disallow all digits though only octal
-	 * digits are actually dangerous.
-	 */
-	if (opts_out->format == COPY_FORMAT_TEXT &&
-		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
-			   opts_out->delim[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
-
-	/* Check header */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
-
-	/* Check quote */
-	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "QUOTE")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY quote must be a single one-byte character")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY delimiter and quote must be different")));
-
-	/* Check escape */
-	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
-
-	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("COPY escape must be a single one-byte character")));
-
-	/* Check force_quote */
-	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
-												opts_out->force_quote_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
-	if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
-						"COPY FROM")));
-
-	/* Check force_notnull */
-	if (opts_out->format != COPY_FORMAT_CSV &&
-		(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
-	if ((opts_out->force_notnull != NIL || opts_out->force_notnull_all) &&
-		!is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
-						"COPY TO")));
-
-	/* Check force_null */
-	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
-												opts_out->force_null_all))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
-
-	if ((opts_out->force_null != NIL || opts_out->force_null_all) &&
-		!is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
-						"COPY TO")));
-
-	/* Don't allow the delimiter to appear in the null string. */
-	if (opts_out->delim && opts_out->null_print &&
-		strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: %s is the name of a COPY option, e.g. NULL */
-				 errmsg("COPY delimiter character must not appear in the %s specification",
-						"NULL")));
-
-	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->format == COPY_FORMAT_CSV &&
-		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: %s is the name of a COPY option, e.g. NULL */
-				 errmsg("CSV quote character must not appear in the %s specification",
-						"NULL")));
-
-	/* Check freeze */
-	if (opts_out->freeze && !is_from)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
-		 second %s is a COPY with direction, e.g. COPY TO */
-				 errmsg("COPY %s cannot be used with %s", "FREEZE",
-						"COPY TO")));
-
-	if (opts_out->default_print)
-	{
 		if (!is_from)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -905,22 +801,13 @@ ProcessCopyOptions(ParseState *pstate,
 							"COPY TO")));
 
 		/* Don't allow the delimiter to appear in the default string. */
-		if (opts_out->delim && strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
+		if (strchr(opts_out->default_print, opts_out->delim[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			/*- translator: %s is the name of a COPY option, e.g. NULL */
 					 errmsg("COPY delimiter character must not appear in the %s specification",
 							"DEFAULT")));
 
-		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->format == COPY_FORMAT_CSV &&
-			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			/*- translator: %s is the name of a COPY option, e.g. NULL */
-					 errmsg("CSV quote character must not appear in the %s specification",
-							"DEFAULT")));
-
 		/* Don't allow the NULL and DEFAULT string to be the same */
 		if (opts_out->null_print_len == opts_out->default_print_len &&
 			strncmp(opts_out->null_print, opts_out->default_print,
@@ -929,20 +816,191 @@ ProcessCopyOptions(ParseState *pstate,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
-	/* Check on_error */
-	if (opts_out->format == COPY_FORMAT_BINARY &&
-		opts_out->on_error != COPY_ON_ERROR_STOP)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
-	if (opts_out->reject_limit && !opts_out->on_error)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-		/*- translator: first and second %s are the names of COPY option, e.g.
-		 * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
-				 errmsg("COPY %s requires %s to be set to %s",
-						"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+	else
+	{
+		/* No default for default_print; remains NULL */
+	}
+
+	/* --- HEADER option --- */
+	if (opts_out->header_line != COPY_HEADER_FALSE)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("cannot specify %s in BINARY mode", "HEADER")));
+	}
+	else
+	{
+		/* Default is no header; no action needed */
+	}
+
+	/* --- QUOTE option --- */
+	if (opts_out->quote)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "QUOTE")));
+
+		if (strlen(opts_out->quote) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY quote must be a single one-byte character")));
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Set default quote */
+		opts_out->quote = "\"";
+	}
+
+	/* --- ESCAPE option --- */
+	if (opts_out->escape)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "ESCAPE")));
+
+		if (strlen(opts_out->escape) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY escape must be a single one-byte character")));
+	}
+	else if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Set default escape to quote character */
+		opts_out->escape = opts_out->quote;
+	}
+
+	/* --- FORCE_QUOTE option --- */
+	if (opts_out->force_quote != NIL || opts_out->force_quote_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_QUOTE")));
+
+		if (is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_QUOTE",
+							"COPY FROM")));
+	}
+
+	/* --- FORCE_NOT_NULL option --- */
+	if (opts_out->force_notnull != NIL || opts_out->force_notnull_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_NOT_NULL")));
+
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_NOT_NULL",
+							"COPY TO")));
+	}
+
+	/* --- FORCE_NULL option --- */
+	if (opts_out->force_null != NIL || opts_out->force_null_all)
+	{
+		if (opts_out->format != COPY_FORMAT_CSV)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					 errmsg("COPY %s requires CSV mode", "FORCE_NULL")));
+
+		if (!is_from)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR,
+			second %s is a COPY with direction, e.g. COPY TO */
+					 errmsg("COPY %s cannot be used with %s", "FORCE_NULL",
+							"COPY TO")));
+	}
+
+	/* --- ON_ERROR option --- */
+	if (opts_out->on_error != COPY_ON_ERROR_STOP)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+	}
+
+	/* --- REJECT_LIMIT option --- */
+	if (opts_out->reject_limit)
+	{
+		if (opts_out->on_error != COPY_ON_ERROR_IGNORE)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: first and second %s are the names of COPY option, e.g.
+				* ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+					 errmsg("COPY %s requires %s to be set to %s",
+							"REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+	}
+
+	/*
+	 * Additional checks for interdependent options
+	 */
+
+	/* Checks specific to the CSV and TEXT formats */
+	if (opts_out->format == COPY_FORMAT_TEXT ||
+		opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->null_print);
+
+		/* Don't allow the delimiter to appear in the null string. */
+		if (strchr(opts_out->null_print, opts_out->delim[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("COPY delimiter character must not appear in the %s specification",
+							"NULL")));
+	}
+
+	/* Checks specific to the CSV format */
+	if (opts_out->format == COPY_FORMAT_CSV)
+	{
+		/* Assert options have been set (defaults applied if not specified) */
+		Assert(opts_out->delim);
+		Assert(opts_out->quote);
+		Assert(opts_out->null_print);
+
+		/* Don't allow the CSV quote char to appear in the default string. */
+		if (opts_out->default_print_len > 0 &&
+			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("CSV quote character must not appear in the %s specification",
+							"DEFAULT")));
+
+		if (opts_out->delim[0] == opts_out->quote[0])
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY delimiter and quote must be different")));
+
+		/* Don't allow the CSV quote char to appear in the null string. */
+		if (strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			/*- translator: %s is the name of a COPY option, e.g. NULL */
+					 errmsg("CSV quote character must not appear in the %s specification",
+							"NULL")));
+	}
 }
 
 /*
-- 
2.45.1

#24

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: David G. Johnston (#22)

Re: New "single" COPY format

On Sun, Nov 10, 2024, at 05:55, David G. Johnston wrote:

On Saturday, November 9, 2024, jian he <jian.universality@gmail.com> wrote:

<para>
The <literal>list</literal> format does not distinguish a
<literal>NULL</literal>
value from an empty string. Empty lines are imported as empty strings, not
as <literal>NULL</literal> values.
</para>
This only mentions import; it does not say how export (COPY TO) deals with
NULL values.

Yeah, while not being able to distinguish between the two is consistent
with the list format’s premise/design the choice would need to resolve
to the null value in order to continue to be data-type agnostic. We’d
simply have to note for the text types that empty strings in lists are
not supported, and if encountered will be resolved to a null value.

Seems like we have two options to decide between, both with pros and cons.

For full reversibility, we can't support both NULL values and the empty string.

To make a sound design decision here, I think we should test both options
against all real-world use-cases we can come up with.

The use-cases I can think of are:

1) Arbitrary unstructured text lists, where each line could be any text string
2) JSONL, where each line is a valid JSON value, and cannot be an empty string

Option A:
COPY TO: Empty string field gets exported as an empty line. NULL field is an error.
COPY FROM: Empty line is imported as an empty string.

Option B:
COPY TO: NULL field gets exported as an empty line. Empty string field is an error.
COPY FROM: Empty line is imported as a NULL value.
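
A minimal sketch of Options A and B in C, to make the trade-off concrete. The Field struct, the NullPolicy enum and both function names are hypothetical, invented for this illustration; nothing below comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a single text field; not a PostgreSQL type. */
typedef struct Field
{
	bool		is_null;
	const char *value;			/* ignored when is_null is true */
} Field;

typedef enum
{
	OPTION_A,					/* empty line <-> empty string */
	OPTION_B					/* empty line <-> NULL */
} NullPolicy;

/*
 * COPY TO: return the line body to emit for a field, or NULL when the
 * chosen policy treats that field as an error.
 */
static const char *
field_to_line(NullPolicy policy, Field f)
{
	if (f.is_null)
		return (policy == OPTION_B) ? "" : NULL;	/* A: NULL is an error */

	assert(f.value != NULL);
	if (f.value[0] == '\0')
		return (policy == OPTION_A) ? "" : NULL;	/* B: '' is an error */

	return f.value;
}

/* COPY FROM: an empty line becomes '' under A and NULL under B. */
static Field
line_to_field(NullPolicy policy, const char *line)
{
	Field		f;

	assert(line != NULL);
	f.is_null = (policy == OPTION_B && line[0] == '\0');
	f.value = f.is_null ? NULL : line;
	return f;
}
```

Under either policy, exactly one of the two "empty" inputs has no representation on export, which is why full reversibility cannot cover both NULL and the empty string.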

I think Option A seems more useful, because:

1) Arbitrary text files very often contain empty lines to separate sections from each other.
2) JSONL cannot contain empty lines; they are an error: https://jsonlines.org/validator/

Nothing implemented yet, awaiting opinions.

/Joel

[1]: https://jsonlines.org/validator/

#25

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Joel Jacobson (#24)

Re: New "single" COPY format

On Sun, Nov 10, 2024, at 08:32, Joel Jacobson wrote:

Option A:
COPY TO: Empty string field gets exported as an empty line. NULL field
is an error.
COPY FROM: Empty line is imported as an empty string.

Option B:
COPY TO: NULL field gets exported as an empty line. Empty string field
is an error.
COPY FROM: Empty line is imported as a NULL value.

I think Option A seems more useful, because:

1) Arbitrary text files very often contain empty lines to separate
sections from each other.
2) JSONL cannot contain empty lines; they are an error:
https://jsonlines.org/validator/

David, I forgot about your NUL idea, so there is also a third option.

Option C:
COPY TO: NULL field gets exported as a NUL byte. Empty string field is exported as an empty line.
COPY FROM: Empty line is imported as an empty string. NUL byte is imported as a NULL value.

For arbitrary text files, Option C would work fine, since they usually don't contain NUL bytes. If they do, it still seems useful to be able to deal with such files in some way: even if we can't know that NUL always means NULL, we could at least import such files and then post-process the imported data to get the desired result.

For JSONL, Option C would also work fine, since they can't contain NUL bytes.
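
A minimal sketch of Option C in C. Lines are passed as (data, len) pairs because a C string cannot carry an embedded NUL byte. The function names are hypothetical, and the rule that only a line consisting of exactly one NUL byte means NULL is an assumption of this sketch; the option as described does not say where in a line the NUL byte may occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * COPY FROM under Option C.  Assumption: only a line that is exactly one
 * NUL byte long is read as NULL; a zero-length line is the empty string.
 */
static bool
line_is_null(const char *data, size_t len)
{
	assert(data != NULL || len == 0);
	return len == 1 && data[0] == '\0';
}

/*
 * COPY TO under Option C: write the line body (without EOL) into buf and
 * return its length.  A NULL field becomes a single NUL byte, an empty
 * string becomes a zero-length line.  Returns (size_t) -1 if buf is too
 * small.
 */
static size_t
field_to_line_c(bool is_null, const char *value, char *buf, size_t bufsize)
{
	size_t		len;

	assert(buf != NULL);
	if (is_null)
	{
		if (bufsize < 1)
			return (size_t) -1;
		buf[0] = '\0';
		return 1;
	}

	assert(value != NULL);
	len = strlen(value);
	if (len > bufsize)
		return (size_t) -1;
	memcpy(buf, value, len);
	return len;
}
```

Since PostgreSQL text values cannot contain NUL bytes, the one-byte NUL line cannot collide with any exported non-NULL value, which is what removes the error case on export.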

It's a bit of a hack, but I kinda like it, since it seems like the only option without an error situation.
Maybe OK if we add a cautionary <note> to the docs?

/Joel

#26

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Joel Jacobson (#25)

Re: New "single" COPY format

On Sun, Nov 10, 2024, at 08:48, Joel Jacobson wrote:

On Sun, Nov 10, 2024, at 08:32, Joel Jacobson wrote:

Option A:
COPY TO: Empty string field gets exported as an empty line. NULL field
is an error.
COPY FROM: Empty line is imported as an empty string.

Option B:
COPY TO: NULL field gets exported as an empty line. Empty string field
is an error.
COPY FROM: Empty line is imported as a NULL value.

I think Option A seems more useful, because:

1) Arbitrary text files very often contain empty lines to separate
sections from each other.
2) JSONL cannot contain empty lines; they are an error:
https://jsonlines.org/validator/

David, I forgot about your NUL idea, so there is also a third option.

To avoid confusion, I should have been clear that the below idea
is just based on your NUL idea, it's not the same idea per se,
since yours was about newline handling in textual types.

Option C:
COPY TO: NULL field gets exported as a NUL byte. Empty string field
is exported as an empty line.
COPY FROM: Empty line is imported as an empty string. NUL byte is
imported as a NULL value.

For arbitrary text files, Option C would work fine, since they usually
don't contain NUL bytes. If they do, it still seems useful to be
able to deal with such files in some way: even if we can't know that
NUL always means NULL, we could at least import such files and
then post-process the imported data to get the desired
result.

For JSONL, Option C would also work fine, since they can't contain NUL bytes.

It's a bit of a hack, but I kinda like it, since it seems like the
only option without an error situation.
Maybe OK if we add a cautionary <note> to the docs?

Also, we could emit NOTICE messages,
upon both COPY TO and COPY FROM,
to increase the chances of users understanding the semantics:

COPY TO:
NOTICE: NULL values encountered in data, represented as NUL bytes in output

COPY FROM:
NOTICE: NUL bytes encountered in data, stored as NULL values

/Joel

#27

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Joel Jacobson (#26)

Re: New "single" COPY format

On Sun, Nov 10, 2024, at 09:00, Joel Jacobson wrote:

It's a bit of a hack, but I kinda like it, since it seems like the
only option without an error situation.

I forgot about the error situation when a textual value contains
newline characters; that remains the same for options A, B and C.

/Joel

#28

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Joel Jacobson (#27)

Re: New "single" COPY format

Hi hackers,

After further consideration, I'm withdrawing the patch.
Some fundamental questions remain unresolved:

- Should round-trip fidelity be a strict goal? By "round-trip fidelity",
I mean that data exported and then re-imported should yield exactly
the original values, including the distinction between NULL and empty strings.
- If round-trip fidelity is a requirement, how do we distinguish NULL from empty
strings without delimiters or escapes?
- Is automatic newline detection (as in "csv" and "text") more valuable than
the ability to embed \r (CR) characters?
- Would it be better to extend the existing COPY options rather than introducing
a new format?
- Or should we consider a JSONL format instead, one that avoids the NULL/empty
string problem entirely?

No clear solution or consensus has emerged. For now, I'll step back from the
proposal. If someone wants to revisit this later, I'd be happy to contribute.

Thanks again for all the feedback and consideration.

/Joel

#29

jian he

jian.universality@gmail.com

about 1 year ago

In reply to: Joel Jacobson (#28)

Re: New "single" COPY format

I have reviewed v21-0001 again.

v21-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
is a good refactor.

overall looks good to me.

#30

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: jian he (#29)

Re: New "single" COPY format

On Thu, Dec 19, 2024, at 07:48, jian he wrote:

I have reviewed v21-0001 again.

v21-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
is a good refactor.

overall looks good to me.

OK, I could submit it as a separate patch.

Would we also want the reorganization of existing copy option validations?
That is, v21-0003-Reorganize-option-validations.patch minus v21-0002-Add-COPY-format-list.patch.
Nice that you managed to arrange them in the same order as in the documentation. Made the code easier to follow.

/Joel

#31

Andrew Dunstan

andrew@dunslane.net

about 1 year ago

In reply to: Joel Jacobson (#28)

Re: New "single" COPY format

On 2024-12-16 Mo 10:09 AM, Joel Jacobson wrote:

Hi hackers,

After further consideration, I'm withdrawing the patch.
Some fundamental questions remain unresolved:

- Should round-trip fidelity be a strict goal? By "round-trip fidelity",
I mean that data exported and then re-imported should yield exactly
the original values, including the distinction between NULL and empty strings.
- If round-trip fidelity is a requirement, how do we distinguish NULL from empty
strings without delimiters or escapes?
- Is automatic newline detection (as in "csv" and "text") more valuable than
the ability to embed \r (CR) characters?
- Would it be better to extend the existing COPY options rather than introducing
a new format?
- Or should we consider a JSONL format instead, one that avoids the NULL/empty
string problem entirely?

No clear solution or consensus has emerged. For now, I'll step back from the
proposal. If someone wants to revisit this later, I'd be happy to contribute.

Thanks again for all the feedback and consideration.

We seem to have got seriously into the weeds, here. I'd be sorry to see
this dropped. After all, it's not something new, and while we have a
sort of workaround for "one json doc per line" it's far from obvious,
and except in a few blog posts undocumented.

I think we're trying to be far too general here but in the absence of
more general use cases. The ones I recall having encountered in the wild
are:

. one json datum per line

. one json document per file

. a sequence of json documents per file

The last one is hard to deal with, and I think I've only seen it once or
twice, so I suggest leaving it aside for now.

Notice these are all JSON. I could imagine XML might have similar
requirements, but I encounter it extremely rarely.

Regarding NULL, an empty string is not a valid JSON literal, so there
should be no confusion there. It is valid for XML, though.

Given all that I think restricting ourselves to just the JSON cases, and
possibly just to JSONL, would be perfectly reasonable.

Regarding CR, it's not a valid character in a JSON string item, although
it is valid in JSON whitespace. I would not treat it as magical unless
it immediately precedes an NL. That gives rise to a very slight
ambiguity, but I think it's one we could live with.
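
The CR rule above can be sketched in C as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the raw line is assumed to arrive with its terminator still attached.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length of the line content once the end-of-line marker is
 * removed.  A CR is dropped only when it immediately precedes the LF;
 * any other CR (mid-line, or trailing with no LF after it) is kept as
 * ordinary data.
 */
static size_t
line_content_len(const char *raw, size_t len)
{
	assert(raw != NULL || len == 0);

	if (len > 0 && raw[len - 1] == '\n')
	{
		len--;					/* drop the LF */
		if (len > 0 && raw[len - 1] == '\r')
			len--;				/* drop a CR that preceded the LF */
	}
	return len;
}
```

The slight ambiguity shows up when a value legitimately ends in a CR as JSON whitespace and is followed by LF: that CR is taken as part of the line ending rather than as part of the value.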

As for what the format is called, I don't like the "LIST" proposal much,
even for the general case. Seems too close to an array.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#32

Joel Jacobson

joel@compiler.org

about 1 year ago

In reply to: Andrew Dunstan (#31)

Re: New "single" COPY format

On Thu, Dec 19, 2024, at 14:40, Andrew Dunstan wrote:

We seem to have got seriously into the weeds, here. I'd be sorry to see
this dropped. After all, it's not something new, and while we have a
sort of workaround for "one json doc per line" it's far from obvious,
and except in a few blog posts undocumented.

I think we're trying to be far too general here but in the absence of
more general use cases. The ones I recall having encountered in the wild
are:

. one json datum per line

. one json document per file

. a sequence of json documents per file

The last one is hard to deal with, and I think I've only seen it once or
twice, so I suggest leaving it aside for now.

Notice these are all JSON. I could imagine XML might have similar
requirements, but I encounter it extremely rarely.

Regarding NULL, an empty string is not a valid JSON literal, so there
should be no confusion there. It is valid for XML, though.

Given all that I think restricting ourselves to just the JSON cases, and
possibly just to JSONL, would be perfectly reasonable.

Regarding CR, it's not a valid character in a JSON string item, although
it is valid in JSON whitespace. I would not treat it as magical unless
it immediately precedes an NL. That gives rise to a very slight
ambiguity, but I think it's one we could live with.

As for what the format is called, I don't like the "LIST" proposal much,
even for the general case. Seems too close to an array.

Thanks for weighing in.

OK, let's try to restrict ourselves to dealing with json lines and see
if we can work out the precise semantics.

The JSONL spec, or at least its validator [1], forbids empty lines.

So we would need to extend the JSONL spec to:
- Export a NULL::jsonb column value as an empty line.
- Import an empty line as a NULL::jsonb column value.

Could we also restrict ourselves to PostgreSQL's jsonb type?
I fear trying to also support PostgreSQL's json type will be another
rabbit hole, since json values can contain LF characters in whitespace.

Due to the necessary "empty line = NULL" extension,
I think we should simply call the format 'jsonb', and not 'jsonl'.

How about:

COPY table_name [ ( column_name ) ] { FROM | TO } 'filename' (FORMAT jsonb);

- If column list is omitted, table_name must have exactly one column.
- If column list is specified, it must be of length one.
- The column type must be jsonb.
- Each line is a single jsonb value; no multi-line json permitted.
- Non-LF whitespace, i.e. [ \r\t], is allowed anywhere
whitespace can exist according to the json spec, but is discarded
upon import, since jsonb doesn't store whitespace.
- The LF character, i.e. [\n], determines the end of a jsonb value.
- Empty lines are imported as NULL::jsonb values.
- NULL::jsonb values are exported as empty lines.

Note: Lines can end with LF or CR+LF, since the CR is just whitespace,
so even e.g. CR+CR+LF at the end would be allowed.
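
A minimal sketch in C of how these import rules would split a buffer into records. The function names are hypothetical. One assumption goes beyond the rules as listed: a line holding only non-LF whitespace (for example the lone CR left by a blank CR+LF line) is treated as empty and therefore as NULL; whether it should instead be rejected is not settled here. Parsing of the non-empty lines is left to the jsonb input function and is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the line holds nothing but the non-LF JSON whitespace [ \r\t]. */
static bool
line_is_blank(const char *line, size_t len)
{
	for (size_t i = 0; i < len; i++)
	{
		if (line[i] != ' ' && line[i] != '\r' && line[i] != '\t')
			return false;
	}
	return true;
}

/*
 * Walk buf, ending a record at each LF.  Count the lines that would be
 * imported as NULL::jsonb and the lines that would be handed to the jsonb
 * input function.  Assumption: a final line without a trailing LF still
 * counts as a record.
 */
static void
count_records(const char *buf, size_t len, size_t *n_null, size_t *n_value)
{
	size_t		start = 0;

	assert(buf != NULL && n_null != NULL && n_value != NULL);
	*n_null = 0;
	*n_value = 0;

	for (size_t i = 0; i <= len; i++)
	{
		bool		at_end = (i == len);

		if (!at_end && buf[i] != '\n')
			continue;			/* still inside the current line */
		if (at_end && start == len)
			break;				/* no unterminated tail to process */

		if (line_is_blank(buf + start, i - start))
			(*n_null)++;
		else
			(*n_value)++;
		start = i + 1;
	}
}
```

Note that a line containing the JSON literal null counts as a value, not as an SQL NULL; only an empty line maps to NULL::jsonb.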

Naming the new COPY format 'jsonb' would signal that we're firmly
in PostgreSQL territory, not trying to produce a "pure" JSONL file.

All JSONL files can still be imported, though; the "empty line = NULL" extension
only risks breaking other consumers of the exported files.
If that's a problem, users can just filter out NULL values upon export,
which seems acceptable to me. It just seems important not to call it JSONL.

/Joel

[1]: https://jsonlines.org/validator/

#33

David G. Johnston

david.g.johnston@gmail.com

11 months ago

In reply to: Joel Jacobson (#32)

Re: New "single" COPY format

On Sat, Dec 21, 2024 at 1:57 AM Joel Jacobson <joel@compiler.org> wrote:

How about:

COPY table_name [ ( column_name ) ] { FROM | TO } 'filename' (FORMAT
jsonb);

- If column list is omitted, table_name must have exactly one column.
- If column list is specified, it must be of length one.
- The column type must be jsonb.
- Each line is a single jsonb value; no multi-line json permitted.
- Non-LF whitespace, i.e. [ \r\t], is allowed anywhere
whitespace can exist according to the json spec, but is discarded
upon import, since jsonb doesn't store whitespace.
- The LF character, i.e. [\n], determines the end of a jsonb value.
- Empty lines are imported as NULL::jsonb values.
- NULL::jsonb values are exported as empty lines.

Note: Lines can end with LF or CR+LF, since the CR is just whitespace,
so even e.g. CR+CR+LF at the end would be allowed.

My first impression of this is positive. I like the scoping and the rules
make sense.

I know this is outside of COPY charter as it stands today but I'd like to
suggest allowing for some way of saying "please put input line numbers into
this column". Think of it as adding "with ordinality" to copy. In
particular it would help to deal with potential end-of-file situations
where the last line is imported as a null value.

David J.

#34

Andrew Dunstan

andrew@dunslane.net

11 months ago

In reply to: David G. Johnston (#33)

Re: New "single" COPY format

On 2025-02-17 Mo 7:05 PM, David G. Johnston wrote:

On Sat, Dec 21, 2024 at 1:57 AM Joel Jacobson <joel@compiler.org> wrote:

How about:

COPY table_name [ ( column_name ) ] { FROM | TO } 'filename'
(FORMAT jsonb);

- If column list is omitted, table_name must have exactly one column.
- If column list is specified, it must be of length one.
- The column type must be jsonb.
- Each line is a single jsonb value; no multi-line json permitted.
- Non-LF whitespace, i.e. [ \r\t], is allowed anywhere
whitespace can exist according to the json spec, but is discarded
upon import, since jsonb doesn't store whitespace.
- The LF character, i.e. [\n], determines the end of a jsonb value.
- Empty lines are imported as NULL::jsonb values.
- NULL::jsonb values are exported as empty lines.

Note: Lines can end with LF or CR+LF, since the CR is just whitespace,
so even e.g. CR+CR+LF at the end would be allowed.

I'd tweak this a bit:

. call the format jsonl

. allow json - on output I would just eat any LF, but I could be
persuaded to do something else.
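
A minimal sketch in C of what "eating" LF on output could look like, assuming it means simply dropping every LF byte. The function name is hypothetical. Dropping is safe for valid JSON because a literal LF may not appear inside a JSON string (it must be escaped there), so any LF in the text is whitespace between tokens.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy len bytes of JSON text from in to out, skipping every LF.
 * out must have room for at least len bytes.  Returns the number of
 * bytes written.  All other whitespace, including CR, is left as is.
 */
static size_t
drop_lf(const char *in, size_t len, char *out)
{
	size_t		n = 0;

	assert((in != NULL && out != NULL) || len == 0);
	for (size_t i = 0; i < len; i++)
	{
		if (in[i] != '\n')
			out[n++] = in[i];
	}
	return n;
}
```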

My first impression of this is positive. I like the scoping and the
rules make sense.

I know this is outside of COPY charter as it stands today but I'd like
to suggest allowing for some way of saying "please put input line
numbers into this column". Think of it as adding "with ordinality" to
copy. In particular it would help to deal with potential end-of-file
situations where the last line is imported as a null value.

Let's not add feature creep. If you want a feature like this there's no
reason it should belong only to JSON input.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com