Emitting JSON to file using COPY TO

david.g.johnston@gmail.com

about 2 years ago

In reply to: Davin Shearer (#1)

Re: Emitting JSON to file using COPY TO

On Sat, Nov 25, 2023 at 12:22 PM Davin Shearer <scholarsmate@gmail.com>
wrote:

Is there a way to emit JSON results to file from within postgres?

Use psql to directly output query results to a file instead of using COPY
to output structured output in a format you don't want.

David J.

Adrian Klaver

adrian.klaver@aklaver.com

about 2 years ago

In reply to: Davin Shearer (#1)

Re: Emitting JSON to file using COPY TO

On 11/25/23 11:21, Davin Shearer wrote:

Hello!

I'm trying to emit a JSON aggregation of JSON rows to a file using COPY
TO, but I'm running into problems with COPY TO double quoting the
output. Here is a minimal example that demonstrates the problem I'm
having:

I have tried to get COPY TO to copy the results to file "as-is" by
setting the escape and the quote characters to the empty string (''),
but they only apply to the CSV format.

Is there a way to emit JSON results to file from within postgres?
Effectively, an "as-is" option to COPY TO would work well for this JSON
use case.

Not using COPY.

See David Johnston's post for one way using the client psql.

Otherwise you will need to use any of the many ETL programs out there
that are designed for this sort of thing.

Any assistance would be appreciated.

Thanks,
Davin

--
Adrian Klaver
adrian.klaver@aklaver.com

ddevienne@gmail.com

about 2 years ago

In reply to: Adrian Klaver (#3)

Re: Emitting JSON to file using COPY TO

On Sat, Nov 25, 2023 at 10:00 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 11/25/23 11:21, Davin Shearer wrote:

Hello!

I'm trying to emit a JSON aggregation of JSON rows to a file using COPY
TO, but I'm running into problems with COPY TO double quoting the
output. Here is a minimal example that demonstrates the problem I'm
having:

I have tried to get COPY TO to copy the results to file "as-is" by
setting the escape and the quote characters to the empty string (''),
but they only apply to the CSV format.

Is there a way to emit JSON results to file from within postgres?
Effectively, an "as-is" option to COPY TO would work well for this JSON
use case.

Not using COPY.

See David Johnston's post for one way using the client psql.

Otherwise you will need to use any of the many ETL programs out there
that are designed for this sort of thing.

Guys, I don't get answers like that. The JSON spec is clear:

ddevienne@gmail.com

about 2 years ago

In reply to: Dominique Devienne (#4)

Re: Emitting JSON to file using COPY TO

On Mon, Nov 27, 2023 at 10:33 AM Dominique Devienne <ddevienne@gmail.com>
wrote:

On Sat, Nov 25, 2023 at 10:00 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 11/25/23 11:21, Davin Shearer wrote:

Hello!

I'm trying to emit a JSON aggregation of JSON rows to a file using COPY
TO, but I'm running into problems with COPY TO double quoting the
output. Here is a minimal example that demonstrates the problem I'm
having:

I have tried to get COPY TO to copy the results to file "as-is" by
setting the escape and the quote characters to the empty string (''),
but they only apply to the CSV format.

Is there a way to emit JSON results to file from within postgres?
Effectively, an "as-is" option to COPY TO would work well for this JSON
use case.

Not using COPY.

See David Johnston's post for one way using the client psql.

Otherwise you will need to use any of the many ETL programs out there
that are designed for this sort of thing.

Guys, I don't get answers like that. The JSON spec is clear:

Oops, sorry, user error. --DD

PS: The JSON spec is a bit ambiguous. First it says

Any codepoint except " or \ or control characters

And then it clearly shows \" as a valid sequence...
Sounds like JQ is too restrictive?

Or that's the double-escape that's the culprit?
i.e. \\ is in the final text, so that's just a backslash,
and then the double-quote is no longer escaped.
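The double-escape theory can be checked with a small sketch. It assumes that the COPY text format writes each backslash in the data as two backslashes; `copy_text_escape` is a hypothetical stand-in that models only the backslash, tab, newline, and carriage-return cases, not the server's actual code.

```python
import json

def copy_text_escape(value: str) -> str:
    """Rough model of COPY text-format escaping (backslash, tab, LF, CR only)."""
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

# JSON text similar to what the query produces: inner quotes are escaped as \"
json_text = json.dumps([{"id": 1, "t_test": 'here\'s a "string"'}],
                       separators=(",", ":"))

# The same text after passing through the modelled COPY escaping.
copied = copy_text_escape(json_text)

print(json_text)
print(copied)

# The original parses; the copied text does not, because \\" is read as an
# escaped backslash followed by a quote that ends the string early.
json.loads(json_text)
try:
    json.loads(copied)
    copied_is_valid = True
except json.JSONDecodeError:
    copied_is_valid = False
print("copied text is valid JSON:", copied_is_valid)
```

Under that assumption the file is well-formed COPY text but no longer well-formed JSON, which would explain a strict parser such as jq rejecting it.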

I've recently noticed json_agg(row_to_json(t))
is equivalent to json_agg(t)

Maybe use that instead? Does that make a difference?

I haven't noticed wrong escaping of double-quotes yet,
but then I'm using the binary mode of queries. Perhaps that matters.

On second thought, I guess that's COPY in its text modes doing the escaping?
Interesting. The text-based modes of COPY are configurable. There's even a
JSON mode.
By miracle, would the JSON output mode recognize JSON[B] values, and avoid
the escaping?

david.g.johnston@gmail.com

about 2 years ago

In reply to: Dominique Devienne (#5)

Re: Emitting JSON to file using COPY TO

On Monday, November 27, 2023, Dominique Devienne <ddevienne@gmail.com>
wrote:

There's even a JSON mode.
By miracle, would the JSON output mode recognize JSON[B] values, and avoid
the escaping?

I agree there should be a copy option for “not formatted” so if you dump a
single column result in that format you get the raw unescaped contents of
the column. As soon as you ask for a format your json is now embedded so it
is a value within another format and any structural aspects of the wrapper
present in the json text representation need to be escaped.
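The embedding point can be illustrated with CSV as the wrapper. This is a sketch using Python's `csv` module as a stand-in for "a CSV writer" in general; it is not PostgreSQL's COPY implementation, and the sample value is made up.

```python
import csv
import io
import json

json_text = json.dumps({"id": 1, "note": 'a "quoted" word'},
                       separators=(",", ":"))

# Embed the JSON text as a single CSV field. The double quotes and the comma
# are structural characters of CSV, so the writer must quote the field and
# double every embedded quote.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow([json_text])
csv_line = buf.getvalue()
print(csv_line, end="")

# A CSV reader undoes the wrapping and recovers the original JSON text.
recovered = next(csv.reader(io.StringIO(csv_line)))[0]
print(recovered == json_text)
```

The wrapped line is valid CSV but not valid JSON; a consumer has to strip the outer format first.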

David J.

Pavel Stehule

pavel.stehule@gmail.com

about 2 years ago

In reply to: David G. Johnston (#6)

Re: Emitting JSON to file using COPY TO

po 27. 11. 2023 v 14:27 odesílatel David G. Johnston <
david.g.johnston@gmail.com> napsal:

On Monday, November 27, 2023, Dominique Devienne <ddevienne@gmail.com>
wrote:

There's even a JSON mode.
By miracle, would the JSON output mode recognize JSON[B] values, and
avoid the escaping?

I agree there should be a copy option for “not formatted” so if you dump a
single column result in that format you get the raw unescaped contents of
the column. As soon as you ask for a format your json is now embedded so it
is a value within another format and any structural aspects of the wrapper
present in the json text representation need to be escaped.

Is it better to use the LO API for this purpose? It is native for not
formatted data.

Regards

Pavel

David J.

david.g.johnston@gmail.com

about 2 years ago

In reply to: Pavel Stehule (#7)

Re: Emitting JSON to file using COPY TO

On Monday, November 27, 2023, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hi

po 27. 11. 2023 v 14:27 odesílatel David G. Johnston <
david.g.johnston@gmail.com> napsal:

On Monday, November 27, 2023, Dominique Devienne <ddevienne@gmail.com>
wrote:

There's even a JSON mode.
By miracle, would the JSON output mode recognize JSON[B] values, and
avoid the escaping?

I agree there should be a copy option for “not formatted” so if you dump
a single column result in that format you get the raw unescaped contents of
the column. As soon as you ask for a format your json is now embedded so it
is a value within another format and any structural aspects of the wrapper
present in the json text representation need to be escaped.

Is it better to use the LO API for this purpose? It is native for not
formatted data.

Using LO is, IMO, never the answer. But if you are using a driver API
anyway just handle the normal select query result.

David J.

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: David G. Johnston (#6)

Re: Emitting JSON to file using COPY TO

"David G. Johnston" <david.g.johnston@gmail.com> writes:

I agree there should be a copy option for “not formatted” so if you dump a
single column result in that format you get the raw unescaped contents of
the column.

I'm not sure I even buy that. JSON data in particular is typically
multi-line, so how will you know where the row boundaries are?
That is, is a newline a row separator or part of the data?

You can debate the intelligence of any particular quoting/escaping
scheme, but imagining that you can get away without having one at
all will just create its own problems.
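A short sketch of the row-boundary problem, with made-up values and a hypothetical "raw" output that simply writes each value followed by a newline:

```python
import json

rows = [
    json.dumps({"id": 1, "tags": ["a", "b"]}, indent=2),  # spans several lines
    json.dumps({"id": 2}),
]

# Hypothetical unformatted output: each value written raw, newline-terminated.
raw_output = "\n".join(rows) + "\n"

# A reader that treats newline as the row separator cannot tell which
# newlines end a row and which belong to a value.
naive_rows = raw_output.splitlines()
print(len(rows), "rows written,", len(naive_rows), "lines read")

# If every value is first re-serialised onto a single line, one line is one
# row again, but that is already a formatting rule imposed on the data.
single_line_output = "\n".join(json.dumps(json.loads(r)) for r in rows) + "\n"
safe_rows = [json.loads(line) for line in single_line_output.splitlines()]
print(len(safe_rows), "rows recovered")
```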

regards, tom lane

#10

ddevienne@gmail.com

about 2 years ago

In reply to: Tom Lane (#9)

Re: Emitting JSON to file using COPY TO

On Mon, Nov 27, 2023 at 3:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

I agree there should be a copy option for “not formatted” so if you dump a
single column result in that format you get the raw unescaped contents of
the column.

I'm not sure I even buy that. JSON data in particular is typically
multi-line, so how will you know where the row boundaries are?
That is, is a newline a row separator or part of the data?

You can debate the intelligence of any particular quoting/escaping
scheme, but imagining that you can get away without having one at
all will just create its own problems.

What I was suggesting is not about a "not formatted" option.
Rather, for JSON values (i.e. typed `json` or `jsonb`) in a JSON-formatted
COPY operation, the values should not be serialized to text that is simply
output as a JSON text value by COPY, but "inlined" as "real" JSON values
within the JSON document output by COPY.

This is a special case, where the inner and outer "values" (for lack of a
better term) are *both* JSON documents. Given that JSON is hierarchical,
the inner JSON value can either be 1) serialized to text first, which must
then be escaped using the JSON escaping rules, or 2) NOT serialized, but
"inlined" or "spliced" into the outer COPY JSON document.

I guess COPY in JSON mode supports only #1 now? While #2 makes more sense
to me.
But both options are valid. Is that clearer?
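The two options can be sketched outside the database. The `doc` key and the sample value are illustrative only; `inner` stands in for a `json`/`jsonb` column value.

```python
import json

inner = {"id": 1, "t_test": 'here\'s a "string"'}

# Option 1: serialise the inner value to text, then embed that text as a
# JSON string. The inner quotes and backslashes get escaped a second time.
embedded = json.dumps({"doc": json.dumps(inner)})

# Option 2: splice the inner value into the outer document as a real JSON value.
spliced = json.dumps({"doc": inner})

print(embedded)
print(spliced)

# Both outputs are valid JSON, but the consumer sees different types.
print(type(json.loads(embedded)["doc"]).__name__)  # str: must be parsed again
print(type(json.loads(spliced)["doc"]).__name__)   # dict: usable directly
```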

BTW, JSON is not multi-line, except for insignificant whitespace.
So even COPY in JSON mode is not supposed to be line based I guess?
Unless COPY in JSON mode is more like NDJSON (https://ndjson.org/)? --DD

#11

Adrian Klaver

adrian.klaver@aklaver.com

about 2 years ago

In reply to: Dominique Devienne (#5)

Re: Emitting JSON to file using COPY TO

On 11/27/23 01:44, Dominique Devienne wrote:

On Mon, Nov 27, 2023 at 10:33 AM Dominique Devienne <ddevienne@gmail.com
<mailto:ddevienne@gmail.com>> wrote:

On second thought, I guess that's COPY in its text modes doing the escaping?
Interesting. The text-based modes of COPY are configurable. There's even
a JSON mode.

Where are you seeing the JSON mode for COPY? AFAIK there is only text
and CSV formats.

By miracle, would the JSON output mode recognize JSON[B] values, and
avoid the escaping?

--
Adrian Klaver
adrian.klaver@aklaver.com

#12

ddevienne@gmail.com

about 2 years ago

In reply to: Adrian Klaver (#11)

Re: Emitting JSON to file using COPY TO

On Mon, Nov 27, 2023 at 5:04 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 11/27/23 01:44, Dominique Devienne wrote:

On Mon, Nov 27, 2023 at 10:33 AM Dominique Devienne <ddevienne@gmail.com
<mailto:ddevienne@gmail.com>> wrote:
On second thought, I guess that's COPY in its text modes doing the escaping?
Interesting. The text-based modes of COPY are configurable. There's even
a JSON mode.

Where are you seeing the JSON mode for COPY? AFAIK there is only text
and CSV formats.

Indeed. Somehow I thought there was...
I've used the TEXT and BINARY modes, and remembered a wishful thinking JSON
mode!
OK then, if there was, then what I wrote would apply :). --DD

#13

Filip Sedlák

filip@sedlakovi.org

about 2 years ago

In reply to: Dominique Devienne (#10)

Re: Emitting JSON to file using COPY TO

This would be a very special case for COPY. It applies only to a single
column of JSON values. The original problem can be solved with psql
--tuples-only as David wrote earlier.

$ psql -tc 'select json_agg(row_to_json(t))
from (select * from public.tbl_json_test) t;'

[{"id":1,"t_test":"here's a \"string\""}]

Special-casing any encoding/escaping scheme leads to bugs and harder
parsing.
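As a quick check that the psql output above is consumable as-is, the line can be fed to a JSON parser. This sketch assumes the output may be padded with whitespace, which JSON treats as insignificant; the string is the line quoted above, not a fresh query result.

```python
import json

# Output line copied from the message above, with assumed padding.
psql_output = ' [{"id":1,"t_test":"here\'s a \\"string\\""}]\n\n'

rows = json.loads(psql_output)
print(rows[0]["t_test"])
```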

Just my 2c.

--
Filip Sedlák

#14

davin@apache.org

about 2 years ago

In reply to: Filip Sedlák (#13)

Re: Emitting JSON to file using COPY TO

Thanks for the responses everyone.

I worked around the issue using the `psql -tc` method as Filip described.

I think it would be great to support writing JSON using COPY TO at
some point so I can emit JSON to files using a PostgreSQL function directly.

-Davin

On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <filip@sedlakovi.org> wrote:

This would be a very special case for COPY. It applies only to a single
column of JSON values. The original problem can be solved with psql
--tuples-only as David wrote earlier.

$ psql -tc 'select json_agg(row_to_json(t))
from (select * from public.tbl_json_test) t;'

[{"id":1,"t_test":"here's a \"string\""}]

Special-casing any encoding/escaping scheme leads to bugs and harder
parsing.

Just my 2c.

--
Filip Sedlák

#15

mail@joeconway.com

about 2 years ago

In reply to: Davin Shearer (#14)

1 attachment(s)

Re: Emitting JSON to file using COPY TO

On 11/29/23 10:32, Davin Shearer wrote:

Thanks for the responses everyone.

I worked around the issue using the `psql -tc` method as Filip described.

I think it would be great to support writing JSON using COPY TO at
some point so I can emit JSON to files using a PostgreSQL function directly.

-Davin

On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <filip@sedlakovi.org
<mailto:filip@sedlakovi.org>> wrote:

This would be a very special case for COPY. It applies only to a single
column of JSON values. The original problem can be solved with psql
--tuples-only as David wrote earlier.

$ psql -tc 'select json_agg(row_to_json(t))
from (select * from public.tbl_json_test) t;'

[{"id":1,"t_test":"here's a \"string\""}]

Special-casing any encoding/escaping scheme leads to bugs and harder
parsing.

(moved to hackers)

I did a quick PoC patch (attached) -- if there is interest and no hard
objections I would like to get it up to speed for the January commitfest.

Currently the patch lacks documentation and regression test support.

Questions:
----------
1. Is supporting JSON array format sufficient, or does it need to
support some other options? How flexible does the support scheme need to be?

2. This only supports COPY TO and we would undoubtedly want to support
COPY FROM for JSON as well, but is that required from the start?

Thanks for any feedback.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

copyto_json.000.diff (text/x-patch; charset=UTF-8)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..bc1f684 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 443,450 ----
  				 /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  				opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ 				opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  				opts_out->binary = true;
  			else
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 669,679 ----
  				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  				 errmsg("cannot specify HEADER in BINARY mode")));
  
+ 	if (opts_out->json_mode && opts_out->header_line)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot specify HEADER in JSON mode")));
+ 
  	/* Check quote */
  	if (!opts_out->csv_mode && opts_out->quote != NULL)
  		ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..f6ee771 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
+ #include "utils/json.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
  /* NOTE: there's a copy of this in copyfromparse.c */
  static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
  
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
  
  /* non-export function prototypes */
  static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,861 ----
  
  			CopySendEndOfRow(cstate);
  		}
+ 
+ 		/* if a JSON has been requested send the opening bracket */
+ 		if (cstate->opts.json_mode)
+ 		{
+ 			CopySendChar(cstate, '[');
+ 			CopySendEndOfRow(cstate);
+ 			json_row_delim_needed = false;
+ 		}
  	}
  
  	if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 903,915 ----
  		CopySendEndOfRow(cstate);
  	}
  
+ 	/* if a JSON has been requested send the closing bracket */
+ 	if (cstate->opts.json_mode)
+ 	{
+ 		CopySendChar(cstate, ']');
+ 		CopySendEndOfRow(cstate);
+ 	}
+ 
  	MemoryContextDelete(cstate->rowcontext);
  
  	if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
  static void
  CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
  {
- 	bool		need_delim = false;
- 	FmgrInfo   *out_functions = cstate->out_functions;
  	MemoryContext oldcontext;
- 	ListCell   *cur;
- 	char	   *string;
  
  	MemoryContextReset(cstate->rowcontext);
  	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 924,930 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	/* Make sure the tuple is fully deconstructed */
! 	slot_getallattrs(slot);
! 
! 	foreach(cur, cstate->attnumlist)
  	{
! 		int			attnum = lfirst_int(cur);
! 		Datum		value = slot->tts_values[attnum - 1];
! 		bool		isnull = slot->tts_isnull[attnum - 1];
  
! 		if (!cstate->opts.binary)
! 		{
! 			if (need_delim)
! 				CopySendChar(cstate, cstate->opts.delim[0]);
! 			need_delim = true;
! 		}
  
! 		if (isnull)
! 		{
! 			if (!cstate->opts.binary)
! 				CopySendString(cstate, cstate->opts.null_print_client);
! 			else
! 				CopySendInt32(cstate, -1);
! 		}
! 		else
  		{
  			if (!cstate->opts.binary)
  			{
! 				string = OutputFunctionCall(&out_functions[attnum - 1],
! 											value);
! 				if (cstate->opts.csv_mode)
! 					CopyAttributeOutCSV(cstate, string,
! 										cstate->opts.force_quote_flags[attnum - 1],
! 										list_length(cstate->attnumlist) == 1);
  				else
! 					CopyAttributeOutText(cstate, string);
  			}
  			else
  			{
! 				bytea	   *outputbytes;
  
! 				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 											   value);
! 				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 				CopySendData(cstate, VARDATA(outputbytes),
! 							 VARSIZE(outputbytes) - VARHDRSZ);
  			}
  		}
  	}
  
  	CopySendEndOfRow(cstate);
  
--- 935,1015 ----
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	if (!cstate->opts.json_mode)
  	{
! 		bool		need_delim = false;
! 		FmgrInfo   *out_functions = cstate->out_functions;
! 		ListCell   *cur;
! 		char	   *string;
  
! 		/* Make sure the tuple is fully deconstructed */
! 		slot_getallattrs(slot);
  
! 		foreach(cur, cstate->attnumlist)
  		{
+ 			int			attnum = lfirst_int(cur);
+ 			Datum		value = slot->tts_values[attnum - 1];
+ 			bool		isnull = slot->tts_isnull[attnum - 1];
+ 
  			if (!cstate->opts.binary)
  			{
! 				if (need_delim)
! 					CopySendChar(cstate, cstate->opts.delim[0]);
! 				need_delim = true;
! 			}
! 
! 			if (isnull)
! 			{
! 				if (!cstate->opts.binary)
! 					CopySendString(cstate, cstate->opts.null_print_client);
  				else
! 					CopySendInt32(cstate, -1);
  			}
  			else
  			{
! 				if (!cstate->opts.binary)
! 				{
! 					string = OutputFunctionCall(&out_functions[attnum - 1],
! 												value);
! 					if (cstate->opts.csv_mode)
! 						CopyAttributeOutCSV(cstate, string,
! 											cstate->opts.force_quote_flags[attnum - 1],
! 											list_length(cstate->attnumlist) == 1);
! 					else
! 						CopyAttributeOutText(cstate, string);
! 				}
! 				else
! 				{
! 					bytea	   *outputbytes;
  
! 					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 												   value);
! 					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 					CopySendData(cstate, VARDATA(outputbytes),
! 								 VARSIZE(outputbytes) - VARHDRSZ);
! 				}
  			}
  		}
  	}
+ 	else
+ 	{
+ 		Datum	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ 		StringInfo	result;
+ 
+ 		result = makeStringInfo();
+ 		composite_to_json(rowdata, result, false);
+ 
+ 		if (json_row_delim_needed)
+ 			CopySendChar(cstate, ',');
+ 		else
+ 		{
+ 			/* first row needs no delimiter */
+ 			CopySendChar(cstate, ' ');
+ 			json_row_delim_needed = true;
+ 		}
+ 
+ 		CopyAttributeOutText(cstate, result->data);
+ 	}
  
  	CopySendEndOfRow(cstate);
  
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
  	JsonUniqueBuilderState unique_check;
  } JsonAggState;
  
- static void composite_to_json(Datum composite, StringInfo result,
- 							  bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
  							  Datum *vals, bool *nulls, int *valcount,
  							  JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
  
  /*
   * Turn a composite / record into JSON.
   */
! static void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
--- 488,496 ----
  
  /*
   * Turn a composite / record into JSON.
+  * Exported so COPY TO can use it.
   */
! void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..e66bd01 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
  	bool		binary;			/* binary format? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
+ 	bool		json_mode;		/* JSON format? */
  	CopyHeaderChoice header_line;	/* header line? */
  	char	   *null_print;		/* NULL marker string (server encoding!) */
  	int			null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
  #include "lib/stringinfo.h"
  
  /* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ 							  bool use_line_feeds);
  extern void escape_json(StringInfo buf, const char *str);
  extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
  								const int *tzp);

#16

nathandbossart@gmail.com

about 2 years ago

In reply to: Joe Conway (#15)

Re: Emitting JSON to file using COPY TO

On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:

I did a quick PoC patch (attached) -- if there is interest and no hard
objections I would like to get it up to speed for the January commitfest.

Cool. I would expect there to be interest, given all the other JSON
support that has been added thus far.

I noticed that, with the PoC patch, "json" is the only format that must be
quoted. Without quotes, I see a syntax error. I'm assuming there's a
conflict with another json-related rule somewhere in gram.y, but I haven't
tracked down exactly which one is causing it.

1. Is supporting JSON array format sufficient, or does it need to support
some other options? How flexible does the support scheme need to be?

I don't presently have a strong opinion on this one. My instinct would be to
start with something simple, though. I don't think we offer any special
options for log_destination...

2. This only supports COPY TO and we would undoubtedly want to support COPY
FROM for JSON as well, but is that required from the start?

I would vote for including COPY FROM support from the start.

! if (!cstate->opts.json_mode)

I think it's unfortunate that this further complicates the branching in
CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
worth refactoring a bunch of stuff to make this nicer.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#17

davin@apache.org

about 2 years ago

In reply to: Nathan Bossart (#16)

Re: Emitting JSON to file using COPY TO

I'm really glad to see this taken up as a possible new feature and will
definitely use it if it gets released. I'm impressed with how clean,
understandable, and approachable the postgres codebase is in general and
how easy it is to read and understand this patch.

I reviewed the patch (though I didn't build and test the code) and have a
concern with adding the '[' at the beginning and ']' at the end of the json
output. Those are already added by `json_agg` (
https://www.postgresql.org/docs/current/functions-aggregate.html) as you
can see in my initial email. Adding them in the COPY TO may be redundant
(e.g., [[{"key":"value"...}....]]).

I think COPY TO makes good sense to support, though COPY FROM maybe not so
much as JSON isn't necessarily flat and rectangular like CSV.

For my use-case, I'm emitting JSON files to Apache NiFi for processing, and
NiFi has superior handling of JSON (via JOLT parsers) versus CSV where
parsing is generally done with regex. I want to be able to emit JSON using
a postgres function and thus COPY TO.

Definitely +1 for COPY TO.

I don't think COPY FROM will work out well unless the JSON is required to
be flat and rectangular. I would vote -1 to leave it out due to the
necessary restrictions making it not generally useful.

Hope it helps,
Davin

On Fri, Dec 1, 2023 at 6:10 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:

On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:

I did a quick PoC patch (attached) -- if there is interest and no hard
objections I would like to get it up to speed for the January commitfest.

Cool. I would expect there to be interest, given all the other JSON
support that has been added thus far.

I noticed that, with the PoC patch, "json" is the only format that must be
quoted. Without quotes, I see a syntax error. I'm assuming there's a
conflict with another json-related rule somewhere in gram.y, but I haven't
tracked down exactly which one is causing it.

1. Is supporting JSON array format sufficient, or does it need to support
some other options? How flexible does the support scheme need to be?

I don't presently have a strong opinion on this one. My instinct would be to
start with something simple, though. I don't think we offer any special
options for log_destination...

2. This only supports COPY TO and we would undoubtedly want to support COPY
FROM for JSON as well, but is that required from the start?

I would vote for including COPY FROM support from the start.

! if (!cstate->opts.json_mode)

I think it's unfortunate that this further complicates the branching in
CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
worth refactoring a bunch of stuff to make this nicer.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#18

mail@joeconway.com

about 2 years ago

In reply to: Davin Shearer (#17)

Re: Emitting JSON to file using COPY TO

On 12/1/23 22:00, Davin Shearer wrote:

I'm really glad to see this taken up as a possible new feature and will
definitely use it if it gets released. I'm impressed with how clean,
understandable, and approachable the postgres codebase is in general and
how easy it is to read and understand this patch.

I reviewed the patch (though I didn't build and test the code) and have
a concern with adding the '[' at the beginning and ']' at the end of the
json output. Those are already added by `json_agg`
(https://www.postgresql.org/docs/current/functions-aggregate.html
<https://www.postgresql.org/docs/current/functions-aggregate.html>) as
you can see in my initial email. Adding them in the COPY TO may be
redundant (e.g., [[{"key":"value"...}....]]).

With this patch in place you don't use json_agg() at all. See the
example output (this is real output with the patch applied):

(oops -- I meant to send this with the same email as the patch)

8<-------------------------------------------------
create table foo(id int8, f1 text, f2 timestamptz);
insert into foo
  select g.i,
         'line: ' || g.i::text,
         clock_timestamp()
  from generate_series(1,4) as g(i);

copy foo to stdout (format 'json');
[
{"id":1,"f1":"line: 1","f2":"2023-12-01T12:58:16.776863-05:00"}
,{"id":2,"f1":"line: 2","f2":"2023-12-01T12:58:16.777084-05:00"}
,{"id":3,"f1":"line: 3","f2":"2023-12-01T12:58:16.777096-05:00"}
,{"id":4,"f1":"line: 4","f2":"2023-12-01T12:58:16.777103-05:00"}
]
8<-------------------------------------------------
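The framing in that output can be modelled in a few lines. This is a sketch of the layout only, not of the patch's C code: `copy_json_array_lines` is a hypothetical name, rows are plain dicts, and `json.dumps` stands in for `composite_to_json`. Like the PoC diff, it sends a space before the first element and a comma before each later one.

```python
import json
from typing import Iterable, Iterator

def copy_json_array_lines(rows: Iterable[dict]) -> Iterator[str]:
    """Yield output lines framed as one JSON array, one row object per line."""
    yield "["
    first = True
    for row in rows:
        # The first element needs no delimiter; later ones start with a comma.
        prefix = " " if first else ","
        first = False
        yield prefix + json.dumps(row, separators=(",", ":"))
    yield "]"

lines = list(copy_json_array_lines([
    {"id": 1, "f1": "line: 1"},
    {"id": 2, "f1": "line: 2"},
]))
print("\n".join(lines))
```

Putting the comma at the start of the following line means each row can be written without knowing whether another row follows, and the concatenated output is still a single valid JSON document.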

I think COPY TO makes good sense to support, though COPY FROM maybe not
so much as JSON isn't necessarily flat and rectangular like CSV.

Yeah -- definitely not as straightforward but possibly we just support
the array-of-jsonobj-rows as input as well?

For my use-case, I'm emitting JSON files to Apache NiFi for processing,
and NiFi has superior handling of JSON (via JOLT parsers) versus CSV
where parsing is generally done with regex. I want to be able to emit
JSON using a postgres function and thus COPY TO.

Definitely +1 for COPY TO.

I don't think COPY FROM will work out well unless the JSON is required
to be flat and rectangular. I would vote -1 to leave it out due to the
necessary restrictions making it not generally useful.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#19

mail@joeconway.com

about 2 years ago

In reply to: Nathan Bossart (#16)

Re: Emitting JSON to file using COPY TO

On 12/1/23 18:09, Nathan Bossart wrote:

On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:

I did a quick PoC patch (attached) -- if there is interest and no hard
objections I would like to get it up to speed for the January commitfest.

Cool. I would expect there to be interest, given all the other JSON
support that has been added thus far.

Thanks for the review

I noticed that, with the PoC patch, "json" is the only format that must be
quoted. Without quotes, I see a syntax error. I'm assuming there's a
conflict with another json-related rule somewhere in gram.y, but I haven't
tracked down exactly which one is causing it.

It seems to be because 'json' is also a type name ($$ =
SystemTypeName("json")).

What do you think about using 'json_array' instead? It is more specific
and accurate, and avoids the need to quote.

test=# copy foo to stdout (format json_array);
[
{"id":1,"f1":"line: 1","f2":"2023-12-01T12:58:16.776863-05:00"}
,{"id":2,"f1":"line: 2","f2":"2023-12-01T12:58:16.777084-05:00"}
,{"id":3,"f1":"line: 3","f2":"2023-12-01T12:58:16.777096-05:00"}
,{"id":4,"f1":"line: 4","f2":"2023-12-01T12:58:16.777103-05:00"}
]

1. Is supporting JSON array format sufficient, or does it need to support
some other options? How flexible does the support scheme need to be?

I don't presently have a strong opinion on this one. My instinct would be to
start with something simple, though. I don't think we offer any special
options for log_destination...

WFM

2. This only supports COPY TO and we would undoubtedly want to support COPY
FROM for JSON as well, but is that required from the start?

I would vote for including COPY FROM support from the start.

Check. My thought is to only accept the same format we emit -- i.e. only
take a json array.
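What "only take a JSON array" could mean on the input side, as a sketch under stated assumptions: `rows_from_json_array` is a hypothetical helper, and the policies it applies (missing keys become NULL, unknown keys are an error) are illustrative choices, not anything decided in this thread.

```python
import json
from typing import List, Optional, Tuple

def rows_from_json_array(text: str, columns: List[str]) -> List[Tuple[Optional[object], ...]]:
    """Turn a JSON array of objects into rows ordered by the given column list."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    rows = []
    for i, obj in enumerate(data):
        if not isinstance(obj, dict):
            raise ValueError(f"element {i} is not a JSON object")
        unknown = set(obj) - set(columns)
        if unknown:
            raise ValueError(f"element {i} has unknown keys: {sorted(unknown)}")
        # Assumed policy: a key missing from the object becomes None (NULL).
        rows.append(tuple(obj.get(c) for c in columns))
    return rows

print(rows_from_json_array('[{"id":1,"f1":"a"},{"id":2}]', ["id", "f1"]))
```

Even this small version has to pick answers for missing and unexpected keys, which is the "flat and rectangular" concern raised earlier in the thread.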

! if (!cstate->opts.json_mode)

I think it's unfortunate that this further complicates the branching in
CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
worth refactoring a bunch of stuff to make this nicer.

Yeah that was my conclusion.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Joe Conway (#19)

Re: Emitting JSON to file using COPY TO

Joe Conway <mail@joeconway.com> writes:

I noticed that, with the PoC patch, "json" is the only format that must be
quoted. Without quotes, I see a syntax error. I'm assuming there's a
conflict with another json-related rule somewhere in gram.y, but I haven't
tracked down exactly which one is causing it.

While I've not looked too closely, I suspect this might be due to the
FORMAT_LA hack in base_yylex:

/* Replace FORMAT by FORMAT_LA if it's followed by JSON */
switch (next_token)
{
case JSON:
cur_token = FORMAT_LA;
break;
}

So if you are writing a production that might need to match
FORMAT followed by JSON, you need to match FORMAT_LA too.

(I spent a little bit of time last week trying to get rid of
FORMAT_LA, thinking that it didn't look necessary. Did not
succeed yet.)
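
To make the lookahead behaviour concrete, here is a toy Python model of the substitution described above. It is only a sketch: the token lists and the helpers `apply_lookahead` and `matches_format_option` are hypothetical names invented for illustration, and they do not correspond to PostgreSQL's real lexer or grammar code.

```python
# Toy model of a one-token-lookahead filter, loosely patterned on the
# FORMAT -> FORMAT_LA substitution described above. Tokens are plain
# strings here; this is not PostgreSQL's actual lexer.

def apply_lookahead(tokens):
    """Return a copy of tokens where FORMAT becomes FORMAT_LA before JSON."""
    out = []
    for i, tok in enumerate(tokens):
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "FORMAT" and next_tok == "JSON":
            out.append("FORMAT_LA")
        else:
            out.append(tok)
    return out


def matches_format_option(tokens, accept_format_la):
    """Hypothetical rule: option := FORMAT arg, optionally also FORMAT_LA arg."""
    heads = {"FORMAT"}
    if accept_format_la:
        heads.add("FORMAT_LA")
    return len(tokens) == 2 and tokens[0] in heads


csv_opt = apply_lookahead(["FORMAT", "CSV"])
json_opt = apply_lookahead(["FORMAT", "JSON"])
print(csv_opt, json_opt)
```

In this model a rule that only accepts FORMAT stops matching as soon as the argument is JSON, which is the shape of the problem reported for the unquoted `format json` spelling.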

regards, tom lane

#21

Maciek Sakrejda

m.sakrejda@gmail.com

about 2 years ago

In reply to: Joe Conway (#15)

Re: Emitting JSON to file using COPY TO

On Fri, Dec 1, 2023 at 11:32 AM Joe Conway <mail@joeconway.com> wrote:

1. Is supporting JSON array format sufficient, or does it need to
support some other options? How flexible does the support scheme need to be?

"JSON Lines" is a semi-standard format [1] that's basically just
newline-separated JSON values. (In fact, this is what
log_destination=jsonlog gives you for Postgres logs, no?) It might be
worthwhile to support that, too.

[1]: https://jsonlines.org/
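
As a rough illustration of the difference between the two shapes, the Python sketch below writes the same rows as JSON Lines and as a single JSON array. The sample rows and the helper names (`to_json_lines`, `from_json_lines`, `to_json_array`) are assumptions made for this example; nothing here is taken from the patch.

```python
import json

# Sample rows, invented for illustration.
rows = [
    {"id": 1, "f1": "line: 1"},
    {"id": 2, "f1": "line: 2"},
]


def to_json_lines(rows):
    """One compact JSON value per line, newline-separated."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in rows)


def from_json_lines(text):
    """Parse each non-empty line independently."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_json_array(rows):
    """A single JSON document: one array holding every row."""
    return json.dumps(rows, separators=(",", ":"))


lines_text = to_json_lines(rows)
array_text = to_json_array(rows)
print(lines_text, end="")
print(array_text)
```

A reader of JSON Lines can process one line at a time, whereas the array form has to be parsed as one document; with more than one row, the JSON Lines text taken as a whole is not itself a valid JSON document.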

#22

nathandbossart@gmail.com

about 2 years ago

In reply to: Tom Lane (#20)

Re: Emitting JSON to file using COPY TO

On Sat, Dec 02, 2023 at 10:11:20AM -0500, Tom Lane wrote:

So if you are writing a production that might need to match
FORMAT followed by JSON, you need to match FORMAT_LA too.

Thanks for the pointer. That does seem to be the culprit.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89a9..048494dd07 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3490,6 +3490,10 @@ copy_generic_opt_elem:
                 {
                     $$ = makeDefElem($1, $2, @1);
                 }
+            | FORMAT_LA copy_generic_opt_arg
+                {
+                    $$ = makeDefElem("format", $2, @1);
+                }
         ;

copy_generic_opt_arg:

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#23

mail@joeconway.com

about 2 years ago

In reply to: Nathan Bossart (#22)

Re: Emitting JSON to file using COPY TO

On 12/2/23 16:53, Nathan Bossart wrote:

On Sat, Dec 02, 2023 at 10:11:20AM -0500, Tom Lane wrote:

So if you are writing a production that might need to match
FORMAT followed by JSON, you need to match FORMAT_LA too.

Thanks for the pointer. That does seem to be the culprit.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89a9..048494dd07 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3490,6 +3490,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+            | FORMAT_LA copy_generic_opt_arg
+                {
+                    $$ = makeDefElem("format", $2, @1);
+                }
;
copy_generic_opt_arg:

Yep -- I concluded the same. Thanks Tom!

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#24

mail@joeconway.com

about 2 years ago

In reply to: Maciek Sakrejda (#21)

Re: Emitting JSON to file using COPY TO

On 12/2/23 13:50, Maciek Sakrejda wrote:

On Fri, Dec 1, 2023 at 11:32 AM Joe Conway <mail@joeconway.com> wrote:

1. Is supporting JSON array format sufficient, or does it need to
support some other options? How flexible does the support scheme need to be?

"JSON Lines" is a semi-standard format [1] that's basically just
newline-separated JSON values. (In fact, this is what
log_destination=jsonlog gives you for Postgres logs, no?) It might be
worthwhile to support that, too.

[1]: https://jsonlines.org/

Yes, I have seen examples of that associated with other databases (MSSQL
and Duckdb at least) as well. It probably makes sense to support that
format too.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#25

andrew@dunslane.net

about 2 years ago

In reply to: Joe Conway (#24)

Re: Emitting JSON to file using COPY TO

On 2023-12-02 Sa 17:43, Joe Conway wrote:

On 12/2/23 13:50, Maciek Sakrejda wrote:

On Fri, Dec 1, 2023 at 11:32 AM Joe Conway <mail@joeconway.com> wrote:

1. Is supporting JSON array format sufficient, or does it need to
support some other options? How flexible does the support scheme
need to be?

"JSON Lines" is a semi-standard format [1] that's basically just
newline-separated JSON values. (In fact, this is what
log_destination=jsonlog gives you for Postgres logs, no?) It might be
worthwhile to support that, too.

[1]: https://jsonlines.org/

Yes, I have seen examples of that associated with other databases
(MSSQL and Duckdb at least) as well. It probably makes sense to
support that format too.

You can do that today, e.g.

copy (select to_json(q) from table_or_query q) to stdout

You can also do it as a single document as proposed here, like this:

copy (select json_agg(q) from table_or_query q) to stdout

The only downside to that is that it has to construct the aggregate,
which could be ugly for large datasets, and that's why I'm not opposed
to this patch.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#26

mail@joeconway.com

about 2 years ago

In reply to: Joe Conway (#23)

1 attachment(s)

Re: Emitting JSON to file using COPY TO

On 12/2/23 17:37, Joe Conway wrote:

On 12/2/23 16:53, Nathan Bossart wrote:
On Sat, Dec 02, 2023 at 10:11:20AM -0500, Tom Lane wrote:

So if you are writing a production that might need to match
FORMAT followed by JSON, you need to match FORMAT_LA too.

Thanks for the pointer. That does seem to be the culprit.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89a9..048494dd07 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3490,6 +3490,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+            | FORMAT_LA copy_generic_opt_arg
+                {
+                    $$ = makeDefElem("format", $2, @1);
+                }
;
copy_generic_opt_arg:
Yep -- I concluded the same. Thanks Tom!

The attached implements the above repair, as well as adding support for
array decoration (or not) and/or comma row delimiters when not an array.

This covers the three variations of json import/export formats that I
have found after light searching (SQL Server and DuckDB).

Still lacks any documentation, tests, and COPY FROM support, but here is
what it looks like in a nutshell:

8<-----------------------------------------------
create table foo(id int8, f1 text, f2 timestamptz);
insert into foo
select g.i,
'line: ' || g.i::text,
clock_timestamp()
from generate_series(1,4) as g(i);

copy foo to stdout (format json);
{"id":1,"f1":"line: 1","f2":"2023-12-01T12:58:16.776863-05:00"}
{"id":2,"f1":"line: 2","f2":"2023-12-01T12:58:16.777084-05:00"}
{"id":3,"f1":"line: 3","f2":"2023-12-01T12:58:16.777096-05:00"}
{"id":4,"f1":"line: 4","f2":"2023-12-01T12:58:16.777103-05:00"}

copy foo to stdout (format json, force_array);
[
{"id":1,"f1":"line: 1","f2":"2023-12-01T12:58:16.776863-05:00"}
,{"id":2,"f1":"line: 2","f2":"2023-12-01T12:58:16.777084-05:00"}
,{"id":3,"f1":"line: 3","f2":"2023-12-01T12:58:16.777096-05:00"}
,{"id":4,"f1":"line: 4","f2":"2023-12-01T12:58:16.777103-05:00"}
]

copy foo to stdout (format json, force_row_delimiter);
{"id":1,"f1":"line: 1","f2":"2023-12-01T12:58:16.776863-05:00"}
,{"id":2,"f1":"line: 2","f2":"2023-12-01T12:58:16.777084-05:00"}
,{"id":3,"f1":"line: 3","f2":"2023-12-01T12:58:16.777096-05:00"}
,{"id":4,"f1":"line: 4","f2":"2023-12-01T12:58:16.777103-05:00"}

copy foo to stdout (force_array);
ERROR: COPY FORCE_ARRAY requires JSON mode

copy foo to stdout (force_row_delimiter);
ERROR: COPY FORCE_ROW_DELIMITER requires JSON mode
8<-----------------------------------------------
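
The option handling shown in the transcript above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the patch: `copy_to_json` and its parameters are hypothetical, and the row prefixes (a space before the first row, a comma before later rows) follow the CopyOneRowTo() branch in the attached diff.

```python
import json


def copy_to_json(rows, fmt="json", force_array=False, force_row_delimiter=False):
    """Rough model of the PoC output shapes; hypothetical helper."""
    if fmt != "json":
        # Both options are rejected outside JSON mode, as in the transcript.
        if force_array:
            raise ValueError("COPY FORCE_ARRAY requires JSON mode")
        if force_row_delimiter:
            raise ValueError("COPY FORCE_ROW_DELIMITER requires JSON mode")
        raise NotImplementedError("only the json format is modelled here")

    # FORCE_ARRAY implies comma row delimiters.
    if force_array:
        force_row_delimiter = True

    lines = []
    if force_array:
        lines.append("[")
    for i, row in enumerate(rows):
        body = json.dumps(row, separators=(",", ":"))
        if force_row_delimiter:
            # First row gets a space, every later row a leading comma.
            body = (" " if i == 0 else ",") + body
        lines.append(body)
    if force_array:
        lines.append("]")
    return "\n".join(lines) + "\n"


demo_rows = [{"id": 1}, {"id": 2}]
print(copy_to_json(demo_rows, force_array=True), end="")
```

Only the `force_array` variant yields a single parseable JSON document; the plain variant is parseable line by line, and the comma-delimited variant needs the caller to supply the surrounding brackets.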

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

copyto_json.001.diff (text/x-patch; charset=UTF-8)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..1f9ac31 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 443,450 ----
  				 /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  				opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ 				opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  				opts_out->binary = true;
  			else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 542,559 ----
  								defel->defname),
  						 parser_errposition(pstate, defel->location)));
  		}
+ 		else if (strcmp(defel->defname, "force_row_delimiter") == 0)
+ 		{
+ 			if (opts_out->force_row_delimiter)
+ 				errorConflictingDefElem(defel, pstate);
+ 			opts_out->force_row_delimiter = true;
+ 		}
+ 		else if (strcmp(defel->defname, "force_array") == 0)
+ 		{
+ 			if (opts_out->force_array)
+ 				errorConflictingDefElem(defel, pstate);
+ 			opts_out->force_array = true;
+ 		}
  		else if (strcmp(defel->defname, "convert_selectively") == 0)
  		{
  			/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 612,631 ----
  				(errcode(ERRCODE_SYNTAX_ERROR),
  				 errmsg("cannot specify DEFAULT in BINARY mode")));
  
+ 	if (opts_out->json_mode)
+ 	{
+ 		if (opts_out->force_array)
+ 			opts_out->force_row_delimiter = true;
+ 	}
+ 	else if (opts_out->force_array)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ARRAY requires JSON mode")));
+ 	else if (opts_out->force_row_delimiter)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ROW_DELIMITER requires JSON mode")));
+ 
  	/* Set defaults for omitted options */
  	if (!opts_out->delim)
  		opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 695,705 ----
  				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  				 errmsg("cannot specify HEADER in BINARY mode")));
  
+ 	if (opts_out->json_mode && opts_out->header_line)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot specify HEADER in JSON mode")));
+ 
  	/* Check quote */
  	if (!opts_out->csv_mode && opts_out->quote != NULL)
  		ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..ca3680e 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
+ #include "utils/json.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
  /* NOTE: there's a copy of this in copyfromparse.c */
  static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
  
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
  
  /* non-export function prototypes */
  static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,867 ----
  
  			CopySendEndOfRow(cstate);
  		}
+ 
+ 		/*
+ 		 * If JSON has been requested, and FORCE_ARRAY has been specified
+ 		 * send the opening bracket.
+ 		 */
+ 		if (cstate->opts.json_mode)
+ 		{
+ 			if (cstate->opts.force_array)
+ 			{
+ 				CopySendChar(cstate, '[');
+ 				CopySendEndOfRow(cstate);
+ 			}
+ 			json_row_delim_needed = false;
+ 		}
  	}
  
  	if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 909,925 ----
  		CopySendEndOfRow(cstate);
  	}
  
+ 	/*
+ 	 * If JSON has been requested, and FORCE_ARRAY has been specified
+ 	 * send the closing bracket.
+ 	 */
+ 	if (cstate->opts.json_mode &&
+ 		cstate->opts.force_array)
+ 	{
+ 		CopySendChar(cstate, ']');
+ 		CopySendEndOfRow(cstate);
+ 	}
+ 
  	MemoryContextDelete(cstate->rowcontext);
  
  	if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
  static void
  CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
  {
- 	bool		need_delim = false;
- 	FmgrInfo   *out_functions = cstate->out_functions;
  	MemoryContext oldcontext;
- 	ListCell   *cur;
- 	char	   *string;
  
  	MemoryContextReset(cstate->rowcontext);
  	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 934,940 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	/* Make sure the tuple is fully deconstructed */
! 	slot_getallattrs(slot);
! 
! 	foreach(cur, cstate->attnumlist)
  	{
! 		int			attnum = lfirst_int(cur);
! 		Datum		value = slot->tts_values[attnum - 1];
! 		bool		isnull = slot->tts_isnull[attnum - 1];
  
! 		if (!cstate->opts.binary)
! 		{
! 			if (need_delim)
! 				CopySendChar(cstate, cstate->opts.delim[0]);
! 			need_delim = true;
! 		}
  
! 		if (isnull)
! 		{
! 			if (!cstate->opts.binary)
! 				CopySendString(cstate, cstate->opts.null_print_client);
! 			else
! 				CopySendInt32(cstate, -1);
! 		}
! 		else
  		{
  			if (!cstate->opts.binary)
  			{
! 				string = OutputFunctionCall(&out_functions[attnum - 1],
! 											value);
! 				if (cstate->opts.csv_mode)
! 					CopyAttributeOutCSV(cstate, string,
! 										cstate->opts.force_quote_flags[attnum - 1],
! 										list_length(cstate->attnumlist) == 1);
  				else
! 					CopyAttributeOutText(cstate, string);
  			}
  			else
  			{
! 				bytea	   *outputbytes;
  
! 				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 											   value);
! 				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 				CopySendData(cstate, VARDATA(outputbytes),
! 							 VARSIZE(outputbytes) - VARHDRSZ);
  			}
  		}
  	}
  
  	CopySendEndOfRow(cstate);
  
--- 945,1028 ----
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	if (!cstate->opts.json_mode)
  	{
! 		bool		need_delim = false;
! 		FmgrInfo   *out_functions = cstate->out_functions;
! 		ListCell   *cur;
! 		char	   *string;
  
! 		/* Make sure the tuple is fully deconstructed */
! 		slot_getallattrs(slot);
  
! 		foreach(cur, cstate->attnumlist)
  		{
+ 			int			attnum = lfirst_int(cur);
+ 			Datum		value = slot->tts_values[attnum - 1];
+ 			bool		isnull = slot->tts_isnull[attnum - 1];
+ 
  			if (!cstate->opts.binary)
  			{
! 				if (need_delim)
! 					CopySendChar(cstate, cstate->opts.delim[0]);
! 				need_delim = true;
! 			}
! 
! 			if (isnull)
! 			{
! 				if (!cstate->opts.binary)
! 					CopySendString(cstate, cstate->opts.null_print_client);
  				else
! 					CopySendInt32(cstate, -1);
  			}
  			else
  			{
! 				if (!cstate->opts.binary)
! 				{
! 					string = OutputFunctionCall(&out_functions[attnum - 1],
! 												value);
! 					if (cstate->opts.csv_mode)
! 						CopyAttributeOutCSV(cstate, string,
! 											cstate->opts.force_quote_flags[attnum - 1],
! 											list_length(cstate->attnumlist) == 1);
! 					else
! 						CopyAttributeOutText(cstate, string);
! 				}
! 				else
! 				{
! 					bytea	   *outputbytes;
  
! 					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 												   value);
! 					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 					CopySendData(cstate, VARDATA(outputbytes),
! 								 VARSIZE(outputbytes) - VARHDRSZ);
! 				}
  			}
  		}
  	}
+ 	else
+ 	{
+ 		Datum	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ 		StringInfo	result;
+ 
+ 		result = makeStringInfo();
+ 		composite_to_json(rowdata, result, false);
+ 
+ 		if (json_row_delim_needed &&
+ 			cstate->opts.force_row_delimiter)
+ 		{
+ 			CopySendChar(cstate, ',');
+ 		}
+ 		else if (cstate->opts.force_row_delimiter)
+ 		{
+ 			/* first row needs no delimiter */
+ 			CopySendChar(cstate, ' ');
+ 			json_row_delim_needed = true;
+ 		}
+ 
+ 		CopyAttributeOutText(cstate, result->data);
+ 	}
  
  	CopySendEndOfRow(cstate);
  
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..16aa131 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
  				{
  					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
  				}
+ 			| JSON
+ 				{
+ 					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ 				}
  			| HEADER_P
  				{
  					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3465 ----
  				{
  					$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
  				}
+ 			| FORCE ROW DELIMITER
+ 				{
+ 					$$ = makeDefElem("force_row_delimiter", (Node *) makeBoolean(true), @1);
+ 				}
+ 			| FORCE ARRAY
+ 				{
+ 					$$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ 				}
  		;
  
  /* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3502,3511 ----
  				{
  					$$ = makeDefElem($1, $2, @1);
  				}
+ 			| FORMAT_LA copy_generic_opt_arg
+ 				{
+ 					$$ = makeDefElem("format", $2, @1);
+ 				}
  		;
  
  copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
  	JsonUniqueBuilderState unique_check;
  } JsonAggState;
  
- static void composite_to_json(Datum composite, StringInfo result,
- 							  bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
  							  Datum *vals, bool *nulls, int *valcount,
  							  JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
  
  /*
   * Turn a composite / record into JSON.
   */
! static void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
--- 488,496 ----
  
  /*
   * Turn a composite / record into JSON.
+  * Exported so COPY TO can use it.
   */
! void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..266910d 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
  	bool		binary;			/* binary format? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
+ 	bool		json_mode;		/* JSON format? */
  	CopyHeaderChoice header_line;	/* header line? */
  	char	   *null_print;		/* NULL marker string (server encoding!) */
  	int			null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,69 ----
  	List	   *force_null;		/* list of column names */
  	bool		force_null_all; /* FORCE_NULL *? */
  	bool	   *force_null_flags;	/* per-column CSV FN flags */
+ 	bool		force_row_delimiter;	/* use comma as per-row JSON delimiter */
+ 	bool		force_array;	/* JSON array; implies force_row_delimiter */
  	bool		convert_selectively;	/* do selective binary conversion? */
  	List	   *convert_select; /* list of column names (can be NIL) */
  } CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
  #include "lib/stringinfo.h"
  
  /* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ 							  bool use_line_feeds);
  extern void escape_json(StringInfo buf, const char *str);
  extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
  								const int *tzp);

#27

andrew@dunslane.net

about 2 years ago

In reply to: Joe Conway (#15)

Re: Emitting JSON to file using COPY TO

On 2023-12-01 Fr 14:28, Joe Conway wrote:

On 11/29/23 10:32, Davin Shearer wrote:

Thanks for the responses everyone.

I worked around the issue using the `psql -tc` method as Filip
described.

I think it would be great to support writing JSON using COPY TO at
some point so I can emit JSON to files using a PostgreSQL function
directly.

-Davin

On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <filip@sedlakovi.org
<mailto:filip@sedlakovi.org>> wrote:

    This would be a very special case for COPY. It applies only to a
single
    column of JSON values. The original problem can be solved with psql
    --tuples-only as David wrote earlier.

    $ psql -tc 'select json_agg(row_to_json(t))
     from (select * from public.tbl_json_test) t;'

     [{"id":1,"t_test":"here's a \"string\""}]

    Special-casing any encoding/escaping scheme leads to bugs and harder
    parsing.

(moved to hackers)

I did a quick PoC patch (attached) -- if there is interest and no hard
objections I would like to get it up to speed for the January commitfest.

Currently the patch lacks documentation and regression test support.

Questions:
----------
1. Is supporting JSON array format sufficient, or does it need to
support some other options? How flexible does the support scheme need
to be?

2. This only supports COPY TO and we would undoubtedly want to support
COPY FROM for JSON as well, but is that required from the start?

Thanks for any feedback.

I realize this is just a POC, but I'd prefer to see composite_to_json()
not exposed. You could use the already public datum_to_json() instead,
passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
arguments.

I think JSON array format is sufficient.

I can see both sides of the COPY FROM argument, but I think insisting on
that makes this less doable for release 17. On balance I would stick to
COPY TO for now.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#28

davin@apache.org

about 2 years ago

In reply to: Andrew Dunstan (#27)

Re: Emitting JSON to file using COPY TO

Please be sure to include single and double quotes in the test values since
that was the original problem (double quoting in COPY TO breaking the JSON
syntax).

On Sun, Dec 3, 2023, 10:11 Andrew Dunstan <andrew@dunslane.net> wrote:

On 2023-12-01 Fr 14:28, Joe Conway wrote:

On 11/29/23 10:32, Davin Shearer wrote:

Thanks for the responses everyone.

I worked around the issue using the `psql -tc` method as Filip
described.

I think it would be great to support writing JSON using COPY TO at
some point so I can emit JSON to files using a PostgreSQL function
directly.

-Davin

On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <filip@sedlakovi.org
<mailto:filip@sedlakovi.org>> wrote:

This would be a very special case for COPY. It applies only to a
single
column of JSON values. The original problem can be solved with psql
--tuples-only as David wrote earlier.

$ psql -tc 'select json_agg(row_to_json(t))
from (select * from public.tbl_json_test) t;'

[{"id":1,"t_test":"here's a \"string\""}]

Special-casing any encoding/escaping scheme leads to bugs and harder
parsing.

(moved to hackers)

I did a quick PoC patch (attached) -- if there is interest and no hard
objections I would like to get it up to speed for the January commitfest.

Currently the patch lacks documentation and regression test support.

Questions:
----------
1. Is supporting JSON array format sufficient, or does it need to
support some other options? How flexible does the support scheme need
to be?

2. This only supports COPY TO and we would undoubtedly want to support
COPY FROM for JSON as well, but is that required from the start?

Thanks for any feedback.

I realize this is just a POC, but I'd prefer to see composite_to_json()
not exposed. You could use the already public datum_to_json() instead,
passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
arguments.

I think JSON array format is sufficient.

I can see both sides of the COPY FROM argument, but I think insisting on
that makes this less doable for release 17. On balance I would stick to
COPY TO for now.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#29

mail@joeconway.com

about 2 years ago

In reply to: Davin Shearer (#28)

Re: Emitting JSON to file using COPY TO

On 12/3/23 10:31, Davin Shearer wrote:

Please be sure to include single and double quotes in the test values
since that was the original problem (double quoting in COPY TO breaking
the JSON syntax).

test=# copy (select * from foo limit 4) to stdout (format json);
{"id":2456092,"f1":"line with ' in it: 2456092","f2":"2023-12-03T10:44:40.9712-05:00"}
{"id":2456093,"f1":"line with \\" in it: 2456093","f2":"2023-12-03T10:44:40.971221-05:00"}
{"id":2456094,"f1":"line with ' in it: 2456094","f2":"2023-12-03T10:44:40.971225-05:00"}
{"id":2456095,"f1":"line with \\" in it: 2456095","f2":"2023-12-03T10:44:40.971228-05:00"}
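
One possible reading of the doubled backslash before the double quote in the output above: the PoC passes the JSON text through CopyAttributeOutText() (see the diff posted earlier in the thread), and text-format COPY escapes a backslash by doubling it. That is an inference, not something stated in this message. The Python sketch below only illustrates the effect; `copy_text_escape` is a hypothetical stand-in that models the backslash rule and nothing else.

```python
import json


def copy_text_escape(field):
    """Text-format COPY escaping, reduced to the backslash rule only."""
    return field.replace("\\", "\\\\")


def is_valid_json(text):
    """True if text parses as a JSON document."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


# Sample row, invented for illustration.
row = {"id": 1, "f1": 'line with " in it'}

encoded = json.dumps(row, separators=(",", ":"))   # JSON-level escaping only
escaped = copy_text_escape(encoded)                # text-COPY escaping on top

print(encoded)
print(escaped)
print(is_valid_json(encoded), is_valid_json(escaped))
```

Under this assumption a single quote passes through untouched, while a double quote ends up preceded by two backslashes, which a JSON parser reads as an escaped backslash followed by the end of the string.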

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#30

mail@joeconway.com

about 2 years ago

In reply to: Andrew Dunstan (#27)

Re: Emitting JSON to file using COPY TO

On 12/3/23 10:10, Andrew Dunstan wrote:

On 2023-12-01 Fr 14:28, Joe Conway wrote:

On 11/29/23 10:32, Davin Shearer wrote:

Thanks for the responses everyone.

I worked around the issue using the `psql -tc` method as Filip
described.

I think it would be great to support writing JSON using COPY TO at
some point so I can emit JSON to files using a PostgreSQL function
directly.

-Davin

On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <filip@sedlakovi.org
<mailto:filip@sedlakovi.org>> wrote:

    This would be a very special case for COPY. It applies only to a
single
    column of JSON values. The original problem can be solved with psql
    --tuples-only as David wrote earlier.

    $ psql -tc 'select json_agg(row_to_json(t))
     from (select * from public.tbl_json_test) t;'

     [{"id":1,"t_test":"here's a \"string\""}]

    Special-casing any encoding/escaping scheme leads to bugs and harder
    parsing.

(moved to hackers)

I did a quick PoC patch (attached) -- if there is interest and no hard
objections I would like to get it up to speed for the January commitfest.

Currently the patch lacks documentation and regression test support.

Questions:
----------
1. Is supporting JSON array format sufficient, or does it need to
support some other options? How flexible does the support scheme need
to be?

2. This only supports COPY TO and we would undoubtedly want to support
COPY FROM for JSON as well, but is that required from the start?

Thanks for any feedback.

I realize this is just a POC, but I'd prefer to see composite_to_json()
not exposed. You could use the already public datum_to_json() instead,
passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
arguments.

Ok, thanks, will do

I think JSON array format is sufficient.

The other formats make sense from a completeness standpoint (versus
other databases) and the latest patch already includes them, so I still
lean toward supporting all three formats.

I can see both sides of the COPY FROM argument, but I think insisting on
that makes this less doable for release 17. On balance I would stick to
COPY TO for now.

WFM.

From your earlier post, regarding constructing the aggregate -- not
extensive testing but one data point:
8<--------------------------
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 36353.153 ms (00:36.353)
test=# copy (select json_agg(foo) from foo) to '/tmp/buf';
COPY 1
Time: 46835.238 ms (00:46.835)
8<--------------------------

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#31

mail@joeconway.com

about 2 years ago

In reply to: Joe Conway (#30)

Re: Emitting JSON to file using COPY TO

On 12/3/23 11:03, Joe Conway wrote:

From your earlier post, regarding constructing the aggregate -- not
extensive testing but one data point:
8<--------------------------
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 36353.153 ms (00:36.353)
test=# copy (select json_agg(foo) from foo) to '/tmp/buf';
COPY 1
Time: 46835.238 ms (00:46.835)
8<--------------------------

Also if the table is large enough, the aggregate method is not even
feasible whereas the COPY TO method works:
8<--------------------------
test=# select count(*) from foo;
count
----------
20000000
(1 row)

test=# copy (select json_agg(foo) from foo) to '/tmp/buf';
ERROR: out of memory
DETAIL: Cannot enlarge string buffer containing 1073741822 bytes by 1
more bytes.

test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 20000000
8<--------------------------
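
The contrast above can be sketched in Python: building the whole array in one buffer runs into a size ceiling, while writing row by row never holds more than one row. Everything here is illustrative; `MAX_BUFFER` is a tiny made-up stand-in for the roughly 1 GB string buffer limit in the error message, and the function names are hypothetical.

```python
import io
import json

MAX_BUFFER = 64  # tiny stand-in for the ~1 GB limit in the error above


def make_rows(n):
    """Generate n sample rows lazily."""
    return ({"id": i, "f1": f"line: {i}"} for i in range(1, n + 1))


def aggregate_then_write(rows, out):
    """Build the whole array in one buffer, then write it (json_agg-like)."""
    buf = json.dumps(list(rows), separators=(",", ":"))
    if len(buf) > MAX_BUFFER:
        raise MemoryError("aggregate buffer limit exceeded")
    out.write(buf)


def stream_rows(rows, out):
    """Write one row at a time; only a single row is ever buffered."""
    out.write("[\n")
    for i, row in enumerate(rows):
        piece = json.dumps(row, separators=(",", ":"))
        if len(piece) > MAX_BUFFER:
            raise MemoryError("single row exceeds buffer limit")
        out.write(("," if i else " ") + piece + "\n")
    out.write("]\n")


streamed = io.StringIO()
stream_rows(make_rows(100), streamed)
print(len(json.loads(streamed.getvalue())))
```

With these made-up numbers the aggregate version fails once the combined output passes the limit, while the streaming version is bounded only by the size of its largest single row.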

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#32

mail@joeconway.com

about 2 years ago

In reply to: Joe Conway (#30)

Re: Emitting JSON to file using COPY TO

On 12/3/23 11:03, Joe Conway wrote:

On 12/3/23 10:10, Andrew Dunstan wrote:

I realize this is just a POC, but I'd prefer to see composite_to_json()
not exposed. You could use the already public datum_to_json() instead,
passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
arguments.

Ok, thanks, will do

Just FYI, this change does lose some performance in my not massively
scientific A/B/A test:

8<---------------------------
-- with datum_to_json()
test=# \timing
Timing is on.
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 37196.898 ms (00:37.197)
Time: 37408.161 ms (00:37.408)
Time: 38393.309 ms (00:38.393)
Time: 36855.438 ms (00:36.855)
Time: 37806.280 ms (00:37.806)

Avg = 37532

-- original patch
test=# \timing
Timing is on.
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 37426.207 ms (00:37.426)
Time: 36068.187 ms (00:36.068)
Time: 38285.252 ms (00:38.285)
Time: 36971.042 ms (00:36.971)
Time: 35690.822 ms (00:35.691)

Avg = 36888

-- with datum_to_json()
test=# \timing
Timing is on.
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 39083.467 ms (00:39.083)
Time: 37249.326 ms (00:37.249)
Time: 38529.721 ms (00:38.530)
Time: 38704.920 ms (00:38.705)
Time: 39001.326 ms (00:39.001)

Avg = 38513
8<---------------------------

That is somewhere in the 3% range.
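
For anyone who wants to check the arithmetic, the short Python sketch below recomputes the averages from the timings listed above and compares the two datum_to_json() runs, taken together, with the original patch. The variable names are made up for this example, and averaging the two datum_to_json() runs is one way among several to summarise an A/B/A test.

```python
from statistics import mean

# Timings in milliseconds, copied from the A/B/A runs above.
datum_to_json_run1 = [37196.898, 37408.161, 38393.309, 36855.438, 37806.280]
original_patch = [37426.207, 36068.187, 38285.252, 36971.042, 35690.822]
datum_to_json_run2 = [39083.467, 37249.326, 38529.721, 38704.920, 39001.326]

avg_a1 = mean(datum_to_json_run1)
avg_b = mean(original_patch)
avg_a2 = mean(datum_to_json_run2)

# Combine the two "A" runs, then express the gap relative to "B".
avg_a = mean([avg_a1, avg_a2])
slowdown_pct = (avg_a / avg_b - 1.0) * 100.0

print(f"A1={avg_a1:.0f} B={avg_b:.0f} A2={avg_a2:.0f} slowdown={slowdown_pct:.1f}%")
```

The result is a slowdown of roughly 3.1%, consistent with the "3% range" quoted above, though with five samples per run the run-to-run spread is of a similar size.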

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#33

andrew@dunslane.net

about 2 years ago

In reply to: Joe Conway (#31)

Re: Emitting JSON to file using COPY TO

On 2023-12-03 Su 12:11, Joe Conway wrote:

On 12/3/23 11:03, Joe Conway wrote:

From your earlier post, regarding constructing the aggregate -- not
extensive testing but one data point:
8<--------------------------
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 36353.153 ms (00:36.353)
test=# copy (select json_agg(foo) from foo) to '/tmp/buf';
COPY 1
Time: 46835.238 ms (00:46.835)
8<--------------------------

Also if the table is large enough, the aggregate method is not even
feasible whereas the COPY TO method works:
8<--------------------------
test=# select count(*) from foo;
count
----------
20000000
(1 row)

test=# copy (select json_agg(foo) from foo) to '/tmp/buf';
ERROR: out of memory
DETAIL: Cannot enlarge string buffer containing 1073741822 bytes by 1
more bytes.

test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 20000000
8<--------------------------

None of this is surprising. As I mentioned, limitations with json_agg()
are why I support the idea of this patch.
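
The out-of-memory error quoted above is consistent with the whole aggregate being built in one string buffer capped at roughly 1 GB. As a back-of-the-envelope sketch (the limit is taken from the error message; the helper name is made up for illustration), the average row size at which an aggregate over N rows would hit that cap can be estimated as follows:

```python
# Buffer content reported in the error message above, plus the one byte
# it could not grow by.
BUFFER_LIMIT_BYTES = 1073741822 + 1


def max_avg_row_bytes(row_count: int, limit: int = BUFFER_LIMIT_BYTES) -> float:
    """Average bytes per row (separators included) that still fit in one buffer."""
    return limit / row_count


for rows in (10_000_000, 20_000_000):
    print(f"{rows:>10} rows: about {max_avg_row_bytes(rows):.1f} bytes per row")
```

Under this assumption, 10 million rows fit as long as they average under about 107 bytes each, while 20 million rows must stay under about 54 bytes. That would match the 10M-row aggregate succeeding and the 20M-row one failing if the test rows fall between those sizes. COPY TO writes row by row and never needs the whole result in one buffer.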

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#34

andrew@dunslane.net

about 2 years ago

In reply to: Joe Conway (#32)

Re: Emitting JSON to file using COPY TO

On 2023-12-03 Su 14:24, Joe Conway wrote:

On 12/3/23 11:03, Joe Conway wrote:

On 12/3/23 10:10, Andrew Dunstan wrote:

I realize this is just a POC, but I'd prefer to see
composite_to_json()
not exposed. You could use the already public datum_to_json() instead,
passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
arguments.

Ok, thanks, will do

Just FYI, this change does lose some performance in my not massively
scientific A/B/A test:

8<---------------------------
-- with datum_to_json()
test=# \timing
Timing is on.
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 37196.898 ms (00:37.197)
Time: 37408.161 ms (00:37.408)
Time: 38393.309 ms (00:38.393)
Time: 36855.438 ms (00:36.855)
Time: 37806.280 ms (00:37.806)

Avg = 37532

-- original patch
test=# \timing
Timing is on.
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 37426.207 ms (00:37.426)
Time: 36068.187 ms (00:36.068)
Time: 38285.252 ms (00:38.285)
Time: 36971.042 ms (00:36.971)
Time: 35690.822 ms (00:35.691)

Avg = 36888

-- with datum_to_json()
test=# \timing
Timing is on.
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 39083.467 ms (00:39.083)
Time: 37249.326 ms (00:37.249)
Time: 38529.721 ms (00:38.530)
Time: 38704.920 ms (00:38.705)
Time: 39001.326 ms (00:39.001)

Avg = 38513
8<---------------------------

That is somewhere in the 3% range.

I assume it's because datum_to_json() constructs a text value from which
you then need to extract the cstring, whereas composite_to_json() just
gives you back the stringinfo. I guess that's a good enough reason to go
with exposing composite_to_json().

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#35

mail@joeconway.com

about 2 years ago

In reply to: Andrew Dunstan (#34)

Re: Emitting JSON to file using COPY TO

On 12/3/23 14:52, Andrew Dunstan wrote:

On 2023-12-03 Su 14:24, Joe Conway wrote:

On 12/3/23 11:03, Joe Conway wrote:

On 12/3/23 10:10, Andrew Dunstan wrote:

I realize this is just a POC, but I'd prefer to see
composite_to_json()
not exposed. You could use the already public datum_to_json() instead,
passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
arguments.

Ok, thanks, will do

Just FYI, this change does lose some performance in my not massively
scientific A/B/A test:

8<---------------------------

<snip>

8<---------------------------

That is somewhere in the 3% range.

I assume it's because datum_to_json() constructs a text value from which
you then need to extract the cstring, whereas composite_to_json() just
gives you back the stringinfo. I guess that's a good enough reason to go
with exposing composite_to_json().

Yeah, that was why I went that route in the first place. If you are good
with it I will go back to that. The code is a bit simpler too.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#36

davin@apache.org

about 2 years ago

In reply to: Joe Conway (#29)

Re: Emitting JSON to file using COPY TO

" being quoted as \\" breaks the JSON. It needs to be \". This has been my
whole problem with COPY TO for JSON.

Please validate that the output is in proper format with correct quoting
for special characters. I use `jq` on the command line to validate and
format the output.
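
As a rough stand-in for the `jq` check described above, the same validation can be sketched with Python's standard `json` module. The function name and sample rows are illustrative only; the two samples differ just in how the embedded double quote is escaped.

```python
import json


def invalid_lines(text: str) -> list:
    """Return 1-based numbers of non-empty lines that are not valid JSON."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
    return bad


# Raw strings keep the backslashes exactly as they would appear in the file.
good = r'{"id":1,"f1":"line with \" in it: 1"}'
broken = r'{"id":2,"f1":"line with \\" in it: 2"}'

print(invalid_lines(good + "\n" + broken))  # [2]
```

In the broken row, `\\` decodes to a literal backslash, so the following `"` ends the string early and the rest of the line no longer parses.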

On Sun, Dec 3, 2023, 10:51 Joe Conway <mail@joeconway.com> wrote:

On 12/3/23 10:31, Davin Shearer wrote:

Please be sure to include single and double quotes in the test values
since that was the original problem (double quoting in COPY TO breaking
the JSON syntax).

test=# copy (select * from foo limit 4) to stdout (format json);
{"id":2456092,"f1":"line with ' in it:
2456092","f2":"2023-12-03T10:44:40.9712-05:00"}
{"id":2456093,"f1":"line with \\" in it:
2456093","f2":"2023-12-03T10:44:40.971221-05:00"}
{"id":2456094,"f1":"line with ' in it:
2456094","f2":"2023-12-03T10:44:40.971225-05:00"}
{"id":2456095,"f1":"line with \\" in it:
2456095","f2":"2023-12-03T10:44:40.971228-05:00"}

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#37

mail@joeconway.com

about 2 years ago

In reply to: Davin Shearer (#36)

Re: Emitting JSON to file using COPY TO

(please don't top quote on the Postgres lists)

On 12/3/23 17:38, Davin Shearer wrote:

" being quoted as \\" breaks the JSON. It needs to be \". This has been
my whole problem with COPY TO for JSON.

Please validate that the output is in proper format with correct quoting
for special characters. I use `jq` on the command line to validate and
format the output.

I just hooked existing "row-to-json machinery" up to the "COPY TO"
statement. If the output is wrong (just for this use case?), that
would be a missing feature (or possibly a bug?).

Davin -- how did you work around the issue with the way the built in
functions output JSON?

Andrew -- comments/thoughts?

Joe

On Sun, Dec 3, 2023, 10:51 Joe Conway <mail@joeconway.com> wrote:

On 12/3/23 10:31, Davin Shearer wrote:

Please be sure to include single and double quotes in the test values
since that was the original problem (double quoting in COPY TO breaking
the JSON syntax).

test=# copy (select * from foo limit 4) to stdout (format json);
{"id":2456092,"f1":"line with ' in it:
2456092","f2":"2023-12-03T10:44:40.9712-05:00"}
{"id":2456093,"f1":"line with \\" in it:
2456093","f2":"2023-12-03T10:44:40.971221-05:00"}
{"id":2456094,"f1":"line with ' in it:
2456094","f2":"2023-12-03T10:44:40.971225-05:00"}
{"id":2456095,"f1":"line with \\" in it:
2456095","f2":"2023-12-03T10:44:40.971228-05:00"}

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#38

davin@apache.org

about 2 years ago

In reply to: Joe Conway (#37)

Re: Emitting JSON to file using COPY TO

I worked around it by using select json_agg(t)... and redirecting it to
file via psql on the command line. COPY TO was working until we ran into
broken JSON and discovered the double quoting issue due to some values
containing " in them.

#39

andrew@dunslane.net

about 2 years ago

In reply to: Joe Conway (#37)

Re: Emitting JSON to file using COPY TO

On 2023-12-03 Su 20:14, Joe Conway wrote:

(please don't top quote on the Postgres lists)

On 12/3/23 17:38, Davin Shearer wrote:

" being quoted as \\" breaks the JSON. It needs to be \". This has
been my whole problem with COPY TO for JSON.

Please validate that the output is in proper format with correct
quoting for special characters. I use `jq` on the command line to
validate and format the output.

I just hooked existing "row-to-json machinery" up to the "COPY TO"
statement. If the output is wrong (just for this use case?), that
would be a missing feature (or possibly a bug?).

Davin -- how did you work around the issue with the way the built in
functions output JSON?

Andrew -- comments/thoughts?

I meant to mention this when I was making comments yesterday.

The patch should not be using CopyAttributeOutText - it will try to
escape characters such as \, which produces the effect complained of
here, or else we need to change its setup so we have a way to inhibit
that escaping.
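
The effect described here can be reproduced outside the server. The sketch below is a simplified model of text-format escaping, not the real CopyAttributeOutText code, and `json.dumps` merely stands in for the server-side row-to-JSON machinery. It shows how escaping an already valid JSON string a second time turns `\"` into `\\"`.

```python
import json


def copy_text_escape(value: str, delim: str = "\t") -> str:
    """Simplified model of COPY text-format escaping (illustration only)."""
    replacements = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in value:
        if ch in replacements:
            out.append(replacements[ch])
        elif ch == delim:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


row_json = json.dumps({"f1": 'line with " in it'})  # already valid JSON
escaped_again = copy_text_escape(row_json)

print(row_json)       # {"f1": "line with \" in it"}
print(escaped_again)  # {"f1": "line with \\" in it"}
```

The first line parses as JSON; the second does not, because the doubled backslash no longer protects the quote that follows it.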

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#40

mail@joeconway.com

about 2 years ago

In reply to: Andrew Dunstan (#39)

Re: Emitting JSON to file using COPY TO

On 12/4/23 07:41, Andrew Dunstan wrote:

On 2023-12-03 Su 20:14, Joe Conway wrote:

(please don't top quote on the Postgres lists)

On 12/3/23 17:38, Davin Shearer wrote:

" being quoted as \\" breaks the JSON. It needs to be \". This has
been my whole problem with COPY TO for JSON.

Please validate that the output is in proper format with correct
quoting for special characters. I use `jq` on the command line to
validate and format the output.

I just hooked existing "row-to-json machinery" up to the "COPY TO"
statement. If the output is wrong (just for this use case?), that
would be a missing feature (or possibly a bug?).

Davin -- how did you work around the issue with the way the built in
functions output JSON?

Andrew -- comments/thoughts?

I meant to mention this when I was making comments yesterday.

The patch should not be using CopyAttributeOutText - it will try to
escape characters such as \, which produces the effect complained of
here, or else we need to change its setup so we have a way to inhibit
that escaping.

Interesting.

I am surprised this has never been raised as a problem with COPY TO before.

Should the JSON output, as produced by composite_to_json(), be sent
as-is with no escaping at all? If yes, is JSON somehow unique in this
regard?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#41

andrew@dunslane.net

about 2 years ago

In reply to: Joe Conway (#40)

Re: Emitting JSON to file using COPY TO

On 2023-12-04 Mo 08:37, Joe Conway wrote:

On 12/4/23 07:41, Andrew Dunstan wrote:

On 2023-12-03 Su 20:14, Joe Conway wrote:

(please don't top quote on the Postgres lists)

On 12/3/23 17:38, Davin Shearer wrote:

" being quoted as \\" breaks the JSON. It needs to be \". This has
been my whole problem with COPY TO for JSON.

Please validate that the output is in proper format with correct
quoting for special characters. I use `jq` on the command line to
validate and format the output.

I just hooked existing "row-to-json machinery" up to the "COPY TO"
statement. If the output is wrong (just for this use case?),
that would be a missing feature (or possibly a bug?).

Davin -- how did you work around the issue with the way the built in
functions output JSON?

Andrew -- comments/thoughts?

I meant to mention this when I was making comments yesterday.

The patch should not be using CopyAttributeOutText - it will try to
escape characters such as \, which produces the effect complained of
here, or else we need to change its setup so we have a way to inhibit
that escaping.

Interesting.

I am surprised this has never been raised as a problem with COPY TO
before.

Should the JSON output, as produced by composite_to_json(), be sent
as-is with no escaping at all? If yes, is JSON somehow unique in this
regard?

Text mode output is in such a form that it can be read back in using
text mode input. There's nothing special about JSON in this respect -
any text field will be escaped too. But output suitable for text mode
input is not what you're trying to produce here; you're trying to
produce valid JSON.

So, yes, the result of composite_to_json, which is already suitably
escaped, should not be further escaped in this case.
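
To illustrate the round-trip property of text mode, here is a minimal sketch. Both functions are simplified stand-ins for what COPY TO and COPY FROM do in text format, and their names are hypothetical. The escaped form is not valid JSON by itself, but reading it back restores the original JSON exactly.

```python
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
_UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}


def text_mode_out(value: str) -> str:
    """Escape a field roughly the way COPY TO text format would."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def text_mode_in(field: str) -> str:
    """Undo text_mode_out, roughly as COPY FROM text format would."""
    out = []
    chars = iter(field)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(_UNESCAPES.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)


json_row = r'{"f1":"line with \" in it"}'
on_disk = text_mode_out(json_row)

print(on_disk)                             # what a text-mode file would contain
print(text_mode_in(on_disk) == json_row)   # True
```

The extra escaping is therefore correct for its intended consumer (text-mode input) and only wrong when the file is meant to be read directly as JSON.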

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#42

mail@joeconway.com

about 2 years ago

In reply to: Andrew Dunstan (#41)

1 attachment(s)

Re: Emitting JSON to file using COPY TO

On 12/4/23 09:25, Andrew Dunstan wrote:

On 2023-12-04 Mo 08:37, Joe Conway wrote:

On 12/4/23 07:41, Andrew Dunstan wrote:

On 2023-12-03 Su 20:14, Joe Conway wrote:

(please don't top quote on the Postgres lists)

On 12/3/23 17:38, Davin Shearer wrote:

" being quoted as \\" breaks the JSON. It needs to be \". This has
been my whole problem with COPY TO for JSON.

Please validate that the output is in proper format with correct
quoting for special characters. I use `jq` on the command line to
validate and format the output.

I just hooked existing "row-to-json machinery" up to the "COPY TO"
statement. If the output is wrong (just for this use case?),
that would be a missing feature (or possibly a bug?).

Davin -- how did you work around the issue with the way the built in
functions output JSON?

Andrew -- comments/thoughts?

I meant to mention this when I was making comments yesterday.

The patch should not be using CopyAttributeOutText - it will try to
escape characters such as \, which produces the effect complained of
here, or else we need to change its setup so we have a way to inhibit
that escaping.

Interesting.

I am surprised this has never been raised as a problem with COPY TO
before.

Should the JSON output, as produced by composite_to_json(), be sent
as-is with no escaping at all? If yes, is JSON somehow unique in this
regard?

Text mode output is in such a form that it can be read back in using
text mode input. There's nothing special about JSON in this respect -
any text field will be escaped too. But output suitable for text mode
input is not what you're trying to produce here; you're trying to
produce valid JSON.

So, yes, the result of composite_to_json, which is already suitably
escaped, should not be further escaped in this case.

Gotcha.

This patch version uses CopySendData() instead and includes
documentation changes. Still lacks regression tests.

Hopefully this looks better. Any other particular strings I ought to
test with?

8<------------------
test=# copy (select * from foo limit 4) to stdout (format json,
force_array true);
[
{"id":1,"f1":"line with \" in it:
1","f2":"2023-12-03T12:26:41.596053-05:00"}
,{"id":2,"f1":"line with ' in it:
2","f2":"2023-12-03T12:26:41.596173-05:00"}
,{"id":3,"f1":"line with \" in it:
3","f2":"2023-12-03T12:26:41.596179-05:00"}
,{"id":4,"f1":"line with ' in it:
4","f2":"2023-12-03T12:26:41.596182-05:00"}
]
8<------------------
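
The framing shown above (opening bracket, one object per line, comma-prefixed rows, closing bracket) can be modelled in a few lines. This is a sketch of the row-delimiter logic as I read it in the attached patch, with `json.dumps` standing in for composite_to_json(); the function and parameter names are illustrative only.

```python
import io
import json
from typing import Iterable, TextIO


def copy_rows_as_json(rows: Iterable[dict], out: TextIO,
                      force_array: bool = False,
                      force_row_delimiter: bool = False) -> None:
    """Write one JSON object per line, optionally framed as a JSON array."""
    if force_array:
        force_row_delimiter = True  # FORCE_ARRAY implies FORCE_ROW_DELIMITER
        out.write("[\n")
    delim_needed = False
    for row in rows:
        if force_row_delimiter:
            # First row gets a space, later rows a comma, as in the patch.
            out.write("," if delim_needed else " ")
            delim_needed = True
        out.write(json.dumps(row, separators=(",", ":")) + "\n")
    if force_array:
        out.write("]\n")


buf = io.StringIO()
rows = [{"id": 1, "f1": 'line with " in it: 1'},
        {"id": 2, "f1": "line with ' in it: 2"}]
copy_rows_as_json(rows, buf, force_array=True)

print(buf.getvalue(), end="")
print(json.loads(buf.getvalue()) == rows)  # True
```

Because each row is written as soon as it is produced, the output as a whole is a valid JSON array without the complete result ever being held in memory.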

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

copyto_json.003.diff (text/x-patch; charset=UTF-8)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..af8777b 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 43,48 ****
--- 43,50 ----
      FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+     FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
+     FORCE_ROW_DELIMITER [ <replaceable class="parameter">boolean</replaceable> ]
      ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
  </synopsis>
   </refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 206,214 ****
--- 208,221 ----
        Selects the data format to be read or written:
        <literal>text</literal>,
        <literal>csv</literal> (Comma Separated Values),
+       <literal>json</literal> (JavaScript Object Notation),
        or <literal>binary</literal>.
        The default is <literal>text</literal>.
       </para>
+      <para>
+       The <literal>json</literal> option is allowed only in
+       <command>COPY TO</command>.
+      </para>
      </listitem>
     </varlistentry>
  
*************** COPY { <replaceable class="parameter">ta
*** 372,377 ****
--- 379,410 ----
       </para>
      </listitem>
     </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>FORCE_ROW_DELIMITER</literal></term>
+     <listitem>
+      <para>
+       Force output of commas as row delimiters, in addition to the usual
+       end of line characters. This option is allowed only in
+       <command>COPY TO</command>, and only when using
+       <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>FORCE_ARRAY</literal></term>
+     <listitem>
+      <para>
+       Force output of array decorations at the beginning and end of output.
+       This option implies the <literal>FORCE_ROW_DELIMITER</literal>
+       option. It is allowed only in <command>COPY TO</command>, and only
+       when using <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
  
     <varlistentry>
      <term><literal>ENCODING</literal></term>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..46ec34f 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 443,450 ----
  				 /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  				opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ 				opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  				opts_out->binary = true;
  			else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 542,559 ----
  								defel->defname),
  						 parser_errposition(pstate, defel->location)));
  		}
+ 		else if (strcmp(defel->defname, "force_row_delimiter") == 0)
+ 		{
+ 			if (opts_out->force_row_delimiter)
+ 				errorConflictingDefElem(defel, pstate);
+ 			opts_out->force_row_delimiter = true;
+ 		}
+ 		else if (strcmp(defel->defname, "force_array") == 0)
+ 		{
+ 			if (opts_out->force_array)
+ 				errorConflictingDefElem(defel, pstate);
+ 			opts_out->force_array = true;
+ 		}
  		else if (strcmp(defel->defname, "convert_selectively") == 0)
  		{
  			/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 612,636 ----
  				(errcode(ERRCODE_SYNTAX_ERROR),
  				 errmsg("cannot specify DEFAULT in BINARY mode")));
  
+ 	if (opts_out->json_mode)
+ 	{
+ 		if (is_from)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot use JSON mode in COPY FROM")));
+ 
+ 		if (opts_out->force_array)
+ 			opts_out->force_row_delimiter = true;
+ 	}
+ 	else if (opts_out->force_array)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ARRAY requires JSON mode")));
+ 	else if (opts_out->force_row_delimiter)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ROW_DELIMITER requires JSON mode")));
+ 
  	/* Set defaults for omitted options */
  	if (!opts_out->delim)
  		opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 700,710 ----
  				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  				 errmsg("cannot specify HEADER in BINARY mode")));
  
+ 	if (opts_out->json_mode && opts_out->header_line)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot specify HEADER in JSON mode")));
+ 
  	/* Check quote */
  	if (!opts_out->csv_mode && opts_out->quote != NULL)
  		ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..fba3070 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
+ #include "utils/json.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
  /* NOTE: there's a copy of this in copyfromparse.c */
  static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
  
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
  
  /* non-export function prototypes */
  static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,867 ----
  
  			CopySendEndOfRow(cstate);
  		}
+ 
+ 		/*
+ 		 * If JSON has been requested, and FORCE_ARRAY has been specified
+ 		 * send the opening bracket.
+ 		 */
+ 		if (cstate->opts.json_mode)
+ 		{
+ 			if (cstate->opts.force_array)
+ 			{
+ 				CopySendChar(cstate, '[');
+ 				CopySendEndOfRow(cstate);
+ 			}
+ 			json_row_delim_needed = false;
+ 		}
  	}
  
  	if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 909,925 ----
  		CopySendEndOfRow(cstate);
  	}
  
+ 	/*
+ 	 * If JSON has been requested, and FORCE_ARRAY has been specified
+ 	 * send the closing bracket.
+ 	 */
+ 	if (cstate->opts.json_mode &&
+ 		cstate->opts.force_array)
+ 	{
+ 		CopySendChar(cstate, ']');
+ 		CopySendEndOfRow(cstate);
+ 	}
+ 
  	MemoryContextDelete(cstate->rowcontext);
  
  	if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
  static void
  CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
  {
- 	bool		need_delim = false;
- 	FmgrInfo   *out_functions = cstate->out_functions;
  	MemoryContext oldcontext;
- 	ListCell   *cur;
- 	char	   *string;
  
  	MemoryContextReset(cstate->rowcontext);
  	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 934,940 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	/* Make sure the tuple is fully deconstructed */
! 	slot_getallattrs(slot);
! 
! 	foreach(cur, cstate->attnumlist)
  	{
! 		int			attnum = lfirst_int(cur);
! 		Datum		value = slot->tts_values[attnum - 1];
! 		bool		isnull = slot->tts_isnull[attnum - 1];
  
! 		if (!cstate->opts.binary)
! 		{
! 			if (need_delim)
! 				CopySendChar(cstate, cstate->opts.delim[0]);
! 			need_delim = true;
! 		}
  
! 		if (isnull)
! 		{
! 			if (!cstate->opts.binary)
! 				CopySendString(cstate, cstate->opts.null_print_client);
! 			else
! 				CopySendInt32(cstate, -1);
! 		}
! 		else
  		{
  			if (!cstate->opts.binary)
  			{
! 				string = OutputFunctionCall(&out_functions[attnum - 1],
! 											value);
! 				if (cstate->opts.csv_mode)
! 					CopyAttributeOutCSV(cstate, string,
! 										cstate->opts.force_quote_flags[attnum - 1],
! 										list_length(cstate->attnumlist) == 1);
  				else
! 					CopyAttributeOutText(cstate, string);
  			}
  			else
  			{
! 				bytea	   *outputbytes;
  
! 				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 											   value);
! 				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 				CopySendData(cstate, VARDATA(outputbytes),
! 							 VARSIZE(outputbytes) - VARHDRSZ);
  			}
  		}
  	}
  
  	CopySendEndOfRow(cstate);
  
--- 945,1028 ----
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	if (!cstate->opts.json_mode)
  	{
! 		bool		need_delim = false;
! 		FmgrInfo   *out_functions = cstate->out_functions;
! 		ListCell   *cur;
! 		char	   *string;
  
! 		/* Make sure the tuple is fully deconstructed */
! 		slot_getallattrs(slot);
  
! 		foreach(cur, cstate->attnumlist)
  		{
+ 			int			attnum = lfirst_int(cur);
+ 			Datum		value = slot->tts_values[attnum - 1];
+ 			bool		isnull = slot->tts_isnull[attnum - 1];
+ 
  			if (!cstate->opts.binary)
  			{
! 				if (need_delim)
! 					CopySendChar(cstate, cstate->opts.delim[0]);
! 				need_delim = true;
! 			}
! 
! 			if (isnull)
! 			{
! 				if (!cstate->opts.binary)
! 					CopySendString(cstate, cstate->opts.null_print_client);
  				else
! 					CopySendInt32(cstate, -1);
  			}
  			else
  			{
! 				if (!cstate->opts.binary)
! 				{
! 					string = OutputFunctionCall(&out_functions[attnum - 1],
! 												value);
! 					if (cstate->opts.csv_mode)
! 						CopyAttributeOutCSV(cstate, string,
! 											cstate->opts.force_quote_flags[attnum - 1],
! 											list_length(cstate->attnumlist) == 1);
! 					else
! 						CopyAttributeOutText(cstate, string);
! 				}
! 				else
! 				{
! 					bytea	   *outputbytes;
  
! 					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 												   value);
! 					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 					CopySendData(cstate, VARDATA(outputbytes),
! 								 VARSIZE(outputbytes) - VARHDRSZ);
! 				}
  			}
  		}
  	}
+ 	else
+ 	{
+ 		Datum	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ 		StringInfo	result;
+ 
+ 		result = makeStringInfo();
+ 		composite_to_json(rowdata, result, false);
+ 
+ 		if (json_row_delim_needed &&
+ 			cstate->opts.force_row_delimiter)
+ 		{
+ 			CopySendChar(cstate, ',');
+ 		}
+ 		else if (cstate->opts.force_row_delimiter)
+ 		{
+ 			/* first row needs no delimiter */
+ 			CopySendChar(cstate, ' ');
+ 			json_row_delim_needed = true;
+ 		}
+ 
+ 		CopySendData(cstate, result->data, result->len);
+ 	}
  
  	CopySendEndOfRow(cstate);
  
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..16aa131 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
  				{
  					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
  				}
+ 			| JSON
+ 				{
+ 					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ 				}
  			| HEADER_P
  				{
  					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3465 ----
  				{
  					$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
  				}
+ 			| FORCE ROW DELIMITER
+ 				{
+ 					$$ = makeDefElem("force_row_delimiter", (Node *) makeBoolean(true), @1);
+ 				}
+ 			| FORCE ARRAY
+ 				{
+ 					$$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ 				}
  		;
  
  /* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3502,3511 ----
  				{
  					$$ = makeDefElem($1, $2, @1);
  				}
+ 			| FORMAT_LA copy_generic_opt_arg
+ 				{
+ 					$$ = makeDefElem("format", $2, @1);
+ 				}
  		;
  
  copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
  	JsonUniqueBuilderState unique_check;
  } JsonAggState;
  
- static void composite_to_json(Datum composite, StringInfo result,
- 							  bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
  							  Datum *vals, bool *nulls, int *valcount,
  							  JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
  
  /*
   * Turn a composite / record into JSON.
   */
! static void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
--- 488,496 ----
  
  /*
   * Turn a composite / record into JSON.
+  * Exported so COPY TO can use it.
   */
! void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..266910d 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
  	bool		binary;			/* binary format? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
+ 	bool		json_mode;		/* JSON format? */
  	CopyHeaderChoice header_line;	/* header line? */
  	char	   *null_print;		/* NULL marker string (server encoding!) */
  	int			null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,69 ----
  	List	   *force_null;		/* list of column names */
  	bool		force_null_all; /* FORCE_NULL *? */
  	bool	   *force_null_flags;	/* per-column CSV FN flags */
+ 	bool		force_row_delimiter;	/* use comma as per-row JSON delimiter */
+ 	bool		force_array;	/* JSON array; implies force_row_delimiter */
  	bool		convert_selectively;	/* do selective binary conversion? */
  	List	   *convert_select; /* list of column names (can be NIL) */
  } CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
  #include "lib/stringinfo.h"
  
  /* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ 							  bool use_line_feeds);
  extern void escape_json(StringInfo buf, const char *str);
  extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
  								const int *tzp);

#43

davin@apache.org

about 2 years ago

In reply to: Joe Conway (#42)

Re: Emitting JSON to file using COPY TO

Looking great!

For testing, in addition to the quotes, include DOS and Unix EOL, \ and /,
Byte Order Marks, and multibyte characters such as those encoded in UTF-8.

Essentially anything considered textual is fair game to be a value.

#44

andrew@dunslane.net

about 2 years ago

In reply to: Davin Shearer (#43)

Re: Emitting JSON to file using COPY TO

On 2023-12-04 Mo 13:37, Davin Shearer wrote:

Looking great!

For testing, in addition to the quotes, include DOS and Unix EOL, \
and /, Byte Order Marks, and multibyte characters such as those encoded
in UTF-8.

Essentially anything considered textual is fair game to be a value.

Joe already asked you to avoid top-posting on PostgreSQL lists. See
<http://idallen.com/topposting.html> for an explanation.

We don't process BOMs elsewhere, and probably should not here either.
They are in fact neither required nor recommended for use with UTF8
data, AIUI. See a recent discussion on this list on that topic:
</messages/by-id/81ca2b25-6b3a-499a-9a09-2dd21253c2cb@unitrunker.net>

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#45

davin@apache.org

about 2 years ago

In reply to: Andrew Dunstan (#44)

Re: Emitting JSON to file using COPY TO

Sorry about the top posting / top quoting... the link you sent me gives me
a 404. I'm not exactly sure what top quoting / posting means and Googling
those terms wasn't helpful for me, but I've removed the quoting that my
mail client is automatically "helpfully" adding to my emails. I mean no
offense.

Okay, digging in more...

If the value contains text that has BOMs [footnote 1] in it, they must be
preserved (the database doesn't need to interpret them or do anything
special with them - just store them and fetch them). There are however a few
characters that need to be escaped (per
https://www.w3docs.com/snippets/java/how-should-i-escape-strings-in-json.html)
so that the JSON format isn't broken. They are:

1. " (double quote)
2. \ (backslash)
3. / (forward slash)
4. \b (backspace)
5. \f (form feed)
6. \n (new line)
7. \r (carriage return)
8. \t (horizontal tab)

These characters should be represented in the test cases to see how the
escaping behaves and to ensure that the escaping is done properly per JSON
requirements. Forward slash comes as a bit of a surprise to me, but `jq`
handles it either way:

➜ echo '{"key": "this / is a forward slash"}' | jq .
{
"key": "this / is a forward slash"
}
➜ echo '{"key": "this \/ is a forward slash"}' | jq .
{
"key": "this / is a forward slash"
}

Hope it helps, and thank you!

1. I don't disagree that BOMs shouldn't be used for UTF-8, but I'm also
processing UTF-16{BE,LE} and UTF-32{BE,LE} (as well as other textual
formats that are neither ASCII nor Unicode). I don't have the luxury of
changing the data that is given.
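
As a small illustration of the "just store it and fetch it" expectation, this Python sketch checks that a BOM code point embedded in a string value survives a JSON encode/decode round trip. It uses Python's standard `json` module rather than PostgreSQL, so it only shows what a JSON consumer would see; the sample value is made up.

```python
import codecs
import json

BOM = "\ufeff"           # U+FEFF, the byte order mark code point
value = BOM + "payload"  # a text value that happens to start with a BOM

# ensure_ascii=False keeps the BOM as a literal character instead of \ufeff.
encoded = json.dumps({"v": value}, ensure_ascii=False)
decoded = json.loads(encoded)["v"]

# In UTF-8 the BOM becomes the three bytes EF BB BF.
utf8_bytes = encoded.encode("utf-8")

print(decoded == value)
print(utf8_bytes)
```

The BOM here is ordinary data inside a value, which is a different matter from a BOM at the very start of a file.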

#46

https://en.wikipedia.org/wiki/Posting_style

mail@joeconway.com

about 2 years ago

In reply to: Davin Shearer (#45)

1 attachment(s)

Re: Emitting JSON to file using COPY TO

On 12/4/23 17:55, Davin Shearer wrote:

Sorry about the top posting / top quoting... the link you sent me gives
me a 404. I'm not exactly sure what top quoting / posting means and
Googling those terms wasn't helpful for me, but I've removed the quoting
that my mail client is automatically "helpfully" adding to my emails. I
mean no offense.

No offense taken. But it is worthwhile to conform to the very long
established norms of the mailing lists on which you participate. See:
https://en.wikipedia.org/wiki/Posting_style

I would describe the Postgres list style (based on that link) as

"inline replying, in which the different parts of the reply follow
the relevant parts of the original post...[with]...trimming of the
original text"

There are however a few characters that need to be escaped

1. |"|(double quote)
2. |\|(backslash)
3. |/|(forward slash)
4. |\b|(backspace)
5. |\f|(form feed)
6. |\n|(new line)
7. |\r|(carriage return)
8. |\t|(horizontal tab)

These characters should be represented in the test cases to see how the
escaping behaves and to ensure that the escaping is done properly per
JSON requirements.

I can look at adding these as test cases. The latest version of the
patch (attached) includes some of that already. For reference, the tests
so far include this:

8<-------------------------------
test=# select * from copytest;
  style  |   test   | filler
---------+----------+--------
 DOS     | abc\r   +|      1
         | def      |
 Unix    | abc     +|      2
         | def      |
 Mac     | abc\rdef |      3
 esc\ape | a\r\\r\ +|      4
         | \nb      |
(4 rows)

test=# copy copytest to stdout (format json);
{"style":"DOS","test":"abc\r\ndef","filler":1}
{"style":"Unix","test":"abc\ndef","filler":2}
{"style":"Mac","test":"abc\rdef","filler":3}
{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
8<-------------------------------

At this point "COPY TO" should be sending exactly the unaltered output
of the postgres JSON processing functions.
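
One way to sanity-check that claim from the consumer side is to feed the four lines above to an independent JSON parser. The Python sketch below does that with the standard `json` module; the variable names are illustrative only.

```python
import json

# The COPY output from the test above, pasted verbatim (raw string, so the
# backslashes reach the JSON parser exactly as they appear in the output).
copy_output = r'''{"style":"DOS","test":"abc\r\ndef","filler":1}
{"style":"Unix","test":"abc\ndef","filler":2}
{"style":"Mac","test":"abc\rdef","filler":3}
{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
'''

# Default (non-array) mode is one JSON object per line.
records = [json.loads(line) for line in copy_output.splitlines()]

for rec in records:
    print(repr(rec["style"]), repr(rec["test"]), rec["filler"])
```

Each line parses on its own, and the decoded values contain real CR and LF characters where the table shows line breaks.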

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

copyto_json.004.diff (text/x-patch; charset=UTF-8)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..af8777b 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 43,48 ****
--- 43,50 ----
      FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+     FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
+     FORCE_ROW_DELIMITER [ <replaceable class="parameter">boolean</replaceable> ]
      ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
  </synopsis>
   </refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 206,214 ****
--- 208,221 ----
        Selects the data format to be read or written:
        <literal>text</literal>,
        <literal>csv</literal> (Comma Separated Values),
+       <literal>json</literal> (JavaScript Object Notation),
        or <literal>binary</literal>.
        The default is <literal>text</literal>.
       </para>
+      <para>
+       The <literal>json</literal> option is allowed only in
+       <command>COPY TO</command>.
+      </para>
      </listitem>
     </varlistentry>
  
*************** COPY { <replaceable class="parameter">ta
*** 372,377 ****
--- 379,410 ----
       </para>
      </listitem>
     </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>FORCE_ROW_DELIMITER</literal></term>
+     <listitem>
+      <para>
+       Force output of commas as row delimiters, in addition to the usual
+       end of line characters. This option is allowed only in
+       <command>COPY TO</command>, and only when using
+       <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>FORCE_ARRAY</literal></term>
+     <listitem>
+      <para>
+       Force output of array decorations at the beginning and end of output.
+       This option implies the <literal>FORCE_ROW_DELIMITER</literal>
+       option. It is allowed only in <command>COPY TO</command>, and only
+       when using <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
  
     <varlistentry>
      <term><literal>ENCODING</literal></term>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..0236a9e 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 419,424 ****
--- 419,426 ----
  	bool		format_specified = false;
  	bool		freeze_specified = false;
  	bool		header_specified = false;
+ 	bool		force_row_delimiter_specified = false;
+ 	bool		force_array_specified = false;
  	ListCell   *option;
  
  	/* Support external use for option sanity checking */
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 445,452 ----
  				 /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  				opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ 				opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  				opts_out->binary = true;
  			else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 544,563 ----
  								defel->defname),
  						 parser_errposition(pstate, defel->location)));
  		}
+ 		else if (strcmp(defel->defname, "force_row_delimiter") == 0)
+ 		{
+ 			if (force_row_delimiter_specified)
+ 				errorConflictingDefElem(defel, pstate);
+ 			force_row_delimiter_specified = true;
+ 			opts_out->force_row_delimiter = defGetBoolean(defel);
+ 		}
+ 		else if (strcmp(defel->defname, "force_array") == 0)
+ 		{
+ 			if (force_array_specified)
+ 				errorConflictingDefElem(defel, pstate);
+ 			force_array_specified = true;
+ 			opts_out->force_array = defGetBoolean(defel);
+ 		}
  		else if (strcmp(defel->defname, "convert_selectively") == 0)
  		{
  			/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 616,647 ----
  				(errcode(ERRCODE_SYNTAX_ERROR),
  				 errmsg("cannot specify DEFAULT in BINARY mode")));
  
+ 	if (opts_out->json_mode)
+ 	{
+ 		if (is_from)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot use JSON mode in COPY FROM")));
+ 
+ 		if (opts_out->force_array &&
+ 			force_row_delimiter_specified &&
+ 			!opts_out->force_row_delimiter)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true")));
+ 
+ 		if (opts_out->force_array)
+ 			opts_out->force_row_delimiter = true;
+ 	}
+ 	else if (opts_out->force_array)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ARRAY requires JSON mode")));
+ 	else if (opts_out->force_row_delimiter)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ROW_DELIMITER requires JSON mode")));
+ 
  	/* Set defaults for omitted options */
  	if (!opts_out->delim)
  		opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 711,721 ----
  				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  				 errmsg("cannot specify HEADER in BINARY mode")));
  
+ 	if (opts_out->json_mode && opts_out->header_line)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot specify HEADER in JSON mode")));
+ 
  	/* Check quote */
  	if (!opts_out->csv_mode && opts_out->quote != NULL)
  		ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..fba3070 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
+ #include "utils/json.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
  /* NOTE: there's a copy of this in copyfromparse.c */
  static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
  
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
  
  /* non-export function prototypes */
  static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,867 ----
  
  			CopySendEndOfRow(cstate);
  		}
+ 
+ 		/*
+ 		 * If JSON has been requested, and FORCE_ARRAY has been specified
+ 		 * send the opening bracket.
+ 		 */
+ 		if (cstate->opts.json_mode)
+ 		{
+ 			if (cstate->opts.force_array)
+ 			{
+ 				CopySendChar(cstate, '[');
+ 				CopySendEndOfRow(cstate);
+ 			}
+ 			json_row_delim_needed = false;
+ 		}
  	}
  
  	if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 909,925 ----
  		CopySendEndOfRow(cstate);
  	}
  
+ 	/*
+ 	 * If JSON has been requested, and FORCE_ARRAY has been specified
+ 	 * send the closing bracket.
+ 	 */
+ 	if (cstate->opts.json_mode &&
+ 		cstate->opts.force_array)
+ 	{
+ 		CopySendChar(cstate, ']');
+ 		CopySendEndOfRow(cstate);
+ 	}
+ 
  	MemoryContextDelete(cstate->rowcontext);
  
  	if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
  static void
  CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
  {
- 	bool		need_delim = false;
- 	FmgrInfo   *out_functions = cstate->out_functions;
  	MemoryContext oldcontext;
- 	ListCell   *cur;
- 	char	   *string;
  
  	MemoryContextReset(cstate->rowcontext);
  	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 934,940 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	/* Make sure the tuple is fully deconstructed */
! 	slot_getallattrs(slot);
! 
! 	foreach(cur, cstate->attnumlist)
  	{
! 		int			attnum = lfirst_int(cur);
! 		Datum		value = slot->tts_values[attnum - 1];
! 		bool		isnull = slot->tts_isnull[attnum - 1];
  
! 		if (!cstate->opts.binary)
! 		{
! 			if (need_delim)
! 				CopySendChar(cstate, cstate->opts.delim[0]);
! 			need_delim = true;
! 		}
  
! 		if (isnull)
! 		{
! 			if (!cstate->opts.binary)
! 				CopySendString(cstate, cstate->opts.null_print_client);
! 			else
! 				CopySendInt32(cstate, -1);
! 		}
! 		else
  		{
  			if (!cstate->opts.binary)
  			{
! 				string = OutputFunctionCall(&out_functions[attnum - 1],
! 											value);
! 				if (cstate->opts.csv_mode)
! 					CopyAttributeOutCSV(cstate, string,
! 										cstate->opts.force_quote_flags[attnum - 1],
! 										list_length(cstate->attnumlist) == 1);
  				else
! 					CopyAttributeOutText(cstate, string);
  			}
  			else
  			{
! 				bytea	   *outputbytes;
  
! 				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 											   value);
! 				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 				CopySendData(cstate, VARDATA(outputbytes),
! 							 VARSIZE(outputbytes) - VARHDRSZ);
  			}
  		}
  	}
  
  	CopySendEndOfRow(cstate);
  
--- 945,1028 ----
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	if (!cstate->opts.json_mode)
  	{
! 		bool		need_delim = false;
! 		FmgrInfo   *out_functions = cstate->out_functions;
! 		ListCell   *cur;
! 		char	   *string;
  
! 		/* Make sure the tuple is fully deconstructed */
! 		slot_getallattrs(slot);
  
! 		foreach(cur, cstate->attnumlist)
  		{
+ 			int			attnum = lfirst_int(cur);
+ 			Datum		value = slot->tts_values[attnum - 1];
+ 			bool		isnull = slot->tts_isnull[attnum - 1];
+ 
  			if (!cstate->opts.binary)
  			{
! 				if (need_delim)
! 					CopySendChar(cstate, cstate->opts.delim[0]);
! 				need_delim = true;
! 			}
! 
! 			if (isnull)
! 			{
! 				if (!cstate->opts.binary)
! 					CopySendString(cstate, cstate->opts.null_print_client);
  				else
! 					CopySendInt32(cstate, -1);
  			}
  			else
  			{
! 				if (!cstate->opts.binary)
! 				{
! 					string = OutputFunctionCall(&out_functions[attnum - 1],
! 												value);
! 					if (cstate->opts.csv_mode)
! 						CopyAttributeOutCSV(cstate, string,
! 											cstate->opts.force_quote_flags[attnum - 1],
! 											list_length(cstate->attnumlist) == 1);
! 					else
! 						CopyAttributeOutText(cstate, string);
! 				}
! 				else
! 				{
! 					bytea	   *outputbytes;
  
! 					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 												   value);
! 					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 					CopySendData(cstate, VARDATA(outputbytes),
! 								 VARSIZE(outputbytes) - VARHDRSZ);
! 				}
  			}
  		}
  	}
+ 	else
+ 	{
+ 		Datum	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ 		StringInfo	result;
+ 
+ 		result = makeStringInfo();
+ 		composite_to_json(rowdata, result, false);
+ 
+ 		if (json_row_delim_needed &&
+ 			cstate->opts.force_row_delimiter)
+ 		{
+ 			CopySendChar(cstate, ',');
+ 		}
+ 		else if (cstate->opts.force_row_delimiter)
+ 		{
+ 			/* first row needs no delimiter */
+ 			CopySendChar(cstate, ' ');
+ 			json_row_delim_needed = true;
+ 		}
+ 
+ 		CopySendData(cstate, result->data, result->len);
+ 	}
  
  	CopySendEndOfRow(cstate);
  
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..16aa131 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
  				{
  					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
  				}
+ 			| JSON
+ 				{
+ 					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ 				}
  			| HEADER_P
  				{
  					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3465 ----
  				{
  					$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
  				}
+ 			| FORCE ROW DELIMITER
+ 				{
+ 					$$ = makeDefElem("force_row_delimiter", (Node *) makeBoolean(true), @1);
+ 				}
+ 			| FORCE ARRAY
+ 				{
+ 					$$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ 				}
  		;
  
  /* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3502,3511 ----
  				{
  					$$ = makeDefElem($1, $2, @1);
  				}
+ 			| FORMAT_LA copy_generic_opt_arg
+ 				{
+ 					$$ = makeDefElem("format", $2, @1);
+ 				}
  		;
  
  copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
  	JsonUniqueBuilderState unique_check;
  } JsonAggState;
  
- static void composite_to_json(Datum composite, StringInfo result,
- 							  bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
  							  Datum *vals, bool *nulls, int *valcount,
  							  JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
  
  /*
   * Turn a composite / record into JSON.
   */
! static void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
--- 488,496 ----
  
  /*
   * Turn a composite / record into JSON.
+  * Exported so COPY TO can use it.
   */
! void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..266910d 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
  	bool		binary;			/* binary format? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
+ 	bool		json_mode;		/* JSON format? */
  	CopyHeaderChoice header_line;	/* header line? */
  	char	   *null_print;		/* NULL marker string (server encoding!) */
  	int			null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,69 ----
  	List	   *force_null;		/* list of column names */
  	bool		force_null_all; /* FORCE_NULL *? */
  	bool	   *force_null_flags;	/* per-column CSV FN flags */
+ 	bool		force_row_delimiter;	/* use comma as per-row JSON delimiter */
+ 	bool		force_array;	/* JSON array; implies force_row_delimiter */
  	bool		convert_selectively;	/* do selective binary conversion? */
  	List	   *convert_select; /* list of column names (can be NIL) */
  } CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
  #include "lib/stringinfo.h"
  
  /* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ 							  bool use_line_feeds);
  extern void escape_json(StringInfo buf, const char *str);
  extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
  								const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365e..a34cb39 100644
*** a/src/test/regress/expected/copy.out
--- b/src/test/regress/expected/copy.out
*************** copy copytest3 to stdout csv header;
*** 42,47 ****
--- 42,98 ----
  c1,"col with , comma","col with "" quote"
  1,a,1
  2,b,2
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json, force_array);
+ [
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_row_delimiter);
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ -- Error
+ copy copytest to stdout
+  (format json, force_array true, force_row_delimiter false);
+ ERROR:  cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true
+ -- Error
+ copy copytest to stdout
+  (format json, header);
+ ERROR:  cannot specify HEADER in JSON mode
+ -- embedded quotes
+ create temp table copyjsontest (
+     id int8,
+     f1 text,
+     f2 timestamptz);
+ insert into copyjsontest
+   select g.i,
+          CASE WHEN g.i % 2 = 0 THEN
+            'line with '' in it: ' || g.i::text
+          ELSE
+            'line with " in it: ' || g.i::text
+          END,
+          'Mon Feb 10 17:32:01 1997 PST'
+   from generate_series(1,5) as g(i);
+ copy copyjsontest to stdout json;
+ {"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
  create temp table copytest4 (
  	c1 int,
  	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e90..189c703 100644
*** a/src/test/regress/sql/copy.sql
--- b/src/test/regress/sql/copy.sql
*************** this is just a line full of junk that wo
*** 54,59 ****
--- 54,94 ----
  
  copy copytest3 to stdout csv header;
  
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ 
+ copy copytest to stdout (format json);
+ 
+ copy copytest to stdout (format json, force_array);
+ 
+ copy copytest to stdout (format json, force_row_delimiter);
+ 
+ -- Error
+ copy copytest to stdout
+  (format json, force_array true, force_row_delimiter false);
+ 
+ -- Error
+ copy copytest to stdout
+  (format json, header);
+ 
+ -- embedded quotes
+ create temp table copyjsontest (
+     id int8,
+     f1 text,
+     f2 timestamptz);
+ 
+ insert into copyjsontest
+   select g.i,
+          CASE WHEN g.i % 2 = 0 THEN
+            'line with '' in it: ' || g.i::text
+          ELSE
+            'line with " in it: ' || g.i::text
+          END,
+          'Mon Feb 10 17:32:01 1997 PST'
+   from generate_series(1,5) as g(i);
+ 
+ copy copyjsontest to stdout json;
+ 
  create temp table copytest4 (
  	c1 int,
  	"colname with tab: 	" text);

#47

andrew@dunslane.net

about 2 years ago

In reply to: Davin Shearer (#45)

Re: Emitting JSON to file using COPY TO

On 2023-12-04 Mo 17:55, Davin Shearer wrote:

Sorry about the top posting / top quoting... the link you sent me
gives me a 404. I'm not exactly sure what top quoting / posting means
and Googling those terms wasn't helpful for me, but I've removed the
quoting that my mail client is automatically "helpfully" adding to my
emails. I mean no offense.

Hmm. Luckily the Wayback Machine has a copy:
<http://web.archive.org/web/20230608210806/idallen.com/topposting.html>

Maybe I'll put a copy in the developer wiki.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#48

mail@joeconway.com

about 2 years ago

In reply to: Joe Conway (#46)

Re: Emitting JSON to file using COPY TO

On 12/4/23 21:54, Joe Conway wrote:

On 12/4/23 17:55, Davin Shearer wrote:

There are however a few characters that need to be escaped

1. |"|(double quote)
2. |\|(backslash)
3. |/|(forward slash)
4. |\b|(backspace)
5. |\f|(form feed)
6. |\n|(new line)
7. |\r|(carriage return)
8. |\t|(horizontal tab)

These characters should be represented in the test cases to see how the
escaping behaves and to ensure that the escaping is done properly per
JSON requirements.

I can look at adding these as test cases.

So I did a quick check:
8<--------------------------
with t(f1) as
(
values
(E'aaa\"bbb'::text),
(E'aaa\\bbb'::text),
(E'aaa\/bbb'::text),
(E'aaa\bbbb'::text),
(E'aaa\fbbb'::text),
(E'aaa\nbbb'::text),
(E'aaa\rbbb'::text),
(E'aaa\tbbb'::text)
)
select
length(t.f1),
t.f1,
row_to_json(t)
from t;
 length |     f1      |    row_to_json
--------+-------------+-------------------
      7 | aaa"bbb     | {"f1":"aaa\"bbb"}
      7 | aaa\bbb     | {"f1":"aaa\\bbb"}
      7 | aaa/bbb     | {"f1":"aaa/bbb"}
      7 | aaa\x08bbb  | {"f1":"aaa\bbbb"}
      7 | aaa\x0Cbbb  | {"f1":"aaa\fbbb"}
      7 | aaa        +| {"f1":"aaa\nbbb"}
        | bbb         |
      7 | aaa\rbbb    | {"f1":"aaa\rbbb"}
      7 | aaa     bbb | {"f1":"aaa\tbbb"}
(8 rows)

8<--------------------------

This is all independent of my patch for COPY TO. If I am reading that
correctly, everything matches Davin's table *except* the forward slash
("/"). I defer to the experts on the thread to debate that...
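
As an outside data point (not part of the patch), the same eight strings can be run through another JSON encoder. This Python sketch uses the standard `json` module with compact separators so the text is directly comparable to the row_to_json column above.

```python
import json

# The same eight 7-character values as in the query above.
samples = [
    'aaa"bbb',   # double quote
    'aaa\\bbb',  # backslash
    'aaa/bbb',   # forward slash
    'aaa\bbbb',  # backspace
    'aaa\fbbb',  # form feed
    'aaa\nbbb',  # new line
    'aaa\rbbb',  # carriage return
    'aaa\tbbb',  # horizontal tab
]

encoded = [json.dumps({"f1": s}, separators=(",", ":")) for s in samples]

for s, e in zip(samples, encoded):
    print(len(s), e)
```

Python's encoder also leaves the forward slash unescaped, so on this sample it agrees with row_to_json in all eight cases.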

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#49

davin@apache.org

about 2 years ago

In reply to: Joe Conway (#48)

Re: Emitting JSON to file using COPY TO

Thanks for the wayback machine link Andrew. I read it, understood it, and
will comply.

Joe, those test cases look great and the outputs are the same as `jq`.

As for forward slashes being escaped, I found this:
https://stackoverflow.com/questions/1580647/json-why-are-forward-slashes-escaped

Forward slash escaping is optional, so not escaping them in Postgres is
okay. The important thing is that the software _reading_ JSON interprets
both '\/' and '/' as '/'.
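
A quick reader-side check of that point, sketched with Python's standard `json` module (the two documents are the same ones fed to `jq` earlier in the thread):

```python
import json

# Same content, once with the optional escape and once without.
escaped = json.loads(r'{"key": "this \/ is a forward slash"}')
plain = json.loads('{"key": "this / is a forward slash"}')

print(escaped == plain)
print(escaped["key"])
```

Both inputs decode to the same string, so a writer is free to emit either form.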

#50

mail@joeconway.com

about 2 years ago

In reply to: Davin Shearer (#49)

1 attachment(s)

Re: Emitting JSON to file using COPY TO

On 12/5/23 12:43, Davin Shearer wrote:

Joe, those test cases look great and the outputs are the same as `jq`.

Forward slash escaping is optional, so not escaping them in Postgres is
okay. The important thing is that the software _reading_ JSON
interprets both '\/' and '/' as '/'.

Thanks for the review and info. I modified the existing regression test
thus:

8<--------------------------
create temp table copyjsontest (
id bigserial,
f1 text,
f2 timestamptz);

insert into copyjsontest
select g.i,
CASE WHEN g.i % 2 = 0 THEN
'line with '' in it: ' || g.i::text
ELSE
'line with " in it: ' || g.i::text
END,
'Mon Feb 10 17:32:01 1997 PST'
from generate_series(1,5) as g(i);

insert into copyjsontest (f1) values
(E'aaa\"bbb'::text),
(E'aaa\\bbb'::text),
(E'aaa\/bbb'::text),
(E'aaa\bbbb'::text),
(E'aaa\fbbb'::text),
(E'aaa\nbbb'::text),
(E'aaa\rbbb'::text),
(E'aaa\tbbb'::text);
copy copyjsontest to stdout json;
{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T20:32:01-05:00"}
{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T20:32:01-05:00"}
{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T20:32:01-05:00"}
{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T20:32:01-05:00"}
{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T20:32:01-05:00"}
{"id":1,"f1":"aaa\"bbb","f2":null}
{"id":2,"f1":"aaa\\bbb","f2":null}
{"id":3,"f1":"aaa/bbb","f2":null}
{"id":4,"f1":"aaa\bbbb","f2":null}
{"id":5,"f1":"aaa\fbbb","f2":null}
{"id":6,"f1":"aaa\nbbb","f2":null}
{"id":7,"f1":"aaa\rbbb","f2":null}
{"id":8,"f1":"aaa\tbbb","f2":null}
8<--------------------------

I think the code, documentation, and tests are in pretty good shape at
this point. Latest version attached.

Any other comments or complaints out there?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

copyto_json.005.diff (text/x-patch; charset=UTF-8)

Add json format mode to COPY TO

Add json format mode support to COPY TO, which includes three output
variations: 1) "json lines" which is each row as a json object delimited
by newlines (the default); 2) "json lines", except include comma delimiters
between json objects; and 3) "json array" which is the same as #2, but with
the addition of a leading "[" and trailing "]" to form a valid json array.

Early versions: helpful hints/reviews provided by Nathan Bossart,
Tom Lane, and Maciek Sakrejda. Final versions: reviewed by Andrew Dunstan
and Davin Shearer.

Requested-by: Davin Shearer
Author: Joe Conway
Reviewed-by: Andrew Dunstan, Davin Shearer 
Discussion: https://postgr.es/m/flat/24e3ee88-ec1e-421b-89ae-8a47ee0d2df1%40joeconway.com#a5e6b8829f9a74dfc835f6f29f2e44c5


diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..af8777b 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 43,48 ****
--- 43,50 ----
      FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+     FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
+     FORCE_ROW_DELIMITER [ <replaceable class="parameter">boolean</replaceable> ]
      ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
  </synopsis>
   </refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 206,214 ****
--- 208,221 ----
        Selects the data format to be read or written:
        <literal>text</literal>,
        <literal>csv</literal> (Comma Separated Values),
+       <literal>json</literal> (JavaScript Object Notation),
        or <literal>binary</literal>.
        The default is <literal>text</literal>.
       </para>
+      <para>
+       The <literal>json</literal> option is allowed only in
+       <command>COPY TO</command>.
+      </para>
      </listitem>
     </varlistentry>
  
*************** COPY { <replaceable class="parameter">ta
*** 372,377 ****
--- 379,410 ----
       </para>
      </listitem>
     </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>FORCE_ROW_DELIMITER</literal></term>
+     <listitem>
+      <para>
+       Force output of commas as row delimiters, in addition to the usual
+       end of line characters. This option is allowed only in
+       <command>COPY TO</command>, and only when using
+       <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>FORCE_ARRAY</literal></term>
+     <listitem>
+      <para>
+       Force output of array decorations at the beginning and end of output.
+       This option implies the <literal>FORCE_ROW_DELIMITER</literal>
+       option. It is allowed only in <command>COPY TO</command>, and only
+       when using <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
  
     <varlistentry>
      <term><literal>ENCODING</literal></term>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..0236a9e 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 419,424 ****
--- 419,426 ----
  	bool		format_specified = false;
  	bool		freeze_specified = false;
  	bool		header_specified = false;
+ 	bool		force_row_delimiter_specified = false;
+ 	bool		force_array_specified = false;
  	ListCell   *option;
  
  	/* Support external use for option sanity checking */
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 445,452 ----
  				 /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  				opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ 				opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  				opts_out->binary = true;
  			else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 544,563 ----
  								defel->defname),
  						 parser_errposition(pstate, defel->location)));
  		}
+ 		else if (strcmp(defel->defname, "force_row_delimiter") == 0)
+ 		{
+ 			if (force_row_delimiter_specified)
+ 				errorConflictingDefElem(defel, pstate);
+ 			force_row_delimiter_specified = true;
+ 			opts_out->force_row_delimiter = defGetBoolean(defel);
+ 		}
+ 		else if (strcmp(defel->defname, "force_array") == 0)
+ 		{
+ 			if (force_array_specified)
+ 				errorConflictingDefElem(defel, pstate);
+ 			force_array_specified = true;
+ 			opts_out->force_array = defGetBoolean(defel);
+ 		}
  		else if (strcmp(defel->defname, "convert_selectively") == 0)
  		{
  			/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 616,647 ----
  				(errcode(ERRCODE_SYNTAX_ERROR),
  				 errmsg("cannot specify DEFAULT in BINARY mode")));
  
+ 	if (opts_out->json_mode)
+ 	{
+ 		if (is_from)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot use JSON mode in COPY FROM")));
+ 
+ 		if (opts_out->force_array &&
+ 			force_row_delimiter_specified &&
+ 			!opts_out->force_row_delimiter)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true")));
+ 
+ 		if (opts_out->force_array)
+ 			opts_out->force_row_delimiter = true;
+ 	}
+ 	else if (opts_out->force_array)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ARRAY requires JSON mode")));
+ 	else if (opts_out->force_row_delimiter)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ROW_DELIMITER requires JSON mode")));
+ 
  	/* Set defaults for omitted options */
  	if (!opts_out->delim)
  		opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 711,721 ----
  				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  				 errmsg("cannot specify HEADER in BINARY mode")));
  
+ 	if (opts_out->json_mode && opts_out->header_line)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot specify HEADER in JSON mode")));
+ 
  	/* Check quote */
  	if (!opts_out->csv_mode && opts_out->quote != NULL)
  		ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..fba3070 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
+ #include "utils/json.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
  /* NOTE: there's a copy of this in copyfromparse.c */
  static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
  
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
  
  /* non-export function prototypes */
  static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,867 ----
  
  			CopySendEndOfRow(cstate);
  		}
+ 
+ 		/*
+ 		 * If JSON has been requested, and FORCE_ARRAY has been specified
+ 		 * send the opening bracket.
+ 		 */
+ 		if (cstate->opts.json_mode)
+ 		{
+ 			if (cstate->opts.force_array)
+ 			{
+ 				CopySendChar(cstate, '[');
+ 				CopySendEndOfRow(cstate);
+ 			}
+ 			json_row_delim_needed = false;
+ 		}
  	}
  
  	if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 909,925 ----
  		CopySendEndOfRow(cstate);
  	}
  
+ 	/*
+ 	 * If JSON has been requested, and FORCE_ARRAY has been specified
+ 	 * send the closing bracket.
+ 	 */
+ 	if (cstate->opts.json_mode &&
+ 		cstate->opts.force_array)
+ 	{
+ 		CopySendChar(cstate, ']');
+ 		CopySendEndOfRow(cstate);
+ 	}
+ 
  	MemoryContextDelete(cstate->rowcontext);
  
  	if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
  static void
  CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
  {
- 	bool		need_delim = false;
- 	FmgrInfo   *out_functions = cstate->out_functions;
  	MemoryContext oldcontext;
- 	ListCell   *cur;
- 	char	   *string;
  
  	MemoryContextReset(cstate->rowcontext);
  	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 934,940 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	/* Make sure the tuple is fully deconstructed */
! 	slot_getallattrs(slot);
! 
! 	foreach(cur, cstate->attnumlist)
  	{
! 		int			attnum = lfirst_int(cur);
! 		Datum		value = slot->tts_values[attnum - 1];
! 		bool		isnull = slot->tts_isnull[attnum - 1];
  
! 		if (!cstate->opts.binary)
! 		{
! 			if (need_delim)
! 				CopySendChar(cstate, cstate->opts.delim[0]);
! 			need_delim = true;
! 		}
  
! 		if (isnull)
! 		{
! 			if (!cstate->opts.binary)
! 				CopySendString(cstate, cstate->opts.null_print_client);
! 			else
! 				CopySendInt32(cstate, -1);
! 		}
! 		else
  		{
  			if (!cstate->opts.binary)
  			{
! 				string = OutputFunctionCall(&out_functions[attnum - 1],
! 											value);
! 				if (cstate->opts.csv_mode)
! 					CopyAttributeOutCSV(cstate, string,
! 										cstate->opts.force_quote_flags[attnum - 1],
! 										list_length(cstate->attnumlist) == 1);
  				else
! 					CopyAttributeOutText(cstate, string);
  			}
  			else
  			{
! 				bytea	   *outputbytes;
  
! 				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 											   value);
! 				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 				CopySendData(cstate, VARDATA(outputbytes),
! 							 VARSIZE(outputbytes) - VARHDRSZ);
  			}
  		}
  	}
  
  	CopySendEndOfRow(cstate);
  
--- 945,1028 ----
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	if (!cstate->opts.json_mode)
  	{
! 		bool		need_delim = false;
! 		FmgrInfo   *out_functions = cstate->out_functions;
! 		ListCell   *cur;
! 		char	   *string;
  
! 		/* Make sure the tuple is fully deconstructed */
! 		slot_getallattrs(slot);
  
! 		foreach(cur, cstate->attnumlist)
  		{
+ 			int			attnum = lfirst_int(cur);
+ 			Datum		value = slot->tts_values[attnum - 1];
+ 			bool		isnull = slot->tts_isnull[attnum - 1];
+ 
  			if (!cstate->opts.binary)
  			{
! 				if (need_delim)
! 					CopySendChar(cstate, cstate->opts.delim[0]);
! 				need_delim = true;
! 			}
! 
! 			if (isnull)
! 			{
! 				if (!cstate->opts.binary)
! 					CopySendString(cstate, cstate->opts.null_print_client);
  				else
! 					CopySendInt32(cstate, -1);
  			}
  			else
  			{
! 				if (!cstate->opts.binary)
! 				{
! 					string = OutputFunctionCall(&out_functions[attnum - 1],
! 												value);
! 					if (cstate->opts.csv_mode)
! 						CopyAttributeOutCSV(cstate, string,
! 											cstate->opts.force_quote_flags[attnum - 1],
! 											list_length(cstate->attnumlist) == 1);
! 					else
! 						CopyAttributeOutText(cstate, string);
! 				}
! 				else
! 				{
! 					bytea	   *outputbytes;
  
! 					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 												   value);
! 					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 					CopySendData(cstate, VARDATA(outputbytes),
! 								 VARSIZE(outputbytes) - VARHDRSZ);
! 				}
  			}
  		}
  	}
+ 	else
+ 	{
+ 		Datum	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ 		StringInfo	result;
+ 
+ 		result = makeStringInfo();
+ 		composite_to_json(rowdata, result, false);
+ 
+ 		if (json_row_delim_needed &&
+ 			cstate->opts.force_row_delimiter)
+ 		{
+ 			CopySendChar(cstate, ',');
+ 		}
+ 		else if (cstate->opts.force_row_delimiter)
+ 		{
+ 			/* first row needs no delimiter */
+ 			CopySendChar(cstate, ' ');
+ 			json_row_delim_needed = true;
+ 		}
+ 
+ 		CopySendData(cstate, result->data, result->len);
+ 	}
  
  	CopySendEndOfRow(cstate);
  
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..16aa131 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
  				{
  					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
  				}
+ 			| JSON
+ 				{
+ 					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ 				}
  			| HEADER_P
  				{
  					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3465 ----
  				{
  					$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
  				}
+ 			| FORCE ROW DELIMITER
+ 				{
+ 					$$ = makeDefElem("force_row_delimiter", (Node *) makeBoolean(true), @1);
+ 				}
+ 			| FORCE ARRAY
+ 				{
+ 					$$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ 				}
  		;
  
  /* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3502,3511 ----
  				{
  					$$ = makeDefElem($1, $2, @1);
  				}
+ 			| FORMAT_LA copy_generic_opt_arg
+ 				{
+ 					$$ = makeDefElem("format", $2, @1);
+ 				}
  		;
  
  copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
  	JsonUniqueBuilderState unique_check;
  } JsonAggState;
  
- static void composite_to_json(Datum composite, StringInfo result,
- 							  bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
  							  Datum *vals, bool *nulls, int *valcount,
  							  JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
  
  /*
   * Turn a composite / record into JSON.
   */
! static void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
--- 488,496 ----
  
  /*
   * Turn a composite / record into JSON.
+  * Exported so COPY TO can use it.
   */
! void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..266910d 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
  	bool		binary;			/* binary format? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
+ 	bool		json_mode;		/* JSON format? */
  	CopyHeaderChoice header_line;	/* header line? */
  	char	   *null_print;		/* NULL marker string (server encoding!) */
  	int			null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,69 ----
  	List	   *force_null;		/* list of column names */
  	bool		force_null_all; /* FORCE_NULL *? */
  	bool	   *force_null_flags;	/* per-column CSV FN flags */
+ 	bool		force_row_delimiter;	/* use comma as per-row JSON delimiter */
+ 	bool		force_array;	/* JSON array; implies force_row_delimiter */
  	bool		convert_selectively;	/* do selective binary conversion? */
  	List	   *convert_select; /* list of column names (can be NIL) */
  } CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
  #include "lib/stringinfo.h"
  
  /* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ 							  bool use_line_feeds);
  extern void escape_json(StringInfo buf, const char *str);
  extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
  								const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365e..1a8dde2 100644
*** a/src/test/regress/expected/copy.out
--- b/src/test/regress/expected/copy.out
*************** copy copytest3 to stdout csv header;
*** 42,47 ****
--- 42,115 ----
  c1,"col with , comma","col with "" quote"
  1,a,1
  2,b,2
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json, force_array);
+ [
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_row_delimiter);
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ -- Error
+ copy copytest to stdout
+  (format json, force_array true, force_row_delimiter false);
+ ERROR:  cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true
+ -- Error
+ copy copytest to stdout
+  (format json, header);
+ ERROR:  cannot specify HEADER in JSON mode
+ -- embedded escaped characters
+ create temp table copyjsontest (
+     id bigserial,
+     f1 text,
+     f2 timestamptz);
+ insert into copyjsontest
+   select g.i,
+          CASE WHEN g.i % 2 = 0 THEN
+            'line with '' in it: ' || g.i::text
+          ELSE
+            'line with " in it: ' || g.i::text
+          END,
+          'Mon Feb 10 17:32:01 1997 PST'
+   from generate_series(1,5) as g(i);
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+ copy copyjsontest to stdout json;
+ {"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":1,"f1":"aaa\"bbb","f2":null}
+ {"id":2,"f1":"aaa\\bbb","f2":null}
+ {"id":3,"f1":"aaa/bbb","f2":null}
+ {"id":4,"f1":"aaa\bbbb","f2":null}
+ {"id":5,"f1":"aaa\fbbb","f2":null}
+ {"id":6,"f1":"aaa\nbbb","f2":null}
+ {"id":7,"f1":"aaa\rbbb","f2":null}
+ {"id":8,"f1":"aaa\tbbb","f2":null}
  create temp table copytest4 (
  	c1 int,
  	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e90..aac4ccd 100644
*** a/src/test/regress/sql/copy.sql
--- b/src/test/regress/sql/copy.sql
*************** this is just a line full of junk that wo
*** 54,59 ****
--- 54,104 ----
  
  copy copytest3 to stdout csv header;
  
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ 
+ copy copytest to stdout (format json);
+ 
+ copy copytest to stdout (format json, force_array);
+ 
+ copy copytest to stdout (format json, force_row_delimiter);
+ 
+ -- Error
+ copy copytest to stdout
+  (format json, force_array true, force_row_delimiter false);
+ 
+ -- Error
+ copy copytest to stdout
+  (format json, header);
+ 
+ -- embedded escaped characters
+ create temp table copyjsontest (
+     id bigserial,
+     f1 text,
+     f2 timestamptz);
+ 
+ insert into copyjsontest
+   select g.i,
+          CASE WHEN g.i % 2 = 0 THEN
+            'line with '' in it: ' || g.i::text
+          ELSE
+            'line with " in it: ' || g.i::text
+          END,
+          'Mon Feb 10 17:32:01 1997 PST'
+   from generate_series(1,5) as g(i);
+ 
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+ 
+ copy copyjsontest to stdout json;
+ 
  create temp table copytest4 (
  	c1 int,
  	"colname with tab: 	" text);

#51

davin@apache.org

about 2 years ago

In reply to: Joe Conway (#50)

Re: Emitting JSON to file using COPY TO

Hi Joe,

In reviewing the 005 patch, I think that when used with FORCE ARRAY, we
should also _imply_ FORCE ROW DELIMITER. I can't envision a use case where
someone would want to use FORCE ARRAY without also using FORCE ROW
DELIMITER. I can, however, envision a use case where someone would want
FORCE ROW DELIMITER without FORCE ARRAY, like maybe including into a larger
array. I definitely appreciate these options and the flexibility that they
afford from a user perspective.

In the test output, will you also show the different variations with FORCE
ARRAY and FORCE ROW DELIMITER => {(false, false), (true, false), (false,
true), (true, true)}? Technically you've already shown me the (false,
false) case as those are the defaults.

Thanks!
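
The four combinations asked about above can be modelled outside the server. The following is a minimal sketch in Python, not the patch's code: `frame_rows` is a hypothetical helper that mimics the framing logic of the patch's `CopyOneRowTo`/`DoCopyTo` changes (a space before the first row, a comma before later rows, brackets on their own lines), and it assumes an unspecified FORCE_ROW_DELIMITER rather than an explicit `false`, which the patch rejects together with FORCE_ARRAY.

```python
import json


def frame_rows(rows, force_array=False, force_row_delimiter=False):
    """Return the output lines for the given rows under the patch's framing rules.

    Hypothetical model only: row text comes from json.dumps, not from
    PostgreSQL's composite_to_json.
    """
    if force_array:
        # FORCE_ARRAY implies FORCE_ROW_DELIMITER, as in the patch.
        force_row_delimiter = True

    lines = []
    if force_array:
        lines.append("[")

    delim_needed = False  # plays the role of json_row_delim_needed
    for row in rows:
        text = json.dumps(row, separators=(",", ":"))
        if force_row_delimiter:
            # First row gets a space, every later row gets a leading comma.
            text = ("," if delim_needed else " ") + text
            delim_needed = True
        lines.append(text)

    if force_array:
        lines.append("]")
    return lines


if __name__ == "__main__":
    sample = [{"id": 1}, {"id": 2}]
    for array in (False, True):
        for delim in (False, True):
            print(f"force_array={array}, force_row_delimiter={delim}")
            print("\n".join(frame_rows(sample, array, delim)))
```

In this model the (true, false) and (true, true) cases produce identical output, because the array case always forces the delimiter.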

#52

andrew@dunslane.net

about 2 years ago

In reply to: Davin Shearer (#51)

Re: Emitting JSON to file using COPY TO

On 2023-12-05 Tu 14:50, Davin Shearer wrote:

Hi Joe,

In reviewing the 005 patch, I think that when used with FORCE ARRAY,
we should also _imply_ FORCE ROW DELIMITER. I can't envision a use
case where someone would want to use FORCE ARRAY without also using
FORCE ROW DELIMITER. I can, however, envision a use case where
someone would want FORCE ROW DELIMITER without FORCE ARRAY, like maybe
including into a larger array. I definitely appreciate these options
and the flexibility that they afford from a user perspective.

In the test output, will you also show the different variations with
FORCE ARRAY and FORCE ROW DELIMITER => {(false, false), (true, false),
(false, true), (true, true)}? Technically you've already shown me the
(false, false) case as those are the defaults.

I don't understand the point of FORCE_ROW_DELIMITER at all. There is
only one legal delimiter of array items in JSON, and that's a comma.
There's no alternative and it's not optional. So in the array case you
MUST have commas and in any other case (e.g. LINES) I can't see why you
would have them.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#53

mail@joeconway.com

about 2 years ago

In reply to: Andrew Dunstan (#52)

Re: Emitting JSON to file using COPY TO

On 12/5/23 15:55, Andrew Dunstan wrote:

On 2023-12-05 Tu 14:50, Davin Shearer wrote:

Hi Joe,

In reviewing the 005 patch, I think that when used with FORCE ARRAY,
we should also _imply_ FORCE ROW DELIMITER. I can't envision a use
case where someone would want to use FORCE ARRAY without also using
FORCE ROW DELIMITER. I can, however, envision a use case where
someone would want FORCE ROW DELIMITER without FORCE ARRAY, like maybe
including into a larger array. I definitely appreciate these options
and the flexibility that they afford from a user perspective.

In the test output, will you also show the different variations with
FORCE ARRAY and FORCE ROW DELIMITER => {(false, false), (true, false),
(false, true), (true, true)}? Technically you've already shown me the
(false, false) case as those are the defaults.

I don't understand the point of FORCE_ROW_DELIMITER at all. There is
only one legal delimiter of array items in JSON, and that's a comma.
There's no alternative and it's not optional. So in the array case you
MUST have commas and in any other case (e.g. LINES) I can't see why you
would have them.

The current patch already *does* imply row delimiters in the array case. 
It says so here:
8<---------------------------
+    <varlistentry>
+     <term><literal>FORCE_ARRAY</literal></term>
+     <listitem>
+      <para>
+       Force output of array decorations at the beginning and end of output.
+       This option implies the <literal>FORCE_ROW_DELIMITER</literal>
+       option. It is allowed only in <command>COPY TO</command>, and only
+       when using <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
8<---------------------------

and it does so here:
8<---------------------------
+         if (opts_out->force_array)
+             opts_out->force_row_delimiter = true;
8<---------------------------

and it shows that here:
8<---------------------------
+ copy copytest to stdout (format json, force_array);
+ [
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
8<---------------------------

It also does not allow explicitly setting row delimiters false while
force_array is true here:
8<---------------------------

+ 		if (opts_out->force_array &&
+ 			force_row_delimiter_specified &&
+ 			!opts_out->force_row_delimiter)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true")));
8<---------------------------

Am I understanding something incorrectly?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
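
The option checks quoted in this message can be restated as a small truth-table model. This is a sketch in Python under stated assumptions, not the server code: `resolve_json_options` and `CopyOptionError` are hypothetical names, `None` stands for "option not specified" (the role played by `force_row_delimiter_specified` in the patch), and the error strings are copied from the patch's `ProcessCopyOptions` hunk.

```python
class CopyOptionError(ValueError):
    """Hypothetical stand-in for the ereport(ERROR, ...) calls in the patch."""


def resolve_json_options(fmt="text", is_from=False,
                         force_array=None, force_row_delimiter=None):
    """Return the effective (force_array, force_row_delimiter) settings.

    None means the option was not given at all, which differs from an
    explicit False when FORCE_ARRAY is true.
    """
    json_mode = (fmt == "json")
    array = bool(force_array)
    delim = bool(force_row_delimiter)

    if json_mode:
        if is_from:
            raise CopyOptionError("cannot use JSON mode in COPY FROM")
        # An explicit "false" contradicts the array case and is rejected.
        if array and force_row_delimiter is not None and not delim:
            raise CopyOptionError(
                "cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true")
        if array:
            delim = True  # FORCE_ARRAY implies FORCE_ROW_DELIMITER
    elif array:
        raise CopyOptionError("COPY FORCE_ARRAY requires JSON mode")
    elif delim:
        raise CopyOptionError("COPY FORCE_ROW_DELIMITER requires JSON mode")

    return {"force_array": array, "force_row_delimiter": delim}
```

For example, `resolve_json_options("json", force_array=True)` yields both settings true, while adding `force_row_delimiter=False` raises the same message the regression test expects.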

#54

[1]: https://learn.microsoft.com/en-us/sql/relational-databases/json/remove-square-brackets-from-json-without-array-wrapper-option?view=sql-server-ver16#example-multiple-row-result

mail@joeconway.com

about 2 years ago

In reply to: Joe Conway (#53)

Re: Emitting JSON to file using COPY TO

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.

Oh I didn't address this -- I saw examples in the interwebs of MSSQL
server I think [1] which had the non-array with commas import and export
style. It was not that tough to support and the code as written already
does it, so why not?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#55

andrew@dunslane.net

about 2 years ago

In reply to: Joe Conway (#53)

Re: Emitting JSON to file using COPY TO

On 2023-12-05 Tu 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

On 2023-12-05 Tu 14:50, Davin Shearer wrote:

Hi Joe,

In reviewing the 005 patch, I think that when used with FORCE ARRAY,
we should also _imply_ FORCE ROW DELIMITER. I can't envision a use
case where someone would want to use FORCE ARRAY without also using
FORCE ROW DELIMITER. I can, however, envision a use case where
someone would want FORCE ROW DELIMITER without FORCE ARRAY, like
maybe including into a larger array. I definitely appreciate these
options and the flexibility that they afford from a user perspective.

In the test output, will you also show the different variations with
FORCE ARRAY and FORCE ROW DELIMITER => {(false, false), (true,
false), (false, true), (true, true)}? Technically you've already
shown me the (false, false) case as those are the defaults.

I don't understand the point of FORCE_ROW_DELIMITER at all. There is
only one legal delimiter of array items in JSON, and that's a comma.
There's no alternative and it's not optional. So in the array case you
MUST have commas and in any other case (e.g. LINES) I can't see why you
would have them.
The current patch already *does* imply row delimiters in the array 
case. It says so here:
8<---------------------------
+    <varlistentry>
+     <term><literal>FORCE_ARRAY</literal></term>
+     <listitem>
+      <para>
+       Force output of array decorations at the beginning and end of output.
+       This option implies the <literal>FORCE_ROW_DELIMITER</literal>
+       option. It is allowed only in <command>COPY TO</command>, and only
+       when using <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
8<---------------------------
and it does so here:
8<---------------------------
+         if (opts_out->force_array)
+             opts_out->force_row_delimiter = true;
8<---------------------------
and it shows that here:
8<---------------------------
+ copy copytest to stdout (format json, force_array);
+ [
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
8<---------------------------
It also does not allow explicitly setting row delimiters false while
force_array is true here:
8<---------------------------
+         if (opts_out->force_array &&
+             force_row_delimiter_specified &&
+             !opts_out->force_row_delimiter)
+             ereport(ERROR,
+                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                      errmsg("cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true")));
8<---------------------------
Am I understanding something incorrectly?

But what's the point of having it if you're not using FORCE_ARRAY?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#56

mail@joeconway.com

about 2 years ago

In reply to: Andrew Dunstan (#55)

Re: Emitting JSON to file using COPY TO

On 12/5/23 16:12, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:02, Joe Conway wrote:
On 12/5/23 15:55, Andrew Dunstan wrote:

On 2023-12-05 Tu 14:50, Davin Shearer wrote:

Hi Joe,

In reviewing the 005 patch, I think that when used with FORCE ARRAY,
we should also _imply_ FORCE ROW DELIMITER. I can't envision a use
case where someone would want to use FORCE ARRAY without also using
FORCE ROW DELIMITER. I can, however, envision a use case where
someone would want FORCE ROW DELIMITER without FORCE ARRAY, like
maybe including into a larger array. I definitely appreciate these
options and the flexibility that they afford from a user perspective.

In the test output, will you also show the different variations with
FORCE ARRAY and FORCE ROW DELIMITER => {(false, false), (true,
false), (false, true), (true, true)}? Technically you've already
shown me the (false, false) case as those are the defaults.

I don't understand the point of FORCE_ROW_DELIMITER at all. There is
only one legal delimiter of array items in JSON, and that's a comma.
There's no alternative and it's not optional. So in the array case you
MUST have commas and in any other case (e.g. LINES) I can't see why you
would have them.
The current patch already *does* imply row delimiters in the array 
case. It says so here:
8<---------------------------
+    <varlistentry>
+     <term><literal>FORCE_ARRAY</literal></term>
+     <listitem>
+      <para>
+       Force output of array decorations at the beginning and end of output.
+       This option implies the <literal>FORCE_ROW_DELIMITER</literal>
+       option. It is allowed only in <command>COPY TO</command>, and only
+       when using <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
8<---------------------------
and it does so here:
8<---------------------------
+         if (opts_out->force_array)
+             opts_out->force_row_delimiter = true;
8<---------------------------
and it shows that here:
8<---------------------------
+ copy copytest to stdout (format json, force_array);
+ [
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
8<---------------------------
It also does not allow explicitly setting row delimiters false while
force_array is true here:
8<---------------------------
+         if (opts_out->force_array &&
+             force_row_delimiter_specified &&
+             !opts_out->force_row_delimiter)
+             ereport(ERROR,
+                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                      errmsg("cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true")));
8<---------------------------
Am I understanding something incorrectly?
But what's the point of having it if you're not using FORCE_ARRAY?

See the follow up email -- other databases support it so why not? It
seems to be a thing...

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#57

andrew@dunslane.net

about 2 years ago

In reply to: Joe Conway (#54)

Re: Emitting JSON to file using COPY TO

On 2023-12-05 Tu 16:09, Joe Conway wrote:

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.

Oh I didn't address this -- I saw examples in the interwebs of MSSQL
server I think [1] which had the non-array with commas import and
export style. It was not that tough to support and the code as written
already does it, so why not?

[1]
https://learn.microsoft.com/en-us/sql/relational-databases/json/remove-square-brackets-from-json-without-array-wrapper-option?view=sql-server-ver16#example-multiple-row-result

That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
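
The distinction drawn in this message (one JSON document versus one document per row) can be checked with any standard JSON parser. The following is a small sketch using Python's `json` module; the three sample strings are hand-written stand-ins shaped like the regression output earlier in the thread, and the helper names are hypothetical.

```python
import json

# Hand-written samples shaped like the patch's three output styles.
ARRAY_OUTPUT = '[\n {"id":1}\n,{"id":2}\n]\n'        # FORCE_ARRAY
LINES_OUTPUT = '{"id":1}\n{"id":2}\n'                # default, one object per line
COMMAS_ONLY_OUTPUT = ' {"id":1}\n,{"id":2}\n'        # FORCE_ROW_DELIMITER alone


def parses_as_one_document(text):
    """True if the whole text is a single valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def parses_line_by_line(text):
    """True if every line is a valid JSON document on its own."""
    return all(parses_as_one_document(line) for line in text.splitlines())


if __name__ == "__main__":
    for name, text in [("array", ARRAY_OUTPUT),
                       ("lines", LINES_OUTPUT),
                       ("commas only", COMMAS_ONLY_OUTPUT)]:
        print(name, parses_as_one_document(text), parses_line_by_line(text))
```

Under these assumptions the array form is valid only as a whole, the lines form is valid only line by line, and the commas-without-array form is valid under neither reading, which is the objection raised above.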

#58

mail@joeconway.com

about 2 years ago

In reply to: Andrew Dunstan (#57)

Re: Emitting JSON to file using COPY TO

On 12/5/23 16:20, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:09, Joe Conway wrote:

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.

Oh I didn't address this -- I saw examples in the interwebs of MSSQL
server I think [1] which had the non-array with commas import and
export style. It was not that tough to support and the code as written
already does it, so why not?

That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

So your preference would be to not allow the non-array-with-commas case
but if/when we implement COPY FROM we would accept that format? As in
Postel's law ("be conservative in what you do, be liberal in what you
accept from others")?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#59

davin@apache.org

about 2 years ago

In reply to: Joe Conway (#58)

Re: Emitting JSON to file using COPY TO

Am I understanding something incorrectly?

No, you've got it. You already covered the concerns there.

That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

For what it's worth, I agree with Andrew on this. I also agree that COPY
FROM should tolerate potentially bogus commas at the end of non-arrays for
interop with other products, but that COPY TO should not emit them (unless
there is some really compelling case to do so). Emitting bogus JSON (non-array
with commas) feels wrong, and it would be nice not to perpetuate that, if
possible.
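
To make the "liberal in what you accept" half concrete, here is a minimal sketch of what a tolerant reader might do with each input line before handing it to a JSON parser. Nothing like this exists in the patch (COPY FROM does not support JSON there); strip_row_delimiters() is a hypothetical helper, and it deliberately ignores the bare "[" and "]" lines of the array form.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: trim surrounding whitespace and at most one leading
 * and one trailing comma from a line, in place, so that a per-row JSON
 * document written with array-style row delimiters can still be parsed.
 * Returns a pointer into the same buffer.
 */
static char *
strip_row_delimiters(char *line)
{
    size_t len;

    /* Skip leading whitespace, one optional comma, then whitespace again. */
    while (isspace((unsigned char) *line))
        line++;
    if (*line == ',')
        line++;
    while (isspace((unsigned char) *line))
        line++;

    /* Do the same from the end of the line. */
    len = strlen(line);
    while (len > 0 && isspace((unsigned char) line[len - 1]))
        len--;
    if (len > 0 && line[len - 1] == ',')
        len--;
    while (len > 0 && isspace((unsigned char) line[len - 1]))
        len--;

    line[len] = '\0';
    return line;
}
```

Because a complete JSON document can neither start nor end with a comma, trimming one at either edge cannot damage valid input.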

Thanks again for doing this. If I can be of any help, let me know.
If/when this makes it into the production product, I'll be using this
feature for sure.

-Davin

#60

andrew@dunslane.net

about 2 years ago

In reply to: Joe Conway (#58)

Re: Emitting JSON to file using COPY TO

On 2023-12-05 Tu 16:46, Joe Conway wrote:

On 12/5/23 16:20, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:09, Joe Conway wrote:

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.

Oh I didn't address this -- I saw examples in the interwebs of MSSQL
server I think [1] which had the non-array with commas import and
export style. It was not that tough to support and the code as
written already does it, so why not?

That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

So your preference would be to not allow the non-array-with-commas
case but if/when we implement COPY FROM we would accept that format?
As in Postel's law ("be conservative in what you do, be liberal in
what you accept from others")?

Yes, I think so.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#61

mail@joeconway.com

about 2 years ago

In reply to: Andrew Dunstan (#60)

1 attachment(s)

Re: Emitting JSON to file using COPY TO

On 12/6/23 07:36, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:46, Joe Conway wrote:

On 12/5/23 16:20, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:09, Joe Conway wrote:

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.

Oh I didn't address this -- I saw examples in the interwebs of MSSQL
server I think [1] which had the non-array with commas import and
export style. It was not that tough to support and the code as
written already does it, so why not?

That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

So your preference would be to not allow the non-array-with-commas
case but if/when we implement COPY FROM we would accept that format?
As in Postel's law ("be conservative in what you do, be liberal in
what you accept from others")?

Yes, I think so.

Awesome. The attached does it that way. I also ran pgindent.

I believe this is ready to commit unless there are further comments or
objections.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

copyto_json.006.diff (text/x-patch; charset=UTF-8)

Add json format mode to COPY TO

Add json format mode support to COPY TO, which includes two output
variations: 1) "json lines" which is each row as a json object delimited
by newlines (the default); and 2) "json array" which is the same as #1,
but with the addition of a leading "[", trailing "]", and comma row
delimiters, to form a valid json array.

Early versions: helpful hints/reviews provided by Nathan Bossart,
Tom Lane, and Maciek Sakrejda. Final versions: reviewed by Andrew Dunstan
and Davin Shearer.

Requested-by: Davin Shearer
Author: Joe Conway
Reviewed-by: Andrew Dunstan, Davin Shearer 
Discussion: https://postgr.es/m/flat/24e3ee88-ec1e-421b-89ae-8a47ee0d2df1%40joeconway.com#a5e6b8829f9a74dfc835f6f29f2e44c5

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..8915fb3 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 43,48 ****
--- 43,49 ----
      FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+     FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
      ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
  </synopsis>
   </refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 206,214 ****
--- 207,220 ----
        Selects the data format to be read or written:
        <literal>text</literal>,
        <literal>csv</literal> (Comma Separated Values),
+       <literal>json</literal> (JavaScript Object Notation),
        or <literal>binary</literal>.
        The default is <literal>text</literal>.
       </para>
+      <para>
+       The <literal>json</literal> option is allowed only in
+       <command>COPY TO</command>.
+      </para>
      </listitem>
     </varlistentry>
  
*************** COPY { <replaceable class="parameter">ta
*** 372,377 ****
--- 378,396 ----
       </para>
      </listitem>
     </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>FORCE_ARRAY</literal></term>
+     <listitem>
+      <para>
+       Force output of square brackets as array decorations at the beginning
+       and end of output, and commas between the rows. It is allowed only in
+       <command>COPY TO</command>, and only when using
+       <literal>JSON</literal> format. The default is
+       <literal>false</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
  
     <varlistentry>
      <term><literal>ENCODING</literal></term>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..23b570f 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 419,424 ****
--- 419,425 ----
  	bool		format_specified = false;
  	bool		freeze_specified = false;
  	bool		header_specified = false;
+ 	bool		force_array_specified = false;
  	ListCell   *option;
  
  	/* Support external use for option sanity checking */
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 444,451 ----
  				 /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  				opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ 				opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  				opts_out->binary = true;
  			else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 543,555 ----
  								defel->defname),
  						 parser_errposition(pstate, defel->location)));
  		}
+ 		else if (strcmp(defel->defname, "force_array") == 0)
+ 		{
+ 			if (force_array_specified)
+ 				errorConflictingDefElem(defel, pstate);
+ 			force_array_specified = true;
+ 			opts_out->force_array = defGetBoolean(defel);
+ 		}
  		else if (strcmp(defel->defname, "convert_selectively") == 0)
  		{
  			/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 608,625 ----
  				(errcode(ERRCODE_SYNTAX_ERROR),
  				 errmsg("cannot specify DEFAULT in BINARY mode")));
  
+ 	if (opts_out->json_mode)
+ 	{
+ 		if (is_from)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot use JSON mode in COPY FROM")));
+ 	}
+ 	else if (opts_out->force_array)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ARRAY requires JSON mode")));
+ 
  	/* Set defaults for omitted options */
  	if (!opts_out->delim)
  		opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 689,699 ----
  				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  				 errmsg("cannot specify HEADER in BINARY mode")));
  
+ 	if (opts_out->json_mode && opts_out->header_line)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot specify HEADER in JSON mode")));
+ 
  	/* Check quote */
  	if (!opts_out->csv_mode && opts_out->quote != NULL)
  		ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..6e351ec 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
+ #include "utils/json.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
  /* NOTE: there's a copy of this in copyfromparse.c */
  static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
  
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
  
  /* non-export function prototypes */
  static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,867 ----
  
  			CopySendEndOfRow(cstate);
  		}
+ 
+ 		/*
+ 		 * If JSON has been requested, and FORCE_ARRAY has been specified send
+ 		 * the opening bracket.
+ 		 */
+ 		if (cstate->opts.json_mode)
+ 		{
+ 			if (cstate->opts.force_array)
+ 			{
+ 				CopySendChar(cstate, '[');
+ 				CopySendEndOfRow(cstate);
+ 			}
+ 			json_row_delim_needed = false;
+ 		}
  	}
  
  	if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 909,925 ----
  		CopySendEndOfRow(cstate);
  	}
  
+ 	/*
+ 	 * If JSON has been requested, and FORCE_ARRAY has been specified send the
+ 	 * closing bracket.
+ 	 */
+ 	if (cstate->opts.json_mode &&
+ 		cstate->opts.force_array)
+ 	{
+ 		CopySendChar(cstate, ']');
+ 		CopySendEndOfRow(cstate);
+ 	}
+ 
  	MemoryContextDelete(cstate->rowcontext);
  
  	if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
  static void
  CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
  {
- 	bool		need_delim = false;
- 	FmgrInfo   *out_functions = cstate->out_functions;
  	MemoryContext oldcontext;
- 	ListCell   *cur;
- 	char	   *string;
  
  	MemoryContextReset(cstate->rowcontext);
  	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 934,940 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	/* Make sure the tuple is fully deconstructed */
! 	slot_getallattrs(slot);
! 
! 	foreach(cur, cstate->attnumlist)
  	{
! 		int			attnum = lfirst_int(cur);
! 		Datum		value = slot->tts_values[attnum - 1];
! 		bool		isnull = slot->tts_isnull[attnum - 1];
  
! 		if (!cstate->opts.binary)
! 		{
! 			if (need_delim)
! 				CopySendChar(cstate, cstate->opts.delim[0]);
! 			need_delim = true;
! 		}
  
! 		if (isnull)
! 		{
! 			if (!cstate->opts.binary)
! 				CopySendString(cstate, cstate->opts.null_print_client);
! 			else
! 				CopySendInt32(cstate, -1);
! 		}
! 		else
  		{
  			if (!cstate->opts.binary)
  			{
! 				string = OutputFunctionCall(&out_functions[attnum - 1],
! 											value);
! 				if (cstate->opts.csv_mode)
! 					CopyAttributeOutCSV(cstate, string,
! 										cstate->opts.force_quote_flags[attnum - 1],
! 										list_length(cstate->attnumlist) == 1);
  				else
! 					CopyAttributeOutText(cstate, string);
  			}
  			else
  			{
! 				bytea	   *outputbytes;
  
! 				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 											   value);
! 				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 				CopySendData(cstate, VARDATA(outputbytes),
! 							 VARSIZE(outputbytes) - VARHDRSZ);
  			}
  		}
  	}
  
  	CopySendEndOfRow(cstate);
  
--- 945,1028 ----
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	if (!cstate->opts.json_mode)
  	{
! 		bool		need_delim = false;
! 		FmgrInfo   *out_functions = cstate->out_functions;
! 		ListCell   *cur;
! 		char	   *string;
  
! 		/* Make sure the tuple is fully deconstructed */
! 		slot_getallattrs(slot);
  
! 		foreach(cur, cstate->attnumlist)
  		{
+ 			int			attnum = lfirst_int(cur);
+ 			Datum		value = slot->tts_values[attnum - 1];
+ 			bool		isnull = slot->tts_isnull[attnum - 1];
+ 
  			if (!cstate->opts.binary)
  			{
! 				if (need_delim)
! 					CopySendChar(cstate, cstate->opts.delim[0]);
! 				need_delim = true;
! 			}
! 
! 			if (isnull)
! 			{
! 				if (!cstate->opts.binary)
! 					CopySendString(cstate, cstate->opts.null_print_client);
  				else
! 					CopySendInt32(cstate, -1);
  			}
  			else
  			{
! 				if (!cstate->opts.binary)
! 				{
! 					string = OutputFunctionCall(&out_functions[attnum - 1],
! 												value);
! 					if (cstate->opts.csv_mode)
! 						CopyAttributeOutCSV(cstate, string,
! 											cstate->opts.force_quote_flags[attnum - 1],
! 											list_length(cstate->attnumlist) == 1);
! 					else
! 						CopyAttributeOutText(cstate, string);
! 				}
! 				else
! 				{
! 					bytea	   *outputbytes;
  
! 					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 												   value);
! 					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 					CopySendData(cstate, VARDATA(outputbytes),
! 								 VARSIZE(outputbytes) - VARHDRSZ);
! 				}
  			}
  		}
  	}
+ 	else
+ 	{
+ 		Datum		rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ 		StringInfo	result;
+ 
+ 		result = makeStringInfo();
+ 		composite_to_json(rowdata, result, false);
+ 
+ 		if (json_row_delim_needed &&
+ 			cstate->opts.force_array)
+ 		{
+ 			CopySendChar(cstate, ',');
+ 		}
+ 		else if (cstate->opts.force_array)
+ 		{
+ 			/* first row needs no delimiter */
+ 			CopySendChar(cstate, ' ');
+ 			json_row_delim_needed = true;
+ 		}
+ 
+ 		CopySendData(cstate, result->data, result->len);
+ 	}
  
  	CopySendEndOfRow(cstate);
  
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..e6789d7 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
  				{
  					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
  				}
+ 			| JSON
+ 				{
+ 					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ 				}
  			| HEADER_P
  				{
  					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3461 ----
  				{
  					$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
  				}
+ 			| FORCE ARRAY
+ 				{
+ 					$$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ 				}
  		;
  
  /* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3498,3507 ----
  				{
  					$$ = makeDefElem($1, $2, @1);
  				}
+ 			| FORMAT_LA copy_generic_opt_arg
+ 				{
+ 					$$ = makeDefElem("format", $2, @1);
+ 				}
  		;
  
  copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
  	JsonUniqueBuilderState unique_check;
  } JsonAggState;
  
- static void composite_to_json(Datum composite, StringInfo result,
- 							  bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
  							  Datum *vals, bool *nulls, int *valcount,
  							  JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
  
  /*
   * Turn a composite / record into JSON.
   */
! static void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
--- 488,496 ----
  
  /*
   * Turn a composite / record into JSON.
+  * Exported so COPY TO can use it.
   */
! void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..97899b6 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
  	bool		binary;			/* binary format? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
+ 	bool		json_mode;		/* JSON format? */
  	CopyHeaderChoice header_line;	/* header line? */
  	char	   *null_print;		/* NULL marker string (server encoding!) */
  	int			null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,68 ----
  	List	   *force_null;		/* list of column names */
  	bool		force_null_all; /* FORCE_NULL *? */
  	bool	   *force_null_flags;	/* per-column CSV FN flags */
+ 	bool		force_array;	/* add JSON array decorations */
  	bool		convert_selectively;	/* do selective binary conversion? */
  	List	   *convert_select; /* list of column names (can be NIL) */
  } CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
  #include "lib/stringinfo.h"
  
  /* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ 							  bool use_line_feeds);
  extern void escape_json(StringInfo buf, const char *str);
  extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
  								const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365e..31913f6 100644
*** a/src/test/regress/expected/copy.out
--- b/src/test/regress/expected/copy.out
*************** copy copytest3 to stdout csv header;
*** 42,47 ****
--- 42,117 ----
  c1,"col with , comma","col with "" quote"
  1,a,1
  2,b,2
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json, force_array);
+ [
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_array true);
+ [
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_array false);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ -- Error
+ copy copytest to stdout (format json, header);
+ ERROR:  cannot specify HEADER in JSON mode
+ -- embedded escaped characters
+ create temp table copyjsontest (
+     id bigserial,
+     f1 text,
+     f2 timestamptz);
+ insert into copyjsontest
+   select g.i,
+          CASE WHEN g.i % 2 = 0 THEN
+            'line with '' in it: ' || g.i::text
+          ELSE
+            'line with " in it: ' || g.i::text
+          END,
+          'Mon Feb 10 17:32:01 1997 PST'
+   from generate_series(1,5) as g(i);
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+ copy copyjsontest to stdout json;
+ {"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":1,"f1":"aaa\"bbb","f2":null}
+ {"id":2,"f1":"aaa\\bbb","f2":null}
+ {"id":3,"f1":"aaa/bbb","f2":null}
+ {"id":4,"f1":"aaa\bbbb","f2":null}
+ {"id":5,"f1":"aaa\fbbb","f2":null}
+ {"id":6,"f1":"aaa\nbbb","f2":null}
+ {"id":7,"f1":"aaa\rbbb","f2":null}
+ {"id":8,"f1":"aaa\tbbb","f2":null}
  create temp table copytest4 (
  	c1 int,
  	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e90..4b76541 100644
*** a/src/test/regress/sql/copy.sql
--- b/src/test/regress/sql/copy.sql
*************** this is just a line full of junk that wo
*** 54,59 ****
--- 54,101 ----
  
  copy copytest3 to stdout csv header;
  
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ 
+ copy copytest to stdout (format json);
+ 
+ copy copytest to stdout (format json, force_array);
+ 
+ copy copytest to stdout (format json, force_array true);
+ 
+ copy copytest to stdout (format json, force_array false);
+ 
+ -- Error
+ copy copytest to stdout (format json, header);
+ 
+ -- embedded escaped characters
+ create temp table copyjsontest (
+     id bigserial,
+     f1 text,
+     f2 timestamptz);
+ 
+ insert into copyjsontest
+   select g.i,
+          CASE WHEN g.i % 2 = 0 THEN
+            'line with '' in it: ' || g.i::text
+          ELSE
+            'line with " in it: ' || g.i::text
+          END,
+          'Mon Feb 10 17:32:01 1997 PST'
+   from generate_series(1,5) as g(i);
+ 
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+ 
+ copy copyjsontest to stdout json;
+ 
  create temp table copytest4 (
  	c1 int,
  	"colname with tab: 	" text);

#62

andrew@dunslane.net

about 2 years ago

In reply to: Joe Conway (#61)

Re: Emitting JSON to file using COPY TO

On 2023-12-06 We 08:49, Joe Conway wrote:

On 12/6/23 07:36, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:46, Joe Conway wrote:

On 12/5/23 16:20, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:09, Joe Conway wrote:

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.

Oh I didn't address this -- I saw examples in the interwebs of
MSSQL server I think [1] which had the non-array with commas
import and export style. It was not that tough to support and the
code as written already does it, so why not?

That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

So your preference would be to not allow the non-array-with-commas
case but if/when we implement COPY FROM we would accept that format?
As in Postel's law ("be conservative in what you do, be liberal in
what you accept from others")?

Yes, I think so.

Awesome. The attached does it that way. I also ran pgindent.

I believe this is ready to commit unless there are further comments or
objections.

Sorry to bikeshed a little more, I'm a bit late looking at this.

I suspect that most users will actually want the table as a single JSON
document, so it should probably be the default. In any case FORCE_ARRAY
as an option has a slightly wrong feel to it. I'm having trouble coming
up with a good name for the reverse of that, off the top of my head.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#63

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Joe Conway (#61)

Re: Emitting JSON to file using COPY TO

Joe Conway <mail@joeconway.com> writes:

I believe this is ready to commit unless there are further comments or
objections.

I thought we were still mostly at proof-of-concept stage?

In particular, has anyone done any performance testing?
I'm concerned about that because composite_to_json() has
zero capability to cache any metadata across calls, meaning
there is going to be a large amount of duplicated work
per row.
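
As a rough picture of what "caching metadata across calls" would mean, here is a standalone sketch. It is not how composite_to_json() is structured; ColumnMeta, RowMetaCache, lookup_column_meta() and get_row_meta() are all made-up names, and the "lookup" is just a counter standing in for per-attribute catalog work.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_COLS 16

/* Hypothetical per-column facts a JSON writer needs for every value. */
typedef struct ColumnMeta
{
    int  type_id;       /* made-up type identifier */
    bool needs_quotes;  /* emit the value as a JSON string? */
} ColumnMeta;

/* Cache that survives across the rows of one COPY; names are illustrative. */
typedef struct RowMetaCache
{
    bool       valid;
    int        ncols;
    ColumnMeta cols[MAX_COLS];
    long       lookups; /* how many simulated lookups have run */
} RowMetaCache;

/* Stand-in for an expensive per-column lookup. */
static ColumnMeta
lookup_column_meta(int type_id, long *lookups)
{
    ColumnMeta meta;

    meta.type_id = type_id;
    meta.needs_quotes = (type_id != 0); /* pretend type 0 is numeric */
    (*lookups)++;
    return meta;
}

/*
 * Return per-column metadata, doing the lookups only when the cache is
 * empty or the column count changed. Returns NULL if the row is wider
 * than the cache.
 */
static const ColumnMeta *
get_row_meta(RowMetaCache *cache, const int *type_ids, int ncols)
{
    if (ncols < 0 || ncols > MAX_COLS)
        return NULL;

    if (!cache->valid || cache->ncols != ncols)
    {
        for (int i = 0; i < ncols; i++)
            cache->cols[i] = lookup_column_meta(type_ids[i], &cache->lookups);
        cache->ncols = ncols;
        cache->valid = true;
    }
    return cache->cols;
}
```

Without the cache the lookup cost scales with rows times columns; with it the cost is paid once per COPY.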

regards, tom lane

#64

mail@joeconway.com

about 2 years ago

In reply to: Andrew Dunstan (#62)

Re: Emitting JSON to file using COPY TO

On 12/6/23 10:32, Andrew Dunstan wrote:

On 2023-12-06 We 08:49, Joe Conway wrote:

On 12/6/23 07:36, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:46, Joe Conway wrote:

On 12/5/23 16:20, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:09, Joe Conway wrote:

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.

Oh I didn't address this -- I saw examples in the interwebs of
MSSQL server I think [1] which had the non-array with commas
import and export style. It was not that tough to support and the
code as written already does it, so why not?

That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

So your preference would be to not allow the non-array-with-commas
case but if/when we implement COPY FROM we would accept that format?
As in Postel's law ("be conservative in what you do, be liberal in
what you accept from others")?

Yes, I think so.

Awesome. The attached does it that way. I also ran pgindent.

I believe this is ready to commit unless there are further comments or
objections.

Sorry to bikeshed a little more, I'm a bit late looking at this.

I suspect that most users will actually want the table as a single JSON
document, so it should probably be the default. In any case FORCE_ARRAY
as an option has a slightly wrong feel to it.

Sure, I can make that happen, although I figured that for the
many-rows-scenario the single array size might be an issue for whatever
you are importing into.

I'm having trouble coming up with a good name for the reverse of
that, off the top of my head.

Will think about it and propose something with the next patch revision.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#65

mail@joeconway.com

about 2 years ago

In reply to: Tom Lane (#63)

Re: Emitting JSON to file using COPY TO

On 12/6/23 10:44, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

I believe this is ready to commit unless there are further comments or
objections.

I thought we were still mostly at proof-of-concept stage?

The concept is narrowly scoped enough that I think we are homing in on
the final patch.

In particular, has anyone done any performance testing?
I'm concerned about that because composite_to_json() has
zero capability to cache any metadata across calls, meaning
there is going to be a large amount of duplicated work
per row.

I will devise some kind of test and report back. I suppose something
with many rows and many narrow columns comparing time to COPY
text/csv/json modes would do the trick?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#66

andrew@dunslane.net

about 2 years ago

In reply to: Tom Lane (#63)

Re: Emitting JSON to file using COPY TO

On 2023-12-06 We 10:44, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

I believe this is ready to commit unless there are further comments or
objections.

I thought we were still mostly at proof-of-concept stage?

In particular, has anyone done any performance testing?
I'm concerned about that because composite_to_json() has
zero capability to cache any metadata across calls, meaning
there is going to be a large amount of duplicated work
per row.

Yeah, that's hard to deal with, too, as it can be called recursively.

OTOH I'd rather have a version of this that worked slowly than none at all.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#67

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Joe Conway (#65)

Re: Emitting JSON to file using COPY TO

Joe Conway <mail@joeconway.com> writes:

On 12/6/23 10:44, Tom Lane wrote:

In particular, has anyone done any performance testing?

I will devise some kind of test and report back. I suppose something
with many rows and many narrow columns comparing time to COPY
text/csv/json modes would do the trick?

Yeah. If it's at least in the same ballpark as the existing text/csv
formats then I'm okay with it. I'm worried that it might be 10x worse,
in which case I think we'd need to do something.

regards, tom lane

#68

Sehrope Sarkuni

sehrope@jackdb.com

about 2 years ago

In reply to: Andrew Dunstan (#66)

Re: Emitting JSON to file using COPY TO

Big +1 to this overall feature.

This is something I've wanted for a long time as well. While it's possible
to use a COPY with text output for a trivial case, the double escaping
falls apart quickly for arbitrary data. It's really only usable when you
know exactly what you are querying and know it will not be a problem.
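
To illustrate the double escaping, here is a simplified model of what COPY's text format does to a value that is already JSON. It is a sketch only: copy_text_escape() is a hypothetical helper covering backslash, newline, carriage return and tab, not the backend's CopyAttributeOutText().

```c
#include <stddef.h>

/*
 * Simplified model of COPY text-format escaping (hypothetical helper).
 * Backslash and a few control characters are written as two-character
 * backslash sequences; everything else is copied through.
 * Returns the escaped length, or -1 if dst is too small.
 */
static int
copy_text_escape(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;

    for (; *src != '\0'; src++)
    {
        char esc = 0;       /* second character of the escape, if any */

        switch (*src)
        {
            case '\\': esc = '\\'; break;
            case '\n': esc = 'n'; break;
            case '\r': esc = 'r'; break;
            case '\t': esc = 't'; break;
            default: break;
        }

        /* Need room for this character (1 or 2 bytes) plus the NUL. */
        if (used + (esc ? 2 : 1) + 1 > dstsize)
            return -1;
        if (esc)
            dst[used++] = '\\';
        dst[used++] = esc ? esc : *src;
    }

    if (used + 1 > dstsize)
        return -1;
    dst[used] = '\0';
    return (int) used;
}
```

Feeding it the JSON text {"f1":"aaa\"bbb"} yields {"f1":"aaa\\"bbb"}: the backslash that escaped the quote is itself escaped, so the string now ends after the two backslashes and the result is no longer valid JSON.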

Regarding the defaults for the output, I think JSON lines (rather than a
JSON array of objects) would be preferred. It's more natural to combine
them and generate that type of data on the fly rather than forcing
aggregation into a single object.

Couple more features / use cases come to mind as well. Even if they're not
part of a first round of this feature I think it'd be helpful to document
them now as it might give some ideas for what does make that first cut:

1. Outputting a top level JSON object without the additional column keys.
IIUC, the top level keys are always the column names. A common use case
would be a single json/jsonb column that is already formatted exactly as
the user would like for output. Rather than enveloping it in an object with
a dedicated key, it would be nice to be able to output it directly. This
would allow non-object results to be outputted as well (e.g., lines of JSON
arrays, numbers, or strings). Due to how JSON is structured, I think this
would play nicely with the JSON lines vs. array concept.

COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE)
{"foo":1}
{"foo":2}
{"foo":3}

2. An option to ignore null fields so they are excluded from the output.
This would not be a default but would allow shrinking the total size of the
output data in many situations. This would be recursive to allow nested
objects to be shrunk down (not just the top level). This might be
worthwhile as a standalone JSON function though handling it during output
would be more efficient as it'd only be read once.

COPY (SELECT json_build_object('foo', CASE WHEN x > 1 THEN x END) FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS)
{}
{"foo":2}
{"foo":3}

3. Reverse of #2 when copying data in to allow defaulting missing fields to
NULL.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

#69

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Andrew Dunstan (#66)

Re: Emitting JSON to file using COPY TO

Andrew Dunstan <andrew@dunslane.net> writes:

On 2023-12-06 We 10:44, Tom Lane wrote:

In particular, has anyone done any performance testing?
I'm concerned about that because composite_to_json() has
zero capability to cache any metadata across calls, meaning
there is going to be a large amount of duplicated work
per row.

Yeah, that's hard to deal with, too, as it can be called recursively.

Right. On the plus side, if we did improve this it would presumably
also benefit other callers of composite_to_json[b].

OTOH I'd rather have a version of this that worked slowly than none at all.

It might be acceptable to plan on improving the performance later,
depending on just how bad it is now.

regards, tom lane

#70

nathandbossart@gmail.com

about 2 years ago

In reply to: Tom Lane (#69)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 06, 2023 at 11:28:59AM -0500, Tom Lane wrote:

It might be acceptable to plan on improving the performance later,
depending on just how bad it is now.

On 10M rows with 11 integers each, I'm seeing the following:

(format text)
Time: 10056.311 ms (00:10.056)
Time: 8789.331 ms (00:08.789)
Time: 8755.070 ms (00:08.755)

(format csv)
Time: 12295.480 ms (00:12.295)
Time: 12311.059 ms (00:12.311)
Time: 12305.469 ms (00:12.305)

(format json)
Time: 24568.621 ms (00:24.569)
Time: 23756.234 ms (00:23.756)
Time: 24265.730 ms (00:24.266)

'perf top' tends to look a bit like this:

13.31% postgres [.] appendStringInfoString
7.57% postgres [.] datum_to_json_internal
6.82% postgres [.] SearchCatCache1
5.35% [kernel] [k] intel_gpio_irq
3.57% postgres [.] composite_to_json
3.31% postgres [.] IsValidJsonNumber

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
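
For a rough sense of scale, the timings reported above can be reduced to per-format means and ratios. The sketch below is only arithmetic on those nine numbers; the array and function names are illustrative and are not part of the patch.

```c
#include <stddef.h>

/* Wall-clock times in milliseconds, copied from the three runs per format. */
const double text_ms[3] = {10056.311, 8789.331, 8755.070};
const double csv_ms[3]  = {12295.480, 12311.059, 12305.469};
const double json_ms[3] = {24568.621, 23756.234, 24265.730};

/* Arithmetic mean of n samples. */
double mean_ms(const double *v, size_t n)
{
	double		sum = 0.0;
	size_t		i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double) n;
}

/* How many times slower format a is than format b, comparing means. */
double slowdown(const double *a, size_t na, const double *b, size_t nb)
{
	return mean_ms(a, na) / mean_ms(b, nb);
}
```

On these inputs `slowdown(json_ms, 3, csv_ms, 3)` comes out just under 2x and `slowdown(json_ms, 3, text_ms, 3)` at roughly 2.6x, well short of the 10x figure raised earlier in the thread.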

#71

nathandbossart@gmail.com

about 2 years ago

In reply to: Nathan Bossart (#70)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 06, 2023 at 10:33:49AM -0600, Nathan Bossart wrote:

(format csv)
Time: 12295.480 ms (00:12.295)
Time: 12311.059 ms (00:12.311)
Time: 12305.469 ms (00:12.305)

(format json)
Time: 24568.621 ms (00:24.569)
Time: 23756.234 ms (00:23.756)
Time: 24265.730 ms (00:24.266)

I should also note that the json output is 85% larger than the csv output.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#72

[1]: https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY

daniel@manitou-mail.org

about 2 years ago

In reply to: Andrew Dunstan (#57)

Re: Emitting JSON to file using COPY TO

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)
PostgreSQL
Type: Copy data
Length: 6
Copy data: 5b0a
PostgreSQL
Type: Copy data
Length: 76
Copy data:
207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…

The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?

At least the json non-ARRAY case ("json lines") doesn't have
this issue, since every CopyData message corresponds effectively
to a row in the table.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
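
The CopyData payloads in the tshark output above are shown as hex, which makes it hard to see what each message carries. A minimal sketch of a decoder follows; `hex_decode` and `hex_nibble` are names made up for this illustration and nothing here comes from the patch or from libpq.

```c
#include <stddef.h>

/* Convert one hex digit to its value, or return -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode a hex string such as "5b0a" into out, NUL-terminating the result.
 * Returns the number of decoded bytes, or -1 if the input is malformed
 * (odd length, non-hex character) or out is too small.
 */
long hex_decode(const char *hex, char *out, size_t outsz)
{
	size_t		n = 0;

	if (outsz == 0)
		return -1;
	while (hex[0] != '\0')
	{
		int			hi = hex_nibble(hex[0]);
		int			lo = (hex[1] != '\0') ? hex_nibble(hex[1]) : -1;

		if (hi < 0 || lo < 0 || n + 1 >= outsz)
			return -1;
		out[n++] = (char) (hi * 16 + lo);
		hex += 2;
	}
	out[n] = '\0';
	return (long) n;
}
```

Decoding `5b0a` gives `[` followed by a newline, i.e. only the opening bracket of the array. The visible part of the second payload decodes to ` {"id":1,"f1":"line with \" in it: 1`, which is the start of the first real row.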

#73

mail@joeconway.com

about 2 years ago

In reply to: Daniel Verite (#72)

Re: Emitting JSON to file using COPY TO

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)
PostgreSQL
Type: Copy data
Length: 6
Copy data: 5b0a
PostgreSQL
Type: Copy data
Length: 76
Copy data:
207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…

The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?

Is it a real problem, or just a bit of documentation change that I missed?

Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#74

mail@joeconway.com

about 2 years ago

In reply to: Nathan Bossart (#71)

Re: Emitting JSON to file using COPY TO

On 12/6/23 11:44, Nathan Bossart wrote:

On Wed, Dec 06, 2023 at 10:33:49AM -0600, Nathan Bossart wrote:

(format csv)
Time: 12295.480 ms (00:12.295)
Time: 12311.059 ms (00:12.311)
Time: 12305.469 ms (00:12.305)

(format json)
Time: 24568.621 ms (00:24.569)
Time: 23756.234 ms (00:23.756)
Time: 24265.730 ms (00:24.266)

I should also note that the json output is 85% larger than the csv output.

I'll see if I can add some caching to composite_to_json(), but based on
the relative data size it does not sound like there is much performance
left on the table to go after, no?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#75

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Joe Conway (#74)

Re: Emitting JSON to file using COPY TO

Joe Conway <mail@joeconway.com> writes:

I'll see if I can add some caching to composite_to_json(), but based on
the relative data size it does not sound like there is much performance
left on the table to go after, no?

If Nathan's perf results hold up elsewhere, it seems like some
micro-optimization around the text-pushing (appendStringInfoString)
might be more useful than caching. The 7% spent in cache lookups
could be worth going after later, but it's not the top of the list.

The output size difference does say that maybe we should pay some
attention to the nearby request to not always label every field.
Perhaps there should be an option for each row to transform to
a JSON array rather than an object?

regards, tom lane

#76

andrew@dunslane.net

about 2 years ago

In reply to: Tom Lane (#75)

Re: Emitting JSON to file using COPY TO

On 2023-12-06 We 15:20, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

I'll see if I can add some caching to composite_to_json(), but based on
the relative data size it does not sound like there is much performance
left on the table to go after, no?

If Nathan's perf results hold up elsewhere, it seems like some
micro-optimization around the text-pushing (appendStringInfoString)
might be more useful than caching. The 7% spent in cache lookups
could be worth going after later, but it's not the top of the list.

The output size difference does say that maybe we should pay some
attention to the nearby request to not always label every field.
Perhaps there should be an option for each row to transform to
a JSON array rather than an object?

I doubt it. People who want this are likely to want pretty much what
this patch is providing, not something they would have to transform in
order to get it. If they want space-efficient data they won't really be
wanting JSON. Maybe they want Protocol Buffers or something in that vein.

I see there's a nearby proposal to make this area pluggable at
</messages/by-id/20231204.153548.2126325458835528809.kou@clear-code.com>

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#77

mail@joeconway.com

about 2 years ago

In reply to: Sehrope Sarkuni (#68)

Re: Emitting JSON to file using COPY TO

On 12/6/23 11:28, Sehrope Sarkuni wrote:

Big +1 to this overall feature.

cool!

Regarding the defaults for the output, I think JSON lines (rather than a
JSON array of objects) would be preferred. It's more natural to combine
them and generate that type of data on the fly rather than forcing
aggregation into a single object.

So that is +2 (Sehrope and me) for the status quo (JSON lines), and +2
(Andrew and Davin) for defaulting to json arrays. Anyone else want to
weigh in on that issue?

Couple more features / use cases come to mind as well. Even if they're
not part of a first round of this feature I think it'd be helpful to
document them now as it might give some ideas for what does make that
first cut:

1. Outputting a top level JSON object without the additional column
keys. IIUC, the top level keys are always the column names. A common use
case would be a single json/jsonb column that is already formatted
exactly as the user would like for output. Rather than enveloping it in
an object with a dedicated key, it would be nice to be able to output it
directly. This would allow non-object results to be outputted as well
(e.g., lines of JSON arrays, numbers, or strings). Due to how JSON is
structured, I think this would play nice with the JSON lines v.s. array
concept.

COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE)
{"foo":1}
{"foo":2}
{"foo":3}

Your example does not match what you describe, or do I misunderstand? I
thought your goal was to eliminate the repeated "foo" from each row...

2. An option to ignore null fields so they are excluded from the output.
This would not be a default but would allow shrinking the total size of
the output data in many situations. This would be recursive to allow
nested objects to be shrunk down (not just the top level). This might be
worthwhile as a standalone JSON function though handling it during
output would be more efficient as it'd only be read once.

COPY (SELECT json_build_object('foo', CASE WHEN x > 1 THEN x END) FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS)
{}
{"foo":2}
{"foo":3}

clear enough I think

3. Reverse of #2 when copying data in to allow defaulting missing fields
to NULL.

good to record the ask, but applies to a different feature (COPY FROM
instead of COPY TO).

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#78

Sehrope Sarkuni

sehrope@jackdb.com

about 2 years ago

In reply to: Andrew Dunstan (#76)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 6, 2023 at 4:03 PM Andrew Dunstan <andrew@dunslane.net> wrote:

The output size difference does say that maybe we should pay some
attention to the nearby request to not always label every field.
Perhaps there should be an option for each row to transform to
a JSON array rather than an object?

I doubt it. People who want this are likely to want pretty much what
this patch is providing, not something they would have to transform in
order to get it. If they want space-efficient data they won't really be
wanting JSON. Maybe they want Protocol Buffers or something in that vein.

For arrays v.s. objects, it's not just about data size. There are plenty of
situations where a JSON array is superior to an object (e.g. duplicate
column names). Lines of JSON arrays of strings is pretty much CSV with JSON
escaping rules and a pair of wrapping brackets. It's common for tabular
data in node.js environments as you don't need a separate CSV parser.

Each one has its place and a default of the row_to_json(...) representation
of the row still makes sense. But if the user has the option of outputting
a single json/jsonb field for each row without an object or array wrapper,
then it's possible to support all of these use cases as the user can
explicitly pick whatever envelope makes sense:

-- Lines of JSON arrays:
COPY (SELECT json_build_array('test-' || a, b) FROM generate_series(1, 3)
a, generate_series(5,6) b) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_DISABLE_ENVELOPE);
["test-1", 5]
["test-2", 5]
["test-3", 5]
["test-1", 6]
["test-2", 6]
["test-3", 6]

-- Lines of JSON strings:
COPY (SELECT to_json('test-' || x) FROM generate_series(1, 5) x) TO STDOUT
WITH (FORMAT JSON, SOME_OPTION_TO_DISABLE_ENVELOPE);
"test-1"
"test-2"
"test-3"
"test-4"
"test-5"

I'm not sure how I feel about the behavior being automatic if it's a single
top level json / jsonb field rather than requiring the explicit option.
It's probably what a user would want but it also feels odd to change the
output wrapper automatically based on the fields in the response. If it is
automatic and the user wants the additional envelope, the option always
exists to wrap it further in another: json_build_object('some_field',
my_field_i_want_wrapped)

The duplicate field names would be a good test case too. I haven't gone
through this patch but I'm guessing it doesn't filter out duplicates so the
behavior would match up with row_to_json(...), i.e. duplicates are
preserved:

=> SELECT row_to_json(t.*) FROM (SELECT 1 AS a, 2 AS a) t;
row_to_json
---------------
{"a":1,"a":2}

If so, that's a good test case to add as however that's handled should be
deterministic.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

#79

nathandbossart@gmail.com

about 2 years ago

In reply to: Tom Lane (#75)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 06, 2023 at 03:20:46PM -0500, Tom Lane wrote:

If Nathan's perf results hold up elsewhere, it seems like some
micro-optimization around the text-pushing (appendStringInfoString)
might be more useful than caching. The 7% spent in cache lookups
could be worth going after later, but it's not the top of the list.

Agreed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#80

Sehrope Sarkuni

sehrope@jackdb.com

about 2 years ago

In reply to: Joe Conway (#77)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 6, 2023 at 4:29 PM Joe Conway <mail@joeconway.com> wrote:

1. Outputting a top level JSON object without the additional column
keys. IIUC, the top level keys are always the column names. A common use
case would be a single json/jsonb column that is already formatted
exactly as the user would like for output. Rather than enveloping it in
an object with a dedicated key, it would be nice to be able to output it
directly. This would allow non-object results to be outputted as well
(e.g., lines of JSON arrays, numbers, or strings). Due to how JSON is
structured, I think this would play nice with the JSON lines v.s. array
concept.

COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE)
{"foo":1}
{"foo":2}
{"foo":3}

Your example does not match what you describe, or do I misunderstand? I
thought your goal was to eliminate the repeated "foo" from each row...

The "foo" in this case is explicit as I'm adding it when building the
object. What I was trying to show was not adding an additional object
wrapper / envelope.

So each row is:

{"foo":1}

Rather than:

{"json_build_object":{"foo":1}}

If each row has exactly one json / jsonb field, then the user has already
indicated the format for each row.

That same mechanism can be used to remove the "foo" entirely via a
json/jsonb array.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
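
The difference between the enveloped and non-enveloped row can be sketched as a small formatting helper. This is an illustration only: `wrap_row` and its `envelope` flag are hypothetical, they stand in for the placeholder option discussed above, and the column name is assumed to need no JSON escaping.

```c
#include <stdio.h>

/*
 * Format one output row into out.
 * With envelope != 0 the already-serialized JSON value is wrapped in an
 * object keyed by the column name; otherwise it is emitted as-is.
 * Returns the length written, or -1 if out is too small.
 * Assumption: colname contains no characters that need JSON escaping.
 */
int wrap_row(char *out, size_t outsz, const char *colname,
			 const char *json_value, int envelope)
{
	int			n;

	if (envelope)
		n = snprintf(out, outsz, "{\"%s\":%s}", colname, json_value);
	else
		n = snprintf(out, outsz, "%s", json_value);

	return (n >= 0 && (size_t) n < outsz) ? n : -1;
}
```

Called with the column name `json_build_object` and the value `{"foo":1}`, the enveloped form yields `{"json_build_object":{"foo":1}}` and the non-enveloped form yields `{"foo":1}`.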

#81

mail@joeconway.com

about 2 years ago

In reply to: Sehrope Sarkuni (#80)

Re: Emitting JSON to file using COPY TO

On 12/6/23 16:42, Sehrope Sarkuni wrote:

On Wed, Dec 6, 2023 at 4:29 PM Joe Conway <mail@joeconway.com> wrote:

1. Outputting a top level JSON object without the additional column
keys. IIUC, the top level keys are always the column names. A common use
case would be a single json/jsonb column that is already formatted
exactly as the user would like for output. Rather than enveloping it in
an object with a dedicated key, it would be nice to be able to output it
directly. This would allow non-object results to be outputted as well
(e.g., lines of JSON arrays, numbers, or strings). Due to how JSON is
structured, I think this would play nice with the JSON lines v.s. array
concept.

COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE)
{"foo":1}
{"foo":2}
{"foo":3}

Your example does not match what you describe, or do I misunderstand? I
thought your goal was to eliminate the repeated "foo" from each row...

The "foo" in this case is explicit as I'm adding it when building the
object. What I was trying to show was not adding an additional object
wrapper / envelope.

So each row is:

{"foo":1}

Rather than:

{"json_build_object":{"foo":1}}

I am still getting confused ;-)

Let's focus on the current proposed patch with a "minimum required
feature set".

Right now the default behavior is "JSON lines":
8<-------------------------------
COPY (SELECT x.i, 'val' || x.i as v FROM
generate_series(1, 3) x(i))
TO STDOUT WITH (FORMAT JSON);
{"i":1,"v":"val1"}
{"i":2,"v":"val2"}
{"i":3,"v":"val3"}
8<-------------------------------

and the other, non-default option is "JSON array":
8<-------------------------------
COPY (SELECT x.i, 'val' || x.i as v FROM
generate_series(1, 3) x(i))
TO STDOUT WITH (FORMAT JSON, FORCE_ARRAY);
[
{"i":1,"v":"val1"}
,{"i":2,"v":"val2"}
,{"i":3,"v":"val3"}
]
8<-------------------------------

So the questions are:
1. Do those two formats work for the initial implementation?
2. Is the default correct or should it be switched
e.g. rather than specifying FORCE_ARRAY to get an
array, something like FORCE_NO_ARRAY to get JSON lines
and the JSON array is default?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
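
The two output shapes shown above differ only in framing, which can be sketched as follows. `frame_json_rows` is a hypothetical helper, not code from the patch. The single space before the first array row is taken from the hex dump quoted earlier in the thread (the payload starts with `20 7b`), so treat that detail as an assumption about the patch's exact output.

```c
#include <stddef.h>
#include <string.h>

/* Append s to out at *pos, keeping out NUL-terminated. Returns 0 if it does not fit. */
static int append_str(char *out, size_t outsz, size_t *pos, const char *s)
{
	size_t		n = strlen(s);

	if (*pos + n + 1 > outsz)
		return 0;
	memcpy(out + *pos, s, n + 1);
	*pos += n;
	return 1;
}

/*
 * Concatenate already-serialized JSON rows into out, either as JSON lines
 * (force_array == 0) or as one JSON array (force_array != 0).
 * Returns the number of chunks emitted, where each chunk stands in for one
 * CopyData message, or -1 if out is too small.
 */
int frame_json_rows(char *out, size_t outsz, const char *const *rows,
					int nrows, int force_array)
{
	size_t		pos = 0;
	int			chunks = 0;
	int			i;

	if (outsz == 0)
		return -1;
	out[0] = '\0';

	if (force_array)
	{
		if (!append_str(out, outsz, &pos, "[\n"))
			return -1;
		chunks++;				/* opening bracket is its own chunk */
	}
	for (i = 0; i < nrows; i++)
	{
		/* In array mode, rows after the first are prefixed with a comma. */
		if (force_array &&
			!append_str(out, outsz, &pos, i == 0 ? " " : ","))
			return -1;
		if (!append_str(out, outsz, &pos, rows[i]) ||
			!append_str(out, outsz, &pos, "\n"))
			return -1;
		chunks++;
	}
	if (force_array)
	{
		if (!append_str(out, outsz, &pos, "]\n"))
			return -1;
		chunks++;				/* closing bracket is its own chunk */
	}
	return chunks;
}
```

In array mode the chunk count is always the row count plus two, so an empty result still produces the two bracket chunks.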

#82

david.g.johnston@gmail.com

about 2 years ago

In reply to: Joe Conway (#81)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 6, 2023 at 3:38 PM Joe Conway <mail@joeconway.com> wrote:

So the questions are:
1. Do those two formats work for the initial implementation?

Yes. We provide a stream-oriented format and one atomic-import format.

2. Is the default correct or should it be switched

e.g. rather than specifying FORCE_ARRAY to get an
array, something like FORCE_NO_ARRAY to get JSON lines
and the JSON array is default?

No default?

Require explicit selection of a sub-format when the main format is JSON.

JSON_OBJECT_ROWS
JSON_ARRAY_OF_OBJECTS

For a future compact array-structured-composites sub-format:
JSON_ARRAY_OF_ARRAYS
JSON_ARRAY_ROWS

David J.

#83

mail@joeconway.com

about 2 years ago

In reply to: Joe Conway (#73)

Re: Emitting JSON to file using COPY TO

On 12/6/23 14:47, Joe Conway wrote:

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)
PostgreSQL
Type: Copy data
Length: 6
Copy data: 5b0a
PostgreSQL
Type: Copy data
Length: 76
Copy data:
207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…

The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?

Is it a real problem, or just a bit of documentation change that I missed?

Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.

Hmm, maybe the real problem here is that Columns do not equal "3" for
the json mode case -- that should really say "1" I think, because the
row is not represented as 3 columns but rather 1 json object.

Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more
"rows" than actual output rows (the "[" and the "]"), but maybe those
are less likely to cause some hazard?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#84

david.g.johnston@gmail.com

about 2 years ago

In reply to: Joe Conway (#83)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 6, 2023 at 4:09 PM Joe Conway <mail@joeconway.com> wrote:

On 12/6/23 14:47, Joe Conway wrote:

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)

Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.

Hmm, maybe the real problem here is that Columns do not equal "3" for
the json mode case -- that should really say "1" I think, because the
row is not represented as 3 columns but rather 1 json object.

Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more
"rows" than actual output rows (the "[" and the "]"), but maybe those
are less likely to cause some hazard?

What is the limitation, if any, of introducing new type codes for these. n
= 2..N for the different variants? Or even -1 for "raw text"? And
document that columns and structural rows need to be determined
out-of-band. Continuing to use 1 (text) for this non-csv data seems like a
hack even if we can technically make it function. The semantics,
especially for the array case, are completely discarded or wrong.

David J.

#85

david.g.johnston@gmail.com

about 2 years ago

In reply to: David G. Johnston (#84)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 6, 2023 at 4:28 PM David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Wed, Dec 6, 2023 at 4:09 PM Joe Conway <mail@joeconway.com> wrote:

On 12/6/23 14:47, Joe Conway wrote:

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)

Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.

Hmm, maybe the real problem here is that Columns do not equal "3" for
the json mode case -- that should really say "1" I think, because the
row is not represented as 3 columns but rather 1 json object.

Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more
"rows" than actual output rows (the "[" and the "]"), but maybe those
are less likely to cause some hazard?

What is the limitation, if any, of introducing new type codes for these.
n = 2..N for the different variants? Or even -1 for "raw text"? And
document that columns and structural rows need to be determined
out-of-band. Continuing to use 1 (text) for this non-csv data seems like a
hack even if we can technically make it function. The semantics,
especially for the array case, are completely discarded or wrong.

Also, it seems like this answer would be easier to make if we implement
COPY FROM now since how is the server supposed to deal with decomposing
this data into tables without accurate type information? I don't see
implementing only half of the feature being a good idea. I've had much
more desire for FROM compared to TO personally.

David J.

#86

mail@joeconway.com

about 2 years ago

In reply to: David G. Johnston (#84)

Re: Emitting JSON to file using COPY TO

On 12/6/23 18:28, David G. Johnston wrote:

On Wed, Dec 6, 2023 at 4:09 PM Joe Conway <mail@joeconway.com> wrote:

On 12/6/23 14:47, Joe Conway wrote:

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)

Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.

Hmm, maybe the real problem here is that Columns do not equal "3" for
the json mode case -- that should really say "1" I think, because the
row is not represented as 3 columns but rather 1 json object.

Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more
"rows" than actual output rows (the "[" and the "]"), but maybe those
are less likely to cause some hazard?

What is the limitation, if any, of introducing new type codes for
these. n = 2..N for the different variants? Or even -1 for "raw
text"? And document that columns and structural rows need to be
determined out-of-band. Continuing to use 1 (text) for this non-csv
data seems like a hack even if we can technically make it function. The
semantics, especially for the array case, are completely discarded or wrong.

I am not following you here. SendCopyBegin looks like this currently:

8<--------------------------------
SendCopyBegin(CopyToState cstate)
{
	StringInfoData buf;
	int			natts = list_length(cstate->attnumlist);
	int16		format = (cstate->opts.binary ? 1 : 0);
	int			i;

	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
	pq_sendbyte(&buf, format);	/* overall format */
	pq_sendint16(&buf, natts);
	for (i = 0; i < natts; i++)
		pq_sendint16(&buf, format); /* per-column formats */
	pq_endmessage(&buf);
	cstate->copy_dest = COPY_FRONTEND;
}
8<--------------------------------

The "1" is saying are we binary mode or not. JSON mode will never be
sending in binary in the current implementation at least. And it always
aggregates all the columns as one json object. So the correct answer is
(I think):
8<--------------------------------
*************** SendCopyBegin(CopyToState cstate)
*** 146,154 ****

   	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
   	pq_sendbyte(&buf, format);	/* overall format */
! 	pq_sendint16(&buf, natts);
! 	for (i = 0; i < natts; i++)
! 		pq_sendint16(&buf, format); /* per-column formats */
   	pq_endmessage(&buf);
   	cstate->copy_dest = COPY_FRONTEND;
   }
--- 150,169 ----

   	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
   	pq_sendbyte(&buf, format);	/* overall format */
! 	if (!cstate->opts.json_mode)
! 	{
! 		pq_sendint16(&buf, natts);
! 		for (i = 0; i < natts; i++)
! 			pq_sendint16(&buf, format); /* per-column formats */
! 	}
! 	else
! 	{
! 		/*
! 		 * JSON mode is always one non-binary column
! 		 */
! 		pq_sendint16(&buf, 1);
! 		pq_sendint16(&buf, 0);
! 	}
   	pq_endmessage(&buf);
   	cstate->copy_dest = COPY_FRONTEND;
   }
8<--------------------------------

That still leaves the need to fix the documentation:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone"

probably "always one per row" would be changed to note that json array
format outputs two extra rows for the start/end bracket.

In fact, as written the patch does this:
8<--------------------------------
COPY (SELECT x.i, 'val' || x.i as v FROM
generate_series(1, 3) x(i) WHERE false)
TO STDOUT WITH (FORMAT JSON, FORCE_ARRAY);
[
]
8<--------------------------------

Not sure if that is a problem or not.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
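
As a cross-check on the message layout the change above touches, the CopyOutResponse wire format can be built by hand. The sketch below follows the documented layout (message type byte `H`, Int32 length that counts itself, Int8 overall format, Int16 column count, one Int16 format code per column); the function names are made up for this illustration and it does not use the backend's pq_* routines.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Value of the CopyOutResponse length field: the length word itself (4),
 * the overall format byte (1), the column count (2), and one Int16 format
 * code per column.
 */
size_t copy_out_response_length(int ncols)
{
	return 4 + 1 + 2 + 2 * (size_t) ncols;
}

/*
 * Serialize a CopyOutResponse into out in network byte order.
 * format is 0 (text) or 1 (binary) and is used for every column.
 * Returns the total bytes written (type byte plus length), or 0 on error.
 */
size_t build_copy_out_response(uint8_t *out, size_t outsz, int ncols, int format)
{
	size_t		len;
	size_t		pos = 0;
	int			i;

	if (ncols < 0 || ncols > INT16_MAX)
		return 0;
	len = copy_out_response_length(ncols);
	if (outsz < len + 1)
		return 0;

	out[pos++] = 'H';					/* message type */
	out[pos++] = (uint8_t) (len >> 24);	/* Int32 length, big-endian */
	out[pos++] = (uint8_t) (len >> 16);
	out[pos++] = (uint8_t) (len >> 8);
	out[pos++] = (uint8_t) len;
	out[pos++] = (uint8_t) format;		/* overall format */
	out[pos++] = (uint8_t) (ncols >> 8);	/* Int16 column count */
	out[pos++] = (uint8_t) ncols;
	for (i = 0; i < ncols; i++)
	{
		out[pos++] = 0;					/* high byte of 0 or 1 is always 0 */
		out[pos++] = (uint8_t) format;	/* per-column format */
	}
	return pos;
}
```

With three columns the length field is 13, matching the "Length: 13" in the tshark capture quoted in this thread. With the single text column proposed for JSON mode it would be 9.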

#87

mail@joeconway.com

about 2 years ago

In reply to: David G. Johnston (#85)

Re: Emitting JSON to file using COPY TO

On 12/6/23 18:38, David G. Johnston wrote:

On Wed, Dec 6, 2023 at 4:28 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Wed, Dec 6, 2023 at 4:09 PM Joe Conway <mail@joeconway.com> wrote:

On 12/6/23 14:47, Joe Conway wrote:

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)

Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.

Hmm, maybe the real problem here is that Columns do not equal "3" for
the json mode case -- that should really say "1" I think, because the
row is not represented as 3 columns but rather 1 json object.

Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more
"rows" than actual output rows (the "[" and the "]"), but maybe those
are less likely to cause some hazard?

What is the limitation, if any, of introducing new type codes for
these. n = 2..N for the different variants? Or even -1 for "raw
text"? And document that columns and structural rows need to be
determined out-of-band. Continuing to use 1 (text) for this non-csv
data seems like a hack even if we can technically make it function.
The semantics, especially for the array case, are completely
discarded or wrong.

Also, it seems like this answer would be easier to make if we implement
COPY FROM now since how is the server supposed to deal with decomposing
this data into tables without accurate type information? I don't see
implementing only half of the feature being a good idea. I've had much
more desire for FROM compared to TO personally.

Several people have weighed in on the side of getting COPY TO done by
itself first. Given how long this discussion has already become for a
relatively small and simple feature, I am a big fan of not expanding the
scope now.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#88

david.g.johnston@gmail.com

about 2 years ago

In reply to: Joe Conway (#86)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 6, 2023 at 4:45 PM Joe Conway <mail@joeconway.com> wrote:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone"

probably "always one per row" would be changed to note that json array
format outputs two extra rows for the start/end bracket.

Fair, I was ascribing much more semantic meaning to this than it wants.

I don't see any real requirement, given the lack of semantics, to mention
JSON at all. It is one CopyData per row, regardless of the contents. We
don't delineate between the header and non-header data in CSV. It isn't a
protocol concern.

But I still cannot shake the belief that using a format code of 1 - which
really could be interpreted as meaning "textual csv" in practice - for this
JSON output is unwise and we should introduce a new integer value for the
new fundamental output format.
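
The "one CopyData per row, regardless of the contents" point can be illustrated from the client side. What follows is a minimal Python sketch, not code from the patch: frame_rows is a hypothetical helper that only mimics the layout shown in the patch's expected regression output (a "[" line, a leading space or comma per row, a "]" line). The client just concatenates the payloads and parses the result.

```python
import json


def frame_rows(rows, force_array=False):
    """Return the CopyData payloads (as str) a client might receive.

    Hypothetical helper: it imitates the layout in the patch's expected
    output and is not PostgreSQL code.
    """
    payloads = []
    if force_array:
        payloads.append("[\n")          # structural message, not a table row
    for i, row in enumerate(rows):
        text = json.dumps(row, separators=(",", ":"))
        if force_array:
            prefix = " " if i == 0 else ","   # first row gets a space, later rows a comma
        else:
            prefix = ""
        payloads.append(prefix + text + "\n")
    if force_array:
        payloads.append("]\n")          # structural message, not a table row
    return payloads


rows = [{"id": 1, "f1": "a"}, {"id": 2, "f1": "b"}]
array_payloads = frame_rows(rows, force_array=True)

# Plain concatenation is enough to recover the whole document.
document = json.loads("".join(array_payloads))
```

In the array case the number of messages is the row count plus two, which is the only protocol-visible difference from the lines case.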

David J.

#89

mail@joeconway.com

about 2 years ago

In reply to: David G. Johnston (#88)

Re: Emitting JSON to file using COPY TO

On 12/6/23 19:39, David G. Johnston wrote:

On Wed, Dec 6, 2023 at 4:45 PM Joe Conway <mail@joeconway.com> wrote:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone"

probably "always one per row" would be changed to note that json array
format outputs two extra rows for the start/end bracket.

Fair, I was ascribing much more semantic meaning to this than it wants.

I don't see any real requirement, given the lack of semantics, to
mention JSON at all. It is one CopyData per row, regardless of the
contents. We don't delineate between the header and non-header data in
CSV. It isn't a protocol concern.

good point

But I still cannot shake the belief that using a format code of 1 -
which really could be interpreted as meaning "textual csv" in practice -
for this JSON output is unwise and we should introduce a new integer
value for the new fundamental output format.

No, I am pretty sure you still have that wrong. The "1" means binary
mode. As in
8<----------------------
FORMAT

Selects the data format to be read or written: text, csv (Comma
Separated Values), or binary. The default is text.
8<----------------------

That is completely separate from text and csv. It literally means to use
the binary output functions instead of the usual ones:

8<----------------------
if (cstate->opts.binary)
    getTypeBinaryOutputInfo(attr->atttypid,
                            &out_func_oid,
                            &isvarlena);
else
    getTypeOutputInfo(attr->atttypid,
                      &out_func_oid,
                      &isvarlena);
8<----------------------

Both "text" and "csv" modes are non-binary output formats. I believe
the JSON output format is also non-binary.
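
To make the meaning of the format codes concrete, here is a small Python sketch of how a client might decode the body of a CopyOutResponse: one byte for the overall format (0 = text, 1 = binary), a 16-bit column count, then one 16-bit format code per column. The function names are illustrative assumptions; only the field layout comes from the protocol description and the tshark output quoted in this thread.

```python
import struct

FORMAT_NAMES = {0: "text", 1: "binary"}


def parse_copy_out_response(body: bytes):
    """Decode the bytes that follow the length word of a CopyOutResponse."""
    overall, ncols = struct.unpack_from("!bh", body, 0)
    col_formats = struct.unpack_from("!%dh" % ncols, body, 3)
    return FORMAT_NAMES[overall], ncols, [FORMAT_NAMES[f] for f in col_formats]


def message_length(ncols: int) -> int:
    """Length field: 4 (the field itself) + 1 (format) + 2 (count) + 2 per column."""
    return 4 + 1 + 2 + 2 * ncols


# Body matching the capture quoted in the thread: text format, 3 text columns.
body = struct.pack("!bh3h", 0, 3, 0, 0, 0)
parsed = parse_copy_out_response(body)
```

With three columns the computed length is 13, which agrees with the "Length: 13" line in the capture. By the same arithmetic, a response advertising a single column would carry a length of 9.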

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#90

david.g.johnston@gmail.com

about 2 years ago

In reply to: Joe Conway (#89)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 6, 2023 at 5:57 PM Joe Conway <mail@joeconway.com> wrote:

On 12/6/23 19:39, David G. Johnston wrote:

On Wed, Dec 6, 2023 at 4:45 PM Joe Conway <mail@joeconway.com> wrote:

But I still cannot shake the belief that using a format code of 1 -
which really could be interpreted as meaning "textual csv" in practice -
for this JSON output is unwise and we should introduce a new integer
value for the new fundamental output format.

No, I am pretty sure you still have that wrong. The "1" means binary
mode

Ok. I made the same typo twice, I did mean to write 0 instead of 1. But
the point that we should introduce a 2 still stands. The new code would
mean: use text output functions but that there is no inherent tabular
structure in the underlying contents. Instead the copy format was JSON and
the output layout is dependent upon the json options in the copy command
and that there really shouldn't be any attempt to turn the contents
directly into a tabular data structure like you presently do with the CSV
data under format 0. Ignore the column count and column formats as they
are fixed or non-existent.

David J.

#91

mail@joeconway.com

about 2 years ago

In reply to: Joe Conway (#83)

1 attachment(s)

Re: Emitting JSON to file using COPY TO

On 12/6/23 18:09, Joe Conway wrote:

On 12/6/23 14:47, Joe Conway wrote:

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)
PostgreSQL
Type: Copy data
Length: 6
Copy data: 5b0a
PostgreSQL
Type: Copy data
Length: 76
Copy data:
207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…

The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?

Is it a real problem, or just a bit of documentation change that I missed?

Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.

Hmm, maybe the real problem here is that Columns do not equal "3" for
the json mode case -- that should really say "1" I think, because the
row is not represented as 3 columns but rather 1 json object.

Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more
"rows" than actual output rows (the "[" and the "]"), but maybe those
are less likely to cause some hazard?

The attached should fix the CopyOut response to say one column. I.e. it
ought to look something like:

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 1
Format: Text (0)
PostgreSQL
Type: Copy data
Length: 6
Copy data: 5b0a
PostgreSQL
Type: Copy data
Length: 76
Copy data: [...]
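
For readers who want to check the captures by hand, the hex payloads can be decoded with a few lines of Python. This is only a sketch: the full second row below is an assumption reconstructed from the patch's expected regression output (with the leading space that FORCE_ARRAY adds to the first row), because the capture itself is truncated.

```python
def copy_data_length(payload: bytes) -> int:
    """Length field of a CopyData message: 4 bytes for the field plus the payload."""
    return 4 + len(payload)


# Payloads exactly as shown in the tshark output (the second one is truncated there).
first = bytes.fromhex("5b0a")
second_prefix = bytes.fromhex(
    "207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031"
)

first_text = first.decode("utf-8")            # the opening bracket and a newline
second_text = second_prefix.decode("utf-8")   # start of the first JSON row

# Assumed full first row, taken from the regression test's expected output.
full_row = (
    ' {"id":1,"f1":"line with \\" in it: 1",'
    '"f2":"1997-02-10T17:32:01-08:00"}\n'
).encode("utf-8")
```

The first message decodes to "[" followed by a newline, so its length of 6 is 4 bytes of length field plus 2 bytes of payload. The reconstructed row is 72 bytes long, which is consistent with the "Length: 76" shown for the second message.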

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

copyto_json.007.diff (text/x-patch; charset=UTF-8)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..8915fb3 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 43,48 ****
--- 43,49 ----
      FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+     FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
      ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
  </synopsis>
   </refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 206,214 ****
--- 207,220 ----
        Selects the data format to be read or written:
        <literal>text</literal>,
        <literal>csv</literal> (Comma Separated Values),
+       <literal>json</literal> (JavaScript Object Notation),
        or <literal>binary</literal>.
        The default is <literal>text</literal>.
       </para>
+      <para>
+       The <literal>json</literal> option is allowed only in
+       <command>COPY TO</command>.
+      </para>
      </listitem>
     </varlistentry>
  
*************** COPY { <replaceable class="parameter">ta
*** 372,377 ****
--- 378,396 ----
       </para>
      </listitem>
     </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>FORCE_ARRAY</literal></term>
+     <listitem>
+      <para>
+       Force output of square brackets as array decorations at the beginning
+       and end of output, and commas between the rows. It is allowed only in
+       <command>COPY TO</command>, and only when using
+       <literal>JSON</literal> format. The default is
+       <literal>false</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
  
     <varlistentry>
      <term><literal>ENCODING</literal></term>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..23b570f 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 419,424 ****
--- 419,425 ----
  	bool		format_specified = false;
  	bool		freeze_specified = false;
  	bool		header_specified = false;
+ 	bool		force_array_specified = false;
  	ListCell   *option;
  
  	/* Support external use for option sanity checking */
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 444,451 ----
  				 /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  				opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ 				opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  				opts_out->binary = true;
  			else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 543,555 ----
  								defel->defname),
  						 parser_errposition(pstate, defel->location)));
  		}
+ 		else if (strcmp(defel->defname, "force_array") == 0)
+ 		{
+ 			if (force_array_specified)
+ 				errorConflictingDefElem(defel, pstate);
+ 			force_array_specified = true;
+ 			opts_out->force_array = defGetBoolean(defel);
+ 		}
  		else if (strcmp(defel->defname, "convert_selectively") == 0)
  		{
  			/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 608,625 ----
  				(errcode(ERRCODE_SYNTAX_ERROR),
  				 errmsg("cannot specify DEFAULT in BINARY mode")));
  
+ 	if (opts_out->json_mode)
+ 	{
+ 		if (is_from)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot use JSON mode in COPY FROM")));
+ 	}
+ 	else if (opts_out->force_array)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ARRAY requires JSON mode")));
+ 
  	/* Set defaults for omitted options */
  	if (!opts_out->delim)
  		opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 689,699 ----
  				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  				 errmsg("cannot specify HEADER in BINARY mode")));
  
+ 	if (opts_out->json_mode && opts_out->header_line)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot specify HEADER in JSON mode")));
+ 
  	/* Check quote */
  	if (!opts_out->csv_mode && opts_out->quote != NULL)
  		ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..e068229 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 28,33 ****
--- 28,34 ----
  #include "executor/execdesc.h"
  #include "executor/executor.h"
  #include "executor/tuptable.h"
+ #include "funcapi.h"
  #include "libpq/libpq.h"
  #include "libpq/pqformat.h"
  #include "mb/pg_wchar.h"
***************
*** 37,42 ****
--- 38,44 ----
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
+ #include "utils/json.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 114,121 ----
  /* NOTE: there's a copy of this in copyfromparse.c */
  static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
  
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
  
  /* non-export function prototypes */
  static void EndCopy(CopyToState cstate);
*************** SendCopyBegin(CopyToState cstate)
*** 146,154 ****
  
  	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
  	pq_sendbyte(&buf, format);	/* overall format */
! 	pq_sendint16(&buf, natts);
! 	for (i = 0; i < natts; i++)
! 		pq_sendint16(&buf, format); /* per-column formats */
  	pq_endmessage(&buf);
  	cstate->copy_dest = COPY_FRONTEND;
  }
--- 150,169 ----
  
  	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
  	pq_sendbyte(&buf, format);	/* overall format */
! 	if (!cstate->opts.json_mode)
! 	{
! 		pq_sendint16(&buf, natts);
! 		for (i = 0; i < natts; i++)
! 			pq_sendint16(&buf, format); /* per-column formats */
! 	}
! 	else
! 	{
! 		/*
! 		 * JSON mode is always one non-binary column
! 		 */
! 		pq_sendint16(&buf, 1);
! 		pq_sendint16(&buf, 0);
! 	}
  	pq_endmessage(&buf);
  	cstate->copy_dest = COPY_FRONTEND;
  }
*************** DoCopyTo(CopyToState cstate)
*** 759,764 ****
--- 774,781 ----
  		tupDesc = RelationGetDescr(cstate->rel);
  	else
  		tupDesc = cstate->queryDesc->tupDesc;
+ 	BlessTupleDesc(tupDesc);
+ 
  	num_phys_attrs = tupDesc->natts;
  	cstate->opts.null_print_client = cstate->opts.null_print;	/* default */
  
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 862,881 ----
  
  			CopySendEndOfRow(cstate);
  		}
+ 
+ 		/*
+ 		 * If JSON has been requested, and FORCE_ARRAY has been specified send
+ 		 * the opening bracket.
+ 		 */
+ 		if (cstate->opts.json_mode)
+ 		{
+ 			if (cstate->opts.force_array)
+ 			{
+ 				CopySendChar(cstate, '[');
+ 				CopySendEndOfRow(cstate);
+ 			}
+ 			json_row_delim_needed = false;
+ 		}
  	}
  
  	if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 923,939 ----
  		CopySendEndOfRow(cstate);
  	}
  
+ 	/*
+ 	 * If JSON has been requested, and FORCE_ARRAY has been specified send the
+ 	 * closing bracket.
+ 	 */
+ 	if (cstate->opts.json_mode &&
+ 		cstate->opts.force_array)
+ 	{
+ 		CopySendChar(cstate, ']');
+ 		CopySendEndOfRow(cstate);
+ 	}
+ 
  	MemoryContextDelete(cstate->rowcontext);
  
  	if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
  static void
  CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
  {
- 	bool		need_delim = false;
- 	FmgrInfo   *out_functions = cstate->out_functions;
  	MemoryContext oldcontext;
- 	ListCell   *cur;
- 	char	   *string;
  
  	MemoryContextReset(cstate->rowcontext);
  	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 948,954 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	/* Make sure the tuple is fully deconstructed */
! 	slot_getallattrs(slot);
! 
! 	foreach(cur, cstate->attnumlist)
  	{
! 		int			attnum = lfirst_int(cur);
! 		Datum		value = slot->tts_values[attnum - 1];
! 		bool		isnull = slot->tts_isnull[attnum - 1];
  
! 		if (!cstate->opts.binary)
! 		{
! 			if (need_delim)
! 				CopySendChar(cstate, cstate->opts.delim[0]);
! 			need_delim = true;
! 		}
  
! 		if (isnull)
! 		{
! 			if (!cstate->opts.binary)
! 				CopySendString(cstate, cstate->opts.null_print_client);
! 			else
! 				CopySendInt32(cstate, -1);
! 		}
! 		else
  		{
  			if (!cstate->opts.binary)
  			{
! 				string = OutputFunctionCall(&out_functions[attnum - 1],
! 											value);
! 				if (cstate->opts.csv_mode)
! 					CopyAttributeOutCSV(cstate, string,
! 										cstate->opts.force_quote_flags[attnum - 1],
! 										list_length(cstate->attnumlist) == 1);
  				else
! 					CopyAttributeOutText(cstate, string);
  			}
  			else
  			{
! 				bytea	   *outputbytes;
  
! 				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 											   value);
! 				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 				CopySendData(cstate, VARDATA(outputbytes),
! 							 VARSIZE(outputbytes) - VARHDRSZ);
  			}
  		}
  	}
  
  	CopySendEndOfRow(cstate);
  
--- 959,1042 ----
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	if (!cstate->opts.json_mode)
  	{
! 		bool		need_delim = false;
! 		FmgrInfo   *out_functions = cstate->out_functions;
! 		ListCell   *cur;
! 		char	   *string;
  
! 		/* Make sure the tuple is fully deconstructed */
! 		slot_getallattrs(slot);
  
! 		foreach(cur, cstate->attnumlist)
  		{
+ 			int			attnum = lfirst_int(cur);
+ 			Datum		value = slot->tts_values[attnum - 1];
+ 			bool		isnull = slot->tts_isnull[attnum - 1];
+ 
  			if (!cstate->opts.binary)
  			{
! 				if (need_delim)
! 					CopySendChar(cstate, cstate->opts.delim[0]);
! 				need_delim = true;
! 			}
! 
! 			if (isnull)
! 			{
! 				if (!cstate->opts.binary)
! 					CopySendString(cstate, cstate->opts.null_print_client);
  				else
! 					CopySendInt32(cstate, -1);
  			}
  			else
  			{
! 				if (!cstate->opts.binary)
! 				{
! 					string = OutputFunctionCall(&out_functions[attnum - 1],
! 												value);
! 					if (cstate->opts.csv_mode)
! 						CopyAttributeOutCSV(cstate, string,
! 											cstate->opts.force_quote_flags[attnum - 1],
! 											list_length(cstate->attnumlist) == 1);
! 					else
! 						CopyAttributeOutText(cstate, string);
! 				}
! 				else
! 				{
! 					bytea	   *outputbytes;
  
! 					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 												   value);
! 					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 					CopySendData(cstate, VARDATA(outputbytes),
! 								 VARSIZE(outputbytes) - VARHDRSZ);
! 				}
  			}
  		}
  	}
+ 	else
+ 	{
+ 		Datum		rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ 		StringInfo	result;
+ 
+ 		result = makeStringInfo();
+ 		composite_to_json(rowdata, result, false);
+ 
+ 		if (json_row_delim_needed &&
+ 			cstate->opts.force_array)
+ 		{
+ 			CopySendChar(cstate, ',');
+ 		}
+ 		else if (cstate->opts.force_array)
+ 		{
+ 			/* first row needs no delimiter */
+ 			CopySendChar(cstate, ' ');
+ 			json_row_delim_needed = true;
+ 		}
+ 
+ 		CopySendData(cstate, result->data, result->len);
+ 	}
  
  	CopySendEndOfRow(cstate);
  
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..e6789d7 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
  				{
  					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
  				}
+ 			| JSON
+ 				{
+ 					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ 				}
  			| HEADER_P
  				{
  					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3461 ----
  				{
  					$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
  				}
+ 			| FORCE ARRAY
+ 				{
+ 					$$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ 				}
  		;
  
  /* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3498,3507 ----
  				{
  					$$ = makeDefElem($1, $2, @1);
  				}
+ 			| FORMAT_LA copy_generic_opt_arg
+ 				{
+ 					$$ = makeDefElem("format", $2, @1);
+ 				}
  		;
  
  copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
  	JsonUniqueBuilderState unique_check;
  } JsonAggState;
  
- static void composite_to_json(Datum composite, StringInfo result,
- 							  bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
  							  Datum *vals, bool *nulls, int *valcount,
  							  JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
  
  /*
   * Turn a composite / record into JSON.
   */
! static void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
--- 488,496 ----
  
  /*
   * Turn a composite / record into JSON.
+  * Exported so COPY TO can use it.
   */
! void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..97899b6 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
  	bool		binary;			/* binary format? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
+ 	bool		json_mode;		/* JSON format? */
  	CopyHeaderChoice header_line;	/* header line? */
  	char	   *null_print;		/* NULL marker string (server encoding!) */
  	int			null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,68 ----
  	List	   *force_null;		/* list of column names */
  	bool		force_null_all; /* FORCE_NULL *? */
  	bool	   *force_null_flags;	/* per-column CSV FN flags */
+ 	bool		force_array;	/* add JSON array decorations */
  	bool		convert_selectively;	/* do selective binary conversion? */
  	List	   *convert_select; /* list of column names (can be NIL) */
  } CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
  #include "lib/stringinfo.h"
  
  /* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ 							  bool use_line_feeds);
  extern void escape_json(StringInfo buf, const char *str);
  extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
  								const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365e..31913f6 100644
*** a/src/test/regress/expected/copy.out
--- b/src/test/regress/expected/copy.out
*************** copy copytest3 to stdout csv header;
*** 42,47 ****
--- 42,117 ----
  c1,"col with , comma","col with "" quote"
  1,a,1
  2,b,2
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json, force_array);
+ [
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_array true);
+ [
+  {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_array false);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ -- Error
+ copy copytest to stdout (format json, header);
+ ERROR:  cannot specify HEADER in JSON mode
+ -- embedded escaped characters
+ create temp table copyjsontest (
+     id bigserial,
+     f1 text,
+     f2 timestamptz);
+ insert into copyjsontest
+   select g.i,
+          CASE WHEN g.i % 2 = 0 THEN
+            'line with '' in it: ' || g.i::text
+          ELSE
+            'line with " in it: ' || g.i::text
+          END,
+          'Mon Feb 10 17:32:01 1997 PST'
+   from generate_series(1,5) as g(i);
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+ copy copyjsontest to stdout json;
+ {"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":1,"f1":"aaa\"bbb","f2":null}
+ {"id":2,"f1":"aaa\\bbb","f2":null}
+ {"id":3,"f1":"aaa/bbb","f2":null}
+ {"id":4,"f1":"aaa\bbbb","f2":null}
+ {"id":5,"f1":"aaa\fbbb","f2":null}
+ {"id":6,"f1":"aaa\nbbb","f2":null}
+ {"id":7,"f1":"aaa\rbbb","f2":null}
+ {"id":8,"f1":"aaa\tbbb","f2":null}
  create temp table copytest4 (
  	c1 int,
  	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e90..4b76541 100644
*** a/src/test/regress/sql/copy.sql
--- b/src/test/regress/sql/copy.sql
*************** this is just a line full of junk that wo
*** 54,59 ****
--- 54,101 ----
  
  copy copytest3 to stdout csv header;
  
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ 
+ copy copytest to stdout (format json);
+ 
+ copy copytest to stdout (format json, force_array);
+ 
+ copy copytest to stdout (format json, force_array true);
+ 
+ copy copytest to stdout (format json, force_array false);
+ 
+ -- Error
+ copy copytest to stdout (format json, header);
+ 
+ -- embedded escaped characters
+ create temp table copyjsontest (
+     id bigserial,
+     f1 text,
+     f2 timestamptz);
+ 
+ insert into copyjsontest
+   select g.i,
+          CASE WHEN g.i % 2 = 0 THEN
+            'line with '' in it: ' || g.i::text
+          ELSE
+            'line with " in it: ' || g.i::text
+          END,
+          'Mon Feb 10 17:32:01 1997 PST'
+   from generate_series(1,5) as g(i);
+ 
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+ 
+ copy copyjsontest to stdout json;
+ 
  create temp table copytest4 (
  	c1 int,
  	"colname with tab: 	" text);

#92

mail@joeconway.com

about 2 years ago

In reply to: David G. Johnston (#90)

Re: Emitting JSON to file using COPY TO

On 12/6/23 20:09, David G. Johnston wrote:

On Wed, Dec 6, 2023 at 5:57 PM Joe Conway <mail@joeconway.com> wrote:

On 12/6/23 19:39, David G. Johnston wrote:

On Wed, Dec 6, 2023 at 4:45 PM Joe Conway <mail@joeconway.com> wrote:

But I still cannot shake the belief that using a format code of 1 -
which really could be interpreted as meaning "textual csv" in practice -
for this JSON output is unwise and we should introduce a new integer
value for the new fundamental output format.

No, I am pretty sure you still have that wrong. The "1" means binary
mode

Ok. I made the same typo twice, I did mean to write 0 instead of 1.

Fair enough.

But the point that we should introduce a 2 still stands. The new code
would mean: use text output functions but that there is no inherent
tabular structure in the underlying contents. Instead the copy format
was JSON and the output layout is dependent upon the json options in the
copy command and that there really shouldn't be any attempt to turn the
contents directly into a tabular data structure like you presently do
with the CSV data under format 0. Ignore the column count and column
formats as they are fixed or non-existent.

I think that amounts to a protocol change, which we tend to avoid at all
costs.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#93

david.g.johnston@gmail.com

about 2 years ago

In reply to: Joe Conway (#92)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 6, 2023 at 6:14 PM Joe Conway <mail@joeconway.com> wrote:

But the point that we should introduce a 2 still stands. The new code
would mean: use text output functions but that there is no inherent
tabular structure in the underlying contents. Instead the copy format
was JSON and the output layout is dependent upon the json options in the
copy command and that there really shouldn't be any attempt to turn the
contents directly into a tabular data structure like you presently do
with the CSV data under format 0. Ignore the column count and column
formats as they are fixed or non-existent.

I think that amounts to a protocol change, which we tend to avoid at all
costs.

I wasn't sure on that point but figured it might be the case. It is a
value change, not structural, which seems like it is the kind of
modification any living system might allow and be expected to have. But I
also don't see any known problem with the current change of content
semantics without the format identification change. Most of the relevant
context ends up out-of-band in the copy command itself.
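
As an illustration of what "out-of-band" could mean for a client, the sketch below picks the interpretation of the data stream from the options it sent in the COPY command itself, not from the wire format code. The function name and the layout labels are hypothetical; the rejection of FORCE_ARRAY outside JSON mode mirrors the check added by the patch.

```python
def stream_layout(copy_format: str = "text", force_array: bool = False) -> str:
    """Decide how to interpret CopyData payloads from the COPY options used.

    Hypothetical client-side helper; the returned labels are illustrative.
    """
    fmt = copy_format.lower()
    if fmt not in ("text", "csv", "binary", "json"):
        raise ValueError("unknown COPY format: %r" % copy_format)
    if fmt == "json":
        # One JSON document overall, or one JSON document per line.
        return "json_array" if force_array else "json_lines"
    if force_array:
        # The patch reports an error for this combination as well.
        raise ValueError("FORCE_ARRAY requires JSON format")
    return "tabular"
```

A caller would then route "tabular" streams to its existing row parser and hand "json_lines" or "json_array" streams to a JSON parser, ignoring the advertised column count in both JSON cases.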

David J.

#94

Euler Taveira

euler@eulerto.com

about 2 years ago

In reply to: Daniel Verite (#72)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 6, 2023, at 3:59 PM, Daniel Verite wrote:

The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?

At least the json non-ARRAY case ("json lines") doesn't have
this issue, since every CopyData message corresponds effectively
to a row in the table.

Moreover, if your interface wants to process the COPY data stream while
receiving it, the "json array" format does not help, because each row
(together with all of the rows received before it) is not valid JSON.
Hence, a JSON parser cannot be run until you have received the whole data
set. (wal2json format 1 has this disadvantage. Format 2 was born to provide
a better alternative -- each row is valid JSON.) I'm not saying that
"json array" is not useful, but that it is less useful for large data sets.
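
The difference can be sketched in Python as follows. This is an illustration under assumptions, not code from the patch: the sample payloads are made up, and the line-based parser relies on the fact, visible in the regression output, that newlines inside string values are escaped, so a raw newline only ever ends a row.

```python
import json


def parse_json_lines(chunks):
    """Yield one parsed object per newline-terminated line as chunks arrive."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line:
                yield json.loads(line)


def is_complete_json(text: str) -> bool:
    """Return True only if the text parses as a whole JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


lines_stream = ['{"id":1}\n', '{"id":2}\n']
array_stream = ["[\n", ' {"id":1}\n', ',{"id":2}\n', "]\n"]

# Every row of the lines stream can be consumed as soon as it arrives.
streamed = list(parse_json_lines(lines_stream))

# The array stream only becomes parseable once the closing bracket arrives.
parseable_after = [
    is_complete_json("".join(array_stream[:n]))
    for n in range(1, len(array_stream) + 1)
]
```

A streaming-capable JSON parser could still process the array form incrementally, but a consumer that relies on an ordinary whole-document parser has to buffer everything first.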

--
Euler Taveira
EDB https://www.enterprisedb.com/

#95

nathandbossart@gmail.com

about 2 years ago

In reply to: Tom Lane (#75)

Re: Emitting JSON to file using COPY TO

On Wed, Dec 06, 2023 at 03:20:46PM -0500, Tom Lane wrote:

If Nathan's perf results hold up elsewhere, it seems like some
micro-optimization around the text-pushing (appendStringInfoString)
might be more useful than caching. The 7% spent in cache lookups
could be worth going after later, but it's not the top of the list.

Hah, it turns out my benchmark of 110M integers really stresses the
JSONTYPE_NUMERIC path in datum_to_json_internal(). That particular path
calls strlen() twice: once for IsValidJsonNumber(), and once in
appendStringInfoString(). If I save the result from IsValidJsonNumber()
and give it to appendBinaryStringInfo() instead, the COPY goes ~8% faster.
It's probably worth giving datum_to_json_internal() a closer look in a new
thread.

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53ff97..1951e93d9d 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -180,6 +180,7 @@ datum_to_json_internal(Datum val, bool is_null, StringInfo result,
 {
     char       *outputstr;
     text       *jsontext;
+    int         len;

     check_stack_depth();

@@ -223,8 +224,8 @@ datum_to_json_internal(Datum val, bool is_null, StringInfo result,
              * Don't call escape_json for a non-key if it's a valid JSON
              * number.
              */
-            if (!key_scalar && IsValidJsonNumber(outputstr, strlen(outputstr)))
-                appendStringInfoString(result, outputstr);
+            if (!key_scalar && IsValidJsonNumber(outputstr, (len = strlen(outputstr))))
+                appendBinaryStringInfo(result, outputstr, len);
             else
                 escape_json(result, outputstr);
             pfree(outputstr);
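
The idea in the diff above can be sketched outside the backend as follows. This is a standalone illustration, not PostgreSQL code: `Buf`, `buf_append_binary`, `looks_like_integer` and `append_if_number` are hypothetical stand-ins for StringInfo, appendBinaryStringInfo() and IsValidJsonNumber(), and the number check is deliberately simplified to integers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for a StringInfo. */
typedef struct Buf
{
    char    data[256];
    size_t  len;
} Buf;

/* Append exactly n bytes; the caller already knows the length. */
static void
buf_append_binary(Buf *b, const char *s, size_t n)
{
    assert(b->len + n < sizeof(b->data));
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Simplified stand-in for a JSON number check: optional '-' then digits. */
static int
looks_like_integer(const char *s, size_t n)
{
    size_t  i = 0;

    if (n > 0 && s[0] == '-')
        i = 1;
    if (i == n)
        return 0;
    for (; i < n; i++)
        if (!isdigit((unsigned char) s[i]))
            return 0;
    return 1;
}

/*
 * Compute the length once and reuse it for both validation and append.
 * Returns 1 if the value was appended as a bare number, 0 otherwise.
 */
static int
append_if_number(Buf *b, const char *s)
{
    size_t  n = strlen(s);  /* the only strlen() on this path */

    if (!looks_like_integer(s, n))
        return 0;
    buf_append_binary(b, s, n);
    return 1;
}
```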

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#96

mail@joeconway.com

about 2 years ago

In reply to: Nathan Bossart (#95)

Re: Emitting JSON to file using COPY TO

On 12/6/23 21:56, Nathan Bossart wrote:

On Wed, Dec 06, 2023 at 03:20:46PM -0500, Tom Lane wrote:

If Nathan's perf results hold up elsewhere, it seems like some
micro-optimization around the text-pushing (appendStringInfoString)
might be more useful than caching. The 7% spent in cache lookups
could be worth going after later, but it's not the top of the list.

Hah, it turns out my benchmark of 110M integers really stresses the
JSONTYPE_NUMERIC path in datum_to_json_internal(). That particular path
calls strlen() twice: once for IsValidJsonNumber(), and once in
appendStringInfoString(). If I save the result from IsValidJsonNumber()
and give it to appendBinaryStringInfo() instead, the COPY goes ~8% faster.
It's probably worth giving datum_to_json_internal() a closer look in a new
thread.

Yep, after looking through that code I was going to make the point that
your 11 integer test was over indexing on that one type. I am sure there
are other micro-optimizations to be made here, but I also think that it
is outside the scope of the COPY TO JSON patch.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#97

andrew@dunslane.net

about 2 years ago

In reply to: David G. Johnston (#82)

Re: Emitting JSON to file using COPY TO

On 2023-12-06 We 17:56, David G. Johnston wrote:

On Wed, Dec 6, 2023 at 3:38 PM Joe Conway <mail@joeconway.com> wrote:

So the questions are:
1. Do those two formats work for the initial implementation?

Yes. We provide a stream-oriented format and one atomic-import format.

2. Is the default correct or should it be switched
e.g. rather than specifying FORCE_ARRAY to get an
array, something like FORCE_NO_ARRAY to get JSON lines
and the JSON array is default?

No default?

Require explicit specification of a sub-format when the main format is JSON.

JSON_OBJECT_ROWS
JSON_ARRAY_OF_OBJECTS

For a future compact array-structured-composites sub-format:
JSON_ARRAY_OF_ARRAYS
JSON_ARRAY_ROWS

No default seems unlike the way we treat other COPY options. I'm not
terribly fussed about which format to have as the default, but I think
we should have one.

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

#98

daniel@manitou-mail.org

about 2 years ago

In reply to: Joe Conway (#91)

Re: Emitting JSON to file using COPY TO

Joe Conway wrote:

The attached should fix the CopyOut response to say one column. I.e. it
ought to look something like:

Spending more time with the doc I came to the opinion that in this bit
of the protocol, in CopyOutResponse (B)
...
Int16
The number of columns in the data to be copied (denoted N below).
...

this number must be the number of columns in the source.
That is, for COPY table(a,b,c) the number is 3, independently
of whether the result is formatted in text, csv, json or binary.

I think that changing it for json can reasonably be interpreted
as a protocol break and we should not do it.

The fact that this value does not help parsing the CopyData
messages that come next is not a new issue. A reader that
doesn't know the field separator and whether it's text or csv
cannot parse these messages into fields anyway.
But just knowing how many columns there are in the original
data might be useful by itself and we don't want to break that.

The other question for me is, in the CopyData message, this
bit:
" Messages sent from the backend will always correspond to single data rows"

ISTM that considering that the "[" starting the json array is a
"data row" is a stretch.
That might be interpreted as a protocol break, depending
on how strict the interpretation is.
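
For readers following the protocol argument, here is a minimal sketch of how a client could decode a CopyOutResponse message laid out as the documentation describes: a type byte 'H', an Int32 length, an Int8 overall format, an Int16 column count N, and N Int16 per-column format codes. The struct and function names are hypothetical, and the 64-column cap is an arbitrary limit chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_COPY_COLUMNS 64     /* arbitrary cap for this sketch */

typedef struct CopyOutHeader
{
    uint32_t    length;         /* message length, including itself */
    uint8_t     overall_format; /* 0 = textual, 1 = binary */
    uint16_t    ncolumns;
    uint16_t    column_formats[MAX_COPY_COLUMNS];
} CopyOutHeader;

/* Protocol integers are sent in network (big-endian) byte order. */
static uint16_t
read_u16(const unsigned char *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

static uint32_t
read_u32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Returns 0 on success, -1 if the message is malformed. */
static int
parse_copy_out_response(const unsigned char *msg, size_t n,
                        CopyOutHeader *out)
{
    if (n < 8 || msg[0] != 'H')
        return -1;
    out->length = read_u32(msg + 1);
    out->overall_format = msg[5];
    out->ncolumns = read_u16(msg + 6);
    if (out->ncolumns > MAX_COPY_COLUMNS)
        return -1;
    /* length = 4 (itself) + 1 (format) + 2 (count) + 2 per column */
    if (out->length != 7u + 2u * out->ncolumns ||
        n != 1 + (size_t) out->length)
        return -1;
    for (size_t i = 0; i < out->ncolumns; i++)
        out->column_formats[i] = read_u16(msg + 8 + 2 * i);
    return 0;
}
```

Note that the column count only determines how many format codes follow in this message; nothing here helps split the later CopyData payloads into fields.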

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#99

david.g.johnston@gmail.com

about 2 years ago

In reply to: Daniel Verite (#98)

Re: Emitting JSON to file using COPY TO

On Thursday, December 7, 2023, Daniel Verite <daniel@manitou-mail.org>
wrote:

Joe Conway wrote:

The attached should fix the CopyOut response to say one column. I.e. it
ought to look something like:

Spending more time with the doc I came to the opinion that in this bit
of the protocol, in CopyOutResponse (B)
...
Int16
The number of columns in the data to be copied (denoted N below).
...

this number must be the number of columns in the source.
That is, for COPY table(a,b,c) the number is 3, independently
of whether the result is formatted in text, csv, json or binary.

I think that changing it for json can reasonably be interpreted
as a protocol break and we should not do it.

The fact that this value does not help parsing the CopyData
messages that come next is not a new issue. A reader that
doesn't know the field separator and whether it's text or csv
cannot parse these messages into fields anyway.
But just knowing how many columns there are in the original
data might be useful by itself and we don't want to break that.

This argument for leaving 3 as the column count makes sense to me. I agree
this content is not meant to facilitate interpreting the contents at a
protocol level.

The other question for me is, in the CopyData message, this
bit:
" Messages sent from the backend will always correspond to single data
rows"

ISTM that considering that the "[" starting the json array is a
"data row" is a stretch.
That might be interpreted as a protocol break, depending
on how strict the interpretation is.

We already effectively interpret this as “one content line per copydata
message” in the csv text with header line case. I’d probably reword it to
state that explicitly and then we again don’t have to worry about the
protocol caring about any data semantics of the underlying content, only
physical semantics.

David J.

#100

mail@joeconway.com

about 2 years ago

In reply to: Daniel Verite (#98)

Re: Emitting JSON to file using COPY TO

On 12/7/23 08:35, Daniel Verite wrote:

Joe Conway wrote:

The attached should fix the CopyOut response to say one column. I.e. it
ought to look something like:

Spending more time with the doc I came to the opinion that in this bit
of the protocol, in CopyOutResponse (B)
...
Int16
The number of columns in the data to be copied (denoted N below).
...

this number must be the number of columns in the source.
That is, for COPY table(a,b,c) the number is 3, independently
of whether the result is formatted in text, csv, json or binary.

I think that changing it for json can reasonably be interpreted
as a protocol break and we should not do it.

The fact that this value does not help parsing the CopyData
messages that come next is not a new issue. A reader that
doesn't know the field separator and whether it's text or csv
cannot parse these messages into fields anyway.
But just knowing how many columns there are in the original
data might be useful by itself and we don't want to break that.

Ok, that sounds reasonable to me -- I will revert that change.

The other question for me is, in the CopyData message, this
bit:
" Messages sent from the backend will always correspond to single data rows"

ISTM that considering that the "[" starting the json array is a
"data row" is a stretch.
That might be interpreted as a protocol break, depending
on how strict the interpretation is.

If we really think that is a problem I can see about changing it to this
format for json array:

8<------------------
copy
(
with ss(f1, f2) as
(
select 1, g.i from generate_series(1, 3) g(i)
)
select ss from ss
) to stdout (format json, force_array);
[{"ss":{"f1":1,"f2":1}}
,{"ss":{"f1":1,"f2":2}}
,{"ss":{"f1":1,"f2":3}}]
8<------------------

Is this acceptable to everyone?

Or maybe this is preferred?
8<------------------
[{"ss":{"f1":1,"f2":1}},
{"ss":{"f1":1,"f2":2}},
{"ss":{"f1":1,"f2":3}}]
8<------------------

Or as long as we are painting the shed, maybe this?
8<------------------
[{"ss":{"f1":1,"f2":1}},
{"ss":{"f1":1,"f2":2}},
{"ss":{"f1":1,"f2":3}}]
8<------------------

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#101

mail@joeconway.com

about 2 years ago

In reply to: Joe Conway (#100)

Re: Emitting JSON to file using COPY TO

On 12/7/23 08:52, Joe Conway wrote:

Or maybe this is preferred?
8<------------------
[{"ss":{"f1":1,"f2":1}},
{"ss":{"f1":1,"f2":2}},
{"ss":{"f1":1,"f2":3}}]
8<------------------

I don't know why my mail client keeps adding extra spaces, but the
intention here is a single space in front of row 2 and 3 in order to
line the json objects up at column 2.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#102

david.g.johnston@gmail.com

about 2 years ago

In reply to: Joe Conway (#101)

Re: Emitting JSON to file using COPY TO

On Thursday, December 7, 2023, Joe Conway <mail@joeconway.com> wrote:

On 12/7/23 08:35, Daniel Verite wrote:

Joe Conway wrote:

The attached should fix the CopyOut response to say one column. I.e. it

ought to look something like:

Spending more time with the doc I came to the opinion that in this bit
of the protocol, in CopyOutResponse (B)
...
Int16
The number of columns in the data to be copied (denoted N below).
...

this number must be the number of columns in the source.
That is, for COPY table(a,b,c) the number is 3, independently
of whether the result is formatted in text, csv, json or binary.

I think that changing it for json can reasonably be interpreted
as a protocol break and we should not do it.

The fact that this value does not help parsing the CopyData
messages that come next is not a new issue. A reader that
doesn't know the field separator and whether it's text or csv
cannot parse these messages into fields anyway.
But just knowing how many columns there are in the original
data might be useful by itself and we don't want to break that.

Ok, that sounds reasonable to me -- I will revert that change.

The other question for me is, in the CopyData message, this

bit:
" Messages sent from the backend will always correspond to single data
rows"

ISTM that considering that the "[" starting the json array is a
"data row" is a stretch.
That might be interpreted as a protocol break, depending
on how strict the interpretation is.

If we really think that is a problem I can see about changing it to this
format for json array:

8<------------------
copy
(
with ss(f1, f2) as
(
select 1, g.i from generate_series(1, 3) g(i)
)
select ss from ss
) to stdout (format json, force_array);
[{"ss":{"f1":1,"f2":1}}
,{"ss":{"f1":1,"f2":2}}
,{"ss":{"f1":1,"f2":3}}]
8<------------------

Is this acceptable to everyone?

Or maybe this is preferred?
8<------------------
[{"ss":{"f1":1,"f2":1}},
{"ss":{"f1":1,"f2":2}},
{"ss":{"f1":1,"f2":3}}]
8<------------------

Or as long as we are painting the shed, maybe this?
8<------------------
[{"ss":{"f1":1,"f2":1}},
{"ss":{"f1":1,"f2":2}},
{"ss":{"f1":1,"f2":3}}]
8<------------------

Those are all the same breakage though - if truly interpreted as data rows
the protocol is basically written such that the array format is not
supportable and only the lines format can be used. Hence my “format 0
doesn’t work” comment for array output and we should explicitly add format
2 where we explicitly decouple lines of output from rows of data. That
said, it would seem in practice format 0 already decouples them and so the
current choice of the brackets on their own lines is acceptable.

I’d prefer to keep them on their own line.

I also don’t know why you introduced another level of object nesting here.
That seems quite undesirable.

David J.

#103

mail@joeconway.com

about 2 years ago

In reply to: David G. Johnston (#102)

Re: Emitting JSON to file using COPY TO

On 12/7/23 09:11, David G. Johnston wrote:

Those are all the same breakage though - if truly interpreted as data
rows the protocol is basically written such that the array format is not
supportable and only the lines format can be used. Hence my “format 0
doesn’t work” comment for array output and we should explicitly add
format 2 where we explicitly decouple lines of output from rows of
data. That said, it would seem in practice format 0 already decouples
them and so the current choice of the brackets on their own lines is
acceptable.

I’d prefer to keep them on their own line.

WFM ¯\_(ツ)_/¯

I am merely responding with options to the many people opining on the
thread.

I also don’t know why you introduced another level of object nesting
here. That seems quite undesirable.

I didn't add anything. It is an artifact of the particular query I wrote
in the copy to statement (I did "select ss from ss" instead of "select *
from ss"), mea culpa.

This is what the latest patch, as written today, outputs:
8<----------------------
copy
(select 1, g.i from generate_series(1, 3) g(i))
to stdout (format json, force_array);
[
{"?column?":1,"i":1}
,{"?column?":1,"i":2}
,{"?column?":1,"i":3}
]
8<----------------------

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#104

Dave Cramer

davecramer@postgres.rocks

about 2 years ago

In reply to: David G. Johnston (#99)

Re: Emitting JSON to file using COPY TO

On Thu, 7 Dec 2023 at 08:47, David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Thursday, December 7, 2023, Daniel Verite <daniel@manitou-mail.org>
wrote:

Joe Conway wrote:

The attached should fix the CopyOut response to say one column. I.e. it
ought to look something like:

Spending more time with the doc I came to the opinion that in this bit
of the protocol, in CopyOutResponse (B)
...
Int16
The number of columns in the data to be copied (denoted N below).
...

this number must be the number of columns in the source.
That is, for COPY table(a,b,c) the number is 3, independently
of whether the result is formatted in text, csv, json or binary.

I think that changing it for json can reasonably be interpreted
as a protocol break and we should not do it.

The fact that this value does not help parsing the CopyData
messages that come next is not a new issue. A reader that
doesn't know the field separator and whether it's text or csv
cannot parse these messages into fields anyway.
But just knowing how many columns there are in the original
data might be useful by itself and we don't want to break that.

This argument for leaving 3 as the column count makes sense to me. I
agree this content is not meant to facilitate interpreting the contents at
a protocol level.

I'd disagree. From my POV if the data comes back as a JSON Array this is
one object and this should be reflected in the column count.

The other question for me is, in the CopyData message, this
bit:
" Messages sent from the backend will always correspond to single data
rows"

ISTM that considering that the "[" starting the json array is a
"data row" is a stretch.
That might be interpreted as a protocol break, depending
on how strict the interpretation is.

Well technically it is a single row if you send an array.

Regardless, I expect Euler's comment above that JSON lines format is going
to be the preferred format as the client doesn't have to wait for the
entire object before starting to parse.

Dave


#105

daniel@manitou-mail.org

about 2 years ago

In reply to: Joe Conway (#91)

Re: Emitting JSON to file using COPY TO

Joe Conway wrote:

copyto_json.007.diff

When the source has json fields with non-significant line feeds, the COPY
output has these line feeds too, which makes the output incompatible
with rule #2 at https://jsonlines.org ("2. Each Line is a Valid JSON
Value").

create table j(f json);

insert into j values('{"a":1,
"b":2
}');

copy j to stdout (format json);

Result:
{"f":{"a":1,
"b":2
}}

Is that expected? copy.sgml in 007 doesn't describe the output
in terms of lines so it's hard to tell from the doc.
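
One way to see what a fix might involve: in valid JSON a raw line feed can only appear between tokens, because line feeds inside string literals must be escaped. The sketch below drops raw CR/LF characters outside string literals so that each row fits on one line. It is only an illustration of that observation, not what the patch does, and `strip_insignificant_newlines` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src to dst, dropping CR/LF that occur outside string literals.
 * dst must have room for strlen(src) + 1 bytes. Returns bytes written,
 * not counting the terminating NUL.
 */
static size_t
strip_insignificant_newlines(const char *src, char *dst)
{
    size_t  out = 0;
    int     in_string = 0;
    int     escaped = 0;

    for (const char *p = src; *p != '\0'; p++)
    {
        char    c = *p;

        if (in_string)
        {
            if (escaped)
                escaped = 0;        /* character after a backslash */
            else if (c == '\\')
                escaped = 1;
            else if (c == '"')
                in_string = 0;      /* closing quote */
        }
        else if (c == '"')
            in_string = 1;          /* opening quote */
        else if (c == '\n' || c == '\r')
            continue;               /* whitespace between tokens */

        dst[out++] = c;
    }
    dst[out] = '\0';
    return out;
}
```

Applied to the row shown in the result above, this yields {"f":{"a":1,"b":2}} on a single line while leaving escaped sequences inside strings untouched.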

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#106

daniel@manitou-mail.org

about 2 years ago

In reply to: Dave Cramer (#104)

Re: Emitting JSON to file using COPY TO

Dave Cramer wrote:

This argument for leaving 3 as the column count makes sense to me. I
agree this content is not meant to facilitate interpreting the contents at
a protocol level.

I'd disagree. From my POV if the data comes back as a JSON Array this is
one object and this should be reflected in the column count.

The doc says this:
"Int16
The number of columns in the data to be copied (denoted N below)."

and this formulation is repeated in PQnfields() for libpq:

"PQnfields
Returns the number of columns (fields) to be copied."

How to interpret that sentence?
"to be copied" from what, into what, and by what way?

A plausible interpretation is "to be copied from the source data
into the COPY stream, by the backend". So the number of columns
to be copied likely refers to the columns of the dataset, not the
"in-transit form" that is text or csv or json.

The interpretation you're proposing also makes sense, that it's just
one json column per row, or even a single-row single-column for the
entire dataset in the force_array case, but then the question is why
isn't that number of columns always 1 for the original "text" format,
since each row is represented in the stream as a single long piece of
text?

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#107


mail@joeconway.com

about 2 years ago

In reply to: Daniel Verite (#105)

Re: Emitting JSON to file using COPY TO

On 12/8/23 14:45, Daniel Verite wrote:

Joe Conway wrote:

copyto_json.007.diff

When the source has json fields with non-significant line feeds, the COPY
output has these line feeds too, which makes the output incompatible
with rule #2 at https://jsonlines.org ("2. Each Line is a Valid JSON
Value").

create table j(f json);

insert into j values('{"a":1,
"b":2
}');

copy j to stdout (format json);

Result:
{"f":{"a":1,
"b":2
}}

Is that expected? copy.sgml in 007 doesn't describe the output
in terms of lines so it's hard to tell from the doc.

The patch as-is just does the equivalent of row_to_json():
8<----------------------------
select row_to_json(j) from j;
row_to_json
--------------
{"f":{"a":1,+
"b":2 +
}}
(1 row)
8<----------------------------

So yeah, that is how it works today. I will take a look at what it would
take to fix it.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#108

Hannu Krosing

hannuk@google.com

about 2 years ago

In reply to: Tom Lane (#20)

Re: Emitting JSON to file using COPY TO

On Sat, Dec 2, 2023 at 4:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joe Conway <mail@joeconway.com> writes:

I noticed that, with the PoC patch, "json" is the only format that must be
quoted. Without quotes, I see a syntax error.

In longer term we should move any specific COPY flag names and values
out of grammar and their checking into the parts that actually
implement whatever the flag is influencing

Similar to what we do with OPTIONS in all levels of FDW definitions
(WRAPPER itself, SERVER, USER MAPPING, FOREIGN TABLE)

I'm assuming there's a
conflict with another json-related rule somewhere in gram.y, but I haven't
tracked down exactly which one is causing it.

While I've not looked too closely, I suspect this might be due to the
FORMAT_LA hack in base_yylex:

/* Replace FORMAT by FORMAT_LA if it's followed by JSON */
switch (next_token)
{
case JSON:
cur_token = FORMAT_LA;
break;
}

My hope is that turning the WITH into a fully independent part with no
grammar-defined keys or values would also solve the issue of quoting
"json".

For backwards compatibility we may even go the route of keeping the
WITH as is but add the OPTIONS which can take any values at grammar
level.

I shared my "Pluggable Copy " talk slides from Berlin '22 in another thread

--
Hannu

#109

Dean Rasheed

dean.a.rasheed@gmail.com

about 2 years ago

In reply to: Joe Conway (#91)

Re: Emitting JSON to file using COPY TO

On Thu, 7 Dec 2023 at 01:10, Joe Conway <mail@joeconway.com> wrote:

The attached should fix the CopyOut response to say one column.

Playing around with this, I found a couple of cases that generate an error:

COPY (SELECT 1 UNION ALL SELECT 2) TO stdout WITH (format json);

COPY (VALUES (1), (2)) TO stdout WITH (format json);

both of those generate the following:

ERROR: record type has not been registered

Regards,
Dean

#110

mail@joeconway.com

about 2 years ago

In reply to: Dean Rasheed (#109)

Re: Emitting JSON to file using COPY TO

On 1/8/24 14:36, Dean Rasheed wrote:

On Thu, 7 Dec 2023 at 01:10, Joe Conway <mail@joeconway.com> wrote:

The attached should fix the CopyOut response to say one column.

Playing around with this, I found a couple of cases that generate an error:

COPY (SELECT 1 UNION ALL SELECT 2) TO stdout WITH (format json);

COPY (VALUES (1), (2)) TO stdout WITH (format json);

both of those generate the following:

ERROR: record type has not been registered

Thanks -- will have a look

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#111

jian.universality@gmail.com

almost 2 years ago

In reply to: Joe Conway (#110)

3 attachment(s)

Re: Emitting JSON to file using COPY TO

On Tue, Jan 9, 2024 at 4:40 AM Joe Conway <mail@joeconway.com> wrote:

On 1/8/24 14:36, Dean Rasheed wrote:

On Thu, 7 Dec 2023 at 01:10, Joe Conway <mail@joeconway.com> wrote:

The attached should fix the CopyOut response to say one column.

Playing around with this, I found a couple of cases that generate an error:

COPY (SELECT 1 UNION ALL SELECT 2) TO stdout WITH (format json);

COPY (VALUES (1), (2)) TO stdout WITH (format json);

both of those generate the following:

ERROR: record type has not been registered

Thanks -- will have a look

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In the function CopyOneRowTo, I try to call the function BlessTupleDesc again.

+BlessTupleDesc(slot->tts_tupleDescriptor);
rowdata = ExecFetchSlotHeapTupleDatum(slot);

Please check the attachment. (one test.sql file, one patch, one bless twice).

Now those error cases are gone and fewer cases return an error,
but the new result is not what is expected.

`COPY (SELECT g from generate_series(1,1) g) TO stdout WITH (format json);`
returns
{"":1}
The expected result would be `{"g":1}`.

I think the reason is maybe related to the function copy_dest_startup.

#112

jian.universality@gmail.com

almost 2 years ago

In reply to: jian he (#111)

Re: Emitting JSON to file using COPY TO

On Tue, Jan 16, 2024 at 11:46 AM jian he <jian.universality@gmail.com> wrote:

I think the reason is maybe related to the function copy_dest_startup.

I was wrong about that.

In the function CopyOneRowTo, I changed the else branch of
`if (!cstate->opts.json_mode)` to the following:

else
{
    Datum       rowdata;
    StringInfo  result;

    if (slot->tts_tupleDescriptor->natts == 1)
    {
        /* Flat-copy the attribute array */
        memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
               TupleDescAttr(cstate->queryDesc->tupDesc, 0),
               1 * sizeof(FormData_pg_attribute));
    }
    BlessTupleDesc(slot->tts_tupleDescriptor);
    rowdata = ExecFetchSlotHeapTupleDatum(slot);
    result = makeStringInfo();
    composite_to_json(rowdata, result, false);
    if (json_row_delim_needed &&
        cstate->opts.force_array)
    {
        CopySendChar(cstate, ',');
    }
    else if (cstate->opts.force_array)
    {
        /* first row needs no delimiter */
        CopySendChar(cstate, ' ');
        json_row_delim_needed = true;
    }
    CopySendData(cstate, result->data, result->len);
}

With this change all the cases work, but it is more like a hack,
because I cannot fully explain why it works.
-------------------------------------------------------------------------------
demo

drop function if exists execute_into_test cascade;
NOTICE: function execute_into_test() does not exist, skipping
DROP FUNCTION
drop type if exists execute_into_test cascade;
NOTICE: type "execute_into_test" does not exist, skipping
DROP TYPE
create type eitype as (i integer, y integer);
CREATE TYPE
create or replace function execute_into_test() returns eitype as $$
declare
_v eitype;
begin
execute 'select 1,2' into _v;
return _v;
end; $$ language plpgsql;
CREATE FUNCTION

COPY (SELECT 1 from generate_series(1,1) g) TO stdout WITH (format json);
{"?column?":1}
COPY (SELECT g from generate_series(1,1) g) TO stdout WITH (format json);
{"g":1}
COPY (SELECT g,1 from generate_series(1,1) g) TO stdout WITH (format json);
{"g":1,"?column?":1}
COPY (select * from execute_into_test()) TO stdout WITH (format json);
{"i":1,"y":2}
COPY (select * from execute_into_test() sub) TO stdout WITH (format json);
{"i":1,"y":2}
COPY (select sub from execute_into_test() sub) TO stdout WITH (format json);
{"sub":{"i":1,"y":2}}
COPY (select sub.i from execute_into_test() sub) TO stdout WITH (format json);
{"i":1}
COPY (select sub.y from execute_into_test() sub) TO stdout WITH (format json);
{"y":2}
COPY (VALUES (1), (2)) TO stdout WITH (format json);
{"column1":1}
{"column1":2}
COPY (SELECT 1 UNION ALL SELECT 2) TO stdout WITH (format json);
{"?column?":1}
{"?column?":2}

#113

Masahiko Sawada

sawada.mshk@gmail.com

almost 2 years ago

In reply to: Joe Conway (#91)

Re: Emitting JSON to file using COPY TO

On Thu, Dec 7, 2023 at 10:10 AM Joe Conway <mail@joeconway.com> wrote:

On 12/6/23 18:09, Joe Conway wrote:

On 12/6/23 14:47, Joe Conway wrote:

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).

"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)
PostgreSQL
Type: Copy data
Length: 6
Copy data: 5b0a
PostgreSQL
Type: Copy data
Length: 76
Copy data:
207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…

The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?

Is it a real problem, or just a bit of documentation change that I missed?

Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.

Hmm, maybe the real problem here is that Columns do not equal "3" for
the json mode case -- that should really say "1" I think, because the
row is not represented as 3 columns but rather 1 json object.

Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more
"rows" than actual output rows (the "[" and the "]"), but maybe those
are less likely to cause some hazard?

The attached should fix the CopyOut response to say one column. I.e. it
ought to look something like:

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 1
Format: Text (0)
PostgreSQL
Type: Copy data
Length: 6
Copy data: 5b0a
PostgreSQL
Type: Copy data
Length: 76
Copy data: [...]

If I'm not missing something, copyto_json.007.diff is the latest patch, but it
needs to be rebased onto the current HEAD. Here are some random comments:

---
 if (opts_out->json_mode)
+   {
+       if (is_from)
+           ereport(ERROR,
+                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                    errmsg("cannot use JSON mode in COPY FROM")));
+   }
+   else if (opts_out->force_array)
+       ereport(ERROR,
+               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                errmsg("COPY FORCE_ARRAY requires JSON mode")));

I think that flattening these two conditions would make the code more readable:

if (opts_out->json_mode && is_from)
    ereport(ERROR, ...);

if (!opts_out->json_mode && opts_out->force_array)
    ereport(ERROR, ...);

Also these checks can be moved close to other checks at the end of
ProcessCopyOptions().
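
As a standalone illustration of the flattened form, the sketch below returns the error text instead of calling ereport(), so it can be compiled outside the backend. `CopyOpts` and `check_json_options` are hypothetical names; only the two conditions and the two message strings come from the patch quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the COPY options relevant to these checks. */
typedef struct CopyOpts
{
    bool    json_mode;
    bool    force_array;
} CopyOpts;

/*
 * Two independent checks instead of a nested if/else.
 * Returns NULL when the combination is acceptable, else an error message.
 */
static const char *
check_json_options(const CopyOpts *opts, bool is_from)
{
    if (opts->json_mode && is_from)
        return "cannot use JSON mode in COPY FROM";

    if (!opts->json_mode && opts->force_array)
        return "COPY FORCE_ARRAY requires JSON mode";

    return NULL;
}
```

The flattened checks accept and reject exactly the same combinations as the nested version, since the second condition already implies that JSON mode is off.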

---
@@ -3395,6 +3395,10 @@ copy_opt_item:
                {
                    $$ = makeDefElem("format", (Node *) makeString("csv"), @1);
                }
+           | JSON
+               {
+                   $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+               }
            | HEADER_P
                {
                    $$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3427,6 +3431,10 @@ copy_opt_item:
                {
                    $$ = makeDefElem("encoding", (Node *) makeString($2), @1);
                }
+           | FORCE ARRAY
+               {
+                   $$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+               }
        ;

I believe we don't need to support new options in old-style syntax.

---
@@ -3469,6 +3477,10 @@ copy_generic_opt_elem:
                {
                    $$ = makeDefElem($1, $2, @1);
                }
+           | FORMAT_LA copy_generic_opt_arg
+               {
+                   $$ = makeDefElem("format", $2, @1);
+               }
        ;

I think it's not necessary. "format" option is already handled in
copy_generic_opt_elem.

---
+/* need delimiter to start next json array element */
+static bool json_row_delim_needed = false;

I think it's cleaner to include json_row_delim_needed into CopyToStateData.
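
A minimal sketch of that suggestion, written as standalone C rather than backend code: the "delimiter needed" flag lives in the per-operation state, so two writers cannot interfere with each other and nothing has to be reset between runs. `ArrayWriter` and the `aw_*` functions are hypothetical; the output layout (brackets on their own lines, a space before the first row and a comma before later rows) follows the output and code shown earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-operation state, playing the role of CopyToStateData. */
typedef struct ArrayWriter
{
    char    out[512];
    size_t  len;
    int     row_delim_needed;   /* kept per writer, not in a file-level static */
} ArrayWriter;

static void
aw_put(ArrayWriter *w, const char *s)
{
    size_t  n = strlen(s);

    assert(w->len + n < sizeof(w->out));
    memcpy(w->out + w->len, s, n + 1);  /* copy the NUL as well */
    w->len += n;
}

static void
aw_begin(ArrayWriter *w)
{
    w->len = 0;
    w->out[0] = '\0';
    w->row_delim_needed = 0;
    aw_put(w, "[\n");
}

static void
aw_row(ArrayWriter *w, const char *json)
{
    /* first row gets a space, every later row a leading comma */
    aw_put(w, w->row_delim_needed ? "," : " ");
    w->row_delim_needed = 1;
    aw_put(w, json);
    aw_put(w, "\n");
}

static void
aw_end(ArrayWriter *w)
{
    aw_put(w, "]\n");
}
```

With a file-level static flag, interleaving two writers as in the usage below would make the second writer emit a comma before its first row.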

---
Splitting the patch into two patches: add json format and add
force_array option would make reviews easy.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#114


jian.universality@gmail.com

almost 2 years ago

In reply to: Masahiko Sawada (#113)

2 attachment(s)

Re: Emitting JSON to file using COPY TO

On Fri, Jan 19, 2024 at 4:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

If I'm not missing something, copyto_json.007.diff is the latest patch, but it
needs to be rebased onto the current HEAD. Here are some random comments:

please check the latest version.

if (opts_out->json_mode)
+   {
+       if (is_from)
+           ereport(ERROR,
+                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                    errmsg("cannot use JSON mode in COPY FROM")));
+   }
+   else if (opts_out->force_array)
+       ereport(ERROR,
+               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                errmsg("COPY FORCE_ARRAY requires JSON mode")));

I think that flattening these two conditions would make the code more readable:

I changed it to two separate condition checks.

if (opts_out->json_mode && is_from)
    ereport(ERROR, ...);

if (!opts_out->json_mode && opts_out->force_array)
    ereport(ERROR, ...);

Also these checks can be moved close to other checks at the end of
ProcessCopyOptions().

Yes. I did it, please check it.

@@ -3395,6 +3395,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+           | JSON
+               {
+                   $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+               }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3427,6 +3431,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+           | FORCE ARRAY
+               {
+                   $$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+               }
;

I believe we don't need to support new options in old-style syntax.

---
@@ -3469,6 +3477,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+           | FORMAT_LA copy_generic_opt_arg
+               {
+                   $$ = makeDefElem("format", $2, @1);
+               }
;

I think it's not necessary. "format" option is already handled in
copy_generic_opt_elem.

I tested it and found that this part is necessary: without it, a query
that uses WITH, such as `copy (select 1) to stdout with
(format json, force_array false);`, fails.

---
+/* need delimiter to start next json array element */
+static bool json_row_delim_needed = false;
I think it's cleaner to include json_row_delim_needed into CopyToStateData.

Yes, I agree, so I did it.
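
For illustration only (not the patch itself): a standalone sketch of why the "delimiter needed" flag belongs in per-operation state rather than in a file-level static. JsonArrayWriter and the writer_* functions are hypothetical names; the output layout is modelled on the expected regression output in the attached patch (an opening bracket line, a space before the first row, a comma before each later row, a closing bracket line).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-operation state; the flag lives here, not in a static. */
typedef struct JsonArrayWriter
{
	bool		force_array;		/* emit array decorations? */
	bool		row_delim_needed;	/* has a first row already been sent? */
	char		buf[512];			/* collected output, NUL-terminated */
	size_t		len;
} JsonArrayWriter;

static void
writer_append(JsonArrayWriter *w, const char *s)
{
	size_t		n = strlen(s);

	assert(w->len + n < sizeof(w->buf));
	memcpy(w->buf + w->len, s, n + 1);	/* copy including the NUL */
	w->len += n;
}

void
writer_begin(JsonArrayWriter *w, bool force_array)
{
	w->force_array = force_array;
	w->row_delim_needed = false;	/* reset for every new operation */
	w->len = 0;
	w->buf[0] = '\0';
	if (w->force_array)
		writer_append(w, "[\n");
}

void
writer_row(JsonArrayWriter *w, const char *json_object)
{
	if (w->force_array)
	{
		/* first row gets a space, every later row gets a comma */
		writer_append(w, w->row_delim_needed ? "," : " ");
		w->row_delim_needed = true;
	}
	writer_append(w, json_object);
	writer_append(w, "\n");
}

void
writer_end(JsonArrayWriter *w)
{
	if (w->force_array)
		writer_append(w, "]\n");
}
```

With the flag stored in the writer, two operations run one after another cannot leak state into each other, which is the concern with a file-level static.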

---
Splitting the patch into two patches: add json format and add
force_array option would make reviews easy.

Done: one patch for the json format, another one for the force_array option.

I also made the following case fail:
copy copytest to stdout (format csv, force_array false);
ERROR: specify COPY FORCE_ARRAY is only allowed in JSON mode.
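
For illustration only (not the patch itself): the case above fails even though the value is false because the check looks at whether the option was specified at all, not at its value. A minimal standalone model follows; OptionModel and check_force_array are hypothetical names, and the "conflicting or redundant options" text is only this sketch's stand-in for what errorConflictingDefElem reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one parsed option: a name and a boolean value. */
typedef struct OptionModel
{
	const char *name;
	bool		value;
} OptionModel;

/*
 * Returns NULL if the option list is acceptable, otherwise the error text.
 * Note that the value of force_array is never consulted: merely naming the
 * option outside JSON mode is rejected.
 */
const char *
check_force_array(const OptionModel *options, size_t noptions, bool json_mode)
{
	bool		force_array_specified = false;
	size_t		i;

	assert(options != NULL || noptions == 0);

	for (i = 0; i < noptions; i++)
	{
		if (strcmp(options[i].name, "force_array") == 0)
		{
			if (force_array_specified)
				return "conflicting or redundant options";
			force_array_specified = true;
		}
	}

	if (!json_mode && force_array_specified)
		return "specify COPY FORCE_ARRAY is only allowed in JSON mode";

	return NULL;
}
```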

When copying from a table, the rows come from table_scan_getnextslot, so
there is no need to worry about the TupleDesc.
However, when copying a query's output in json format, we may need to consider it.

cstate->queryDesc->tupDesc is the TupleDesc of the query output, so we can rely on it.
When copying a query result to json, I memcpy the attributes of
cstate->queryDesc->tupDesc into the slot's slot->tts_tupleDescriptor,
so that composite_to_json effectively uses cstate->queryDesc->tupDesc to do the work.
I guess this will make it more bullet-proof.

Attachments:

v8-0002-Add-force_array-for-COPY-TO-json-fomrat.patch (text/x-patch; charset=US-ASCII)

From 214ad534d13730cba13008798c3d70f8b363436f Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 23 Jan 2024 12:26:43 +0800
Subject: [PATCH v8 2/2] Add force_array for COPY TO json fomrat.

make add open brackets and close for the whole output.
separate each json row with comma after the first row.
---
 doc/src/sgml/ref/copy.sgml         | 14 ++++++++++++++
 src/backend/commands/copy.c        | 17 +++++++++++++++++
 src/backend/commands/copyto.c      | 30 ++++++++++++++++++++++++++++++
 src/backend/parser/gram.y          |  4 ++++
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 24 ++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      | 10 ++++++++++
 7 files changed, 100 insertions(+)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index ccd90b61..d19332ac 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     ON_ERROR '<replaceable class="parameter">error_action</replaceable>'
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
 </synopsis>
@@ -379,6 +380,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>JSON</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ON_ERROR</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5d5b733d..e15056e1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -456,6 +456,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		freeze_specified = false;
 	bool		header_specified = false;
 	bool		on_error_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -610,6 +611,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -806,6 +814,15 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot use JSON mode in COPY FROM")));
 
+	if (!opts_out->json_mode && opts_out->force_array)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("COPY FORCE_ARRAY requires JSON mode")));
+	if (!opts_out->json_mode && force_array_specified)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("specify COPY FORCE_ARRAY is only allowed in JSON mode")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 4f55d6d5..d9245df0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -88,6 +88,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	bool		json_row_delim_needed;	/* need delimiter to start next json array element */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -858,6 +859,16 @@ DoCopyTo(CopyToState cstate)
 
 			CopySendEndOfRow(cstate);
 		}
+
+		/*
+		 * If JSON has been requested, and FORCE_ARRAY has been specified send
+		 * the opening bracket.
+		*/
+		if (cstate->opts.json_mode && cstate->opts.force_array)
+		{
+			CopySendChar(cstate, '[');
+			CopySendEndOfRow(cstate);
+		}
 	}
 
 	if (cstate->rel)
@@ -905,6 +916,15 @@ DoCopyTo(CopyToState cstate)
 		CopySendEndOfRow(cstate);
 	}
 
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified send the
+	 * closing bracket.
+	*/
+	if (cstate->opts.json_mode && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendEndOfRow(cstate);
+	}
 	MemoryContextDelete(cstate->rowcontext);
 
 	if (fe_copy)
@@ -1006,6 +1026,16 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 		result = makeStringInfo();
 		composite_to_json(rowdata, result, false);
 
+	if (cstate->json_row_delim_needed && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ',');
+	}
+	else if (cstate->opts.force_array)
+	{
+		/* first row needs no delimiter */
+		CopySendChar(cstate, ' ');
+		cstate->json_row_delim_needed = true;
+	}
 		CopySendData(cstate, result->data, result->len);
 	}
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 702f04c3..4e13a0ab 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3468,6 +3468,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
 				}
+			| FORCE ARRAY
+				{
+					$$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+				}
 		;
 
 /* The following exist for backward compatibility with very old versions */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f591b613..51656eec 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -72,6 +72,7 @@ typedef struct CopyFormatOptions
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
+	bool		force_array;	/* add JSON array decorations */
 	bool		convert_selectively;	/* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
 	List	   *convert_select; /* list of column names (can be NIL) */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 0c5ade47..1b200b0d 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -59,6 +59,30 @@ ERROR:  cannot specify HEADER in JSON mode
 -- Error
 copy copytest from stdout (format json);
 ERROR:  cannot use JSON mode in COPY FROM
+--Error
+copy copytest to stdout (format csv, force_array false);
+ERROR:  specify COPY FORCE_ARRAY is only allowed in JSON mode
+copy copytest from stdin (format json, force_array true);
+ERROR:  cannot use JSON mode in COPY FROM
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 96e4f0b6..a07d27af 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -64,6 +64,16 @@ copy copytest to stdout (format json, header);
 -- Error
 copy copytest from stdout (format json);
 
+--Error
+copy copytest to stdout (format csv, force_array false);
+copy copytest from stdin (format json, force_array true);
+
+copy copytest to stdout (format json, force_array);
+
+copy copytest to stdout (format json, force_array true);
+
+copy copytest to stdout (format json, force_array false);
+
 
 -- embedded escaped characters
 create temp table copyjsontest (
-- 
2.34.1

v8-0001-Add-another-COPY-fomrat-json.patch (text/x-patch; charset=US-ASCII)

From 0cd43bfbaeacecaffcd8167d1aab0115aa229847 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 22 Jan 2024 22:58:37 +0800
Subject: [PATCH v8 1/2] Add another COPY fomrat: json.

this format is only allowed in COPY TO operation.
---
 doc/src/sgml/ref/copy.sgml         |   5 ++
 src/backend/commands/copy.c        |  13 ++++
 src/backend/commands/copyto.c      | 121 +++++++++++++++++++----------
 src/backend/parser/gram.y          |   8 ++
 src/backend/utils/adt/json.c       |   5 +-
 src/include/commands/copy.h        |   1 +
 src/include/utils/json.h           |   2 +
 src/test/regress/expected/copy.out |  54 +++++++++++++
 src/test/regress/sql/copy.sql      |  39 ++++++++++
 9 files changed, 204 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a0..ccd90b61 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -207,9 +207,14 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6..5d5b733d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -480,6 +480,8 @@ ProcessCopyOptions(ParseState *pstate,
 				 /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
 				opts_out->csv_mode = true;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->json_mode = true;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->binary = true;
 			else
@@ -716,6 +718,11 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot specify HEADER in BINARY mode")));
 
+	if (opts_out->json_mode && opts_out->header_line)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot specify HEADER in JSON mode")));
+
 	/* Check quote */
 	if (!opts_out->csv_mode && opts_out->quote != NULL)
 		ereport(ERROR,
@@ -793,6 +800,12 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY FREEZE cannot be used with COPY TO")));
 
+	/* Check json format  */
+	if (opts_out->json_mode && is_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot use JSON mode in COPY FROM")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d3dc3fc8..4f55d6d5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -28,6 +28,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -37,6 +38,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
@@ -146,9 +148,20 @@ SendCopyBegin(CopyToState cstate)
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (!cstate->opts.json_mode)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON mode is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -906,11 +919,7 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-	bool		need_delim = false;
-	FmgrInfo   *out_functions = cstate->out_functions;
 	MemoryContext oldcontext;
-	ListCell   *cur;
-	char	   *string;
 
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
@@ -921,54 +930,84 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 		CopySendInt16(cstate, list_length(cstate->attnumlist));
 	}
 
-	/* Make sure the tuple is fully deconstructed */
-	slot_getallattrs(slot);
-
-	foreach(cur, cstate->attnumlist)
+	if (!cstate->opts.json_mode)
 	{
-		int			attnum = lfirst_int(cur);
-		Datum		value = slot->tts_values[attnum - 1];
-		bool		isnull = slot->tts_isnull[attnum - 1];
+		bool		need_delim = false;
+		FmgrInfo   *out_functions = cstate->out_functions;
+		ListCell   *cur;
+		char	   *string;
 
-		if (!cstate->opts.binary)
-		{
-			if (need_delim)
-				CopySendChar(cstate, cstate->opts.delim[0]);
-			need_delim = true;
-		}
+		/* Make sure the tuple is fully deconstructed */
+		slot_getallattrs(slot);
 
-		if (isnull)
-		{
-			if (!cstate->opts.binary)
-				CopySendString(cstate, cstate->opts.null_print_client);
-			else
-				CopySendInt32(cstate, -1);
-		}
-		else
+		foreach(cur, cstate->attnumlist)
 		{
+			int			attnum = lfirst_int(cur);
+			Datum		value = slot->tts_values[attnum - 1];
+			bool		isnull = slot->tts_isnull[attnum - 1];
+
 			if (!cstate->opts.binary)
 			{
-				string = OutputFunctionCall(&out_functions[attnum - 1],
-											value);
-				if (cstate->opts.csv_mode)
-					CopyAttributeOutCSV(cstate, string,
-										cstate->opts.force_quote_flags[attnum - 1],
-										list_length(cstate->attnumlist) == 1);
+				if (need_delim)
+					CopySendChar(cstate, cstate->opts.delim[0]);
+				need_delim = true;
+			}
+
+			if (isnull)
+			{
+				if (!cstate->opts.binary)
+					CopySendString(cstate, cstate->opts.null_print_client);
 				else
-					CopyAttributeOutText(cstate, string);
+					CopySendInt32(cstate, -1);
 			}
 			else
 			{
-				bytea	   *outputbytes;
+				if (!cstate->opts.binary)
+				{
+					string = OutputFunctionCall(&out_functions[attnum - 1],
+												value);
+					if (cstate->opts.csv_mode)
+						CopyAttributeOutCSV(cstate, string,
+											cstate->opts.force_quote_flags[attnum - 1],
+											list_length(cstate->attnumlist) == 1);
+					else
+						CopyAttributeOutText(cstate, string);
+				}
+				else
+				{
+					bytea	   *outputbytes;
 
-				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-											   value);
-				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-				CopySendData(cstate, VARDATA(outputbytes),
-							 VARSIZE(outputbytes) - VARHDRSZ);
+					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+												   value);
+					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+					CopySendData(cstate, VARDATA(outputbytes),
+								 VARSIZE(outputbytes) - VARHDRSZ);
+				}
 			}
 		}
 	}
+	else
+	{
+		Datum		rowdata;
+		StringInfo	result;
+
+		if(!cstate->rel)
+		{
+			for (int i = 0; i < slot->tts_tupleDescriptor->natts; i++)
+			{
+				/* Flat-copy the attribute array */
+				memcpy(TupleDescAttr(slot->tts_tupleDescriptor, i),
+				TupleDescAttr(cstate->queryDesc->tupDesc, i),
+								1 * sizeof(FormData_pg_attribute));
+			}
+			BlessTupleDesc(slot->tts_tupleDescriptor);
+		}
+		rowdata = ExecFetchSlotHeapTupleDatum(slot);
+		result = makeStringInfo();
+		composite_to_json(rowdata, result, false);
+
+		CopySendData(cstate, result->data, result->len);
+	}
 
 	CopySendEndOfRow(cstate);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3460fea5..702f04c3 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3424,6 +3424,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3506,6 +3510,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+			{
+				$$ = makeDefElem("format", $2, @1);
+			}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d719a61f..fabd4e61 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -83,8 +83,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -507,8 +505,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0..f591b613 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -53,6 +53,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
+	bool		json_mode;		/* JSON format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 6d7f1b38..d5631171 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
 								const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365ec..0c5ade47 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -42,6 +42,60 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- Error
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+-- Error
+copy copytest from stdout (format json);
+ERROR:  cannot use JSON mode in COPY FROM
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e906..96e4f0b6 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -54,6 +54,45 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- Error
+copy copytest to stdout (format json, header);
+-- Error
+copy copytest from stdout (format json);
+
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
-- 
2.34.1

#115

Junwang Zhao

zhjwpku@gmail.com

almost 2 years ago

In reply to: jian he (#114)

10 attachment(s)

Re: Emitting JSON to file using COPY TO

Hi hackers,

Kou-san (CCed) has been working on *Make COPY format extendable* [1]
(/messages/by-id/20240124.144936.67229716500876806.kou@clear-code.com), so
I think making *copy to json* based on that work might be the right direction.

I wrote an extension for that purpose, and here is the patch set together
with Kou-san's *extendable copy format* implementation:

0001-0009 is the implementation of extendable copy format
0010 is the pg_copy_json extension

I also created a PR [2] (https://github.com/zhjwpku/postgres/pull/2/files) in case anybody prefers the GitHub review style.

The *extendable copy format* feature is still being developed; I am posting
this email in case the patch set in this thread gets committed without
awareness of the *extendable copy format* feature.
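
For readers unfamiliar with the idea, here is a rough standalone sketch of what "a format implemented as a set of callbacks with a private opaque pointer" can look like. Everything in it (SketchCopyState, SketchCopyRoutine, sketch_copy_to, the json_* handlers) is hypothetical and simplified; it is not the API in the attached patches, which work with CopyToStateData and its opaque field inside the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchCopyState SketchCopyState;

/* Hypothetical callback table: one entry per phase of the output. */
typedef struct SketchCopyRoutine
{
	void		(*start) (SketchCopyState *state);
	void		(*one_row) (SketchCopyState *state, const char *row);
	void		(*end) (SketchCopyState *state);
} SketchCopyRoutine;

struct SketchCopyState
{
	const SketchCopyRoutine *routine;
	char		out[256];		/* collected output, NUL-terminated */
	size_t		len;
	void	   *opaque;			/* private space for the format handler */
};

static void
sketch_send(SketchCopyState *state, const char *s)
{
	size_t		n = strlen(s);

	assert(state->len + n < sizeof(state->out));
	memcpy(state->out + state->len, s, n + 1);
	state->len += n;
}

/* A JSON-array handler that keeps its row counter in the opaque space. */
typedef struct JsonPrivate
{
	int			rows_sent;
} JsonPrivate;

static void
json_start(SketchCopyState *state)
{
	JsonPrivate *priv = malloc(sizeof(JsonPrivate));

	assert(priv != NULL);
	priv->rows_sent = 0;
	state->opaque = priv;
	sketch_send(state, "[");
}

static void
json_one_row(SketchCopyState *state, const char *row)
{
	JsonPrivate *priv = state->opaque;

	if (priv->rows_sent++ > 0)
		sketch_send(state, ",");
	sketch_send(state, row);
}

static void
json_end(SketchCopyState *state)
{
	sketch_send(state, "]");
	free(state->opaque);
	state->opaque = NULL;
}

const SketchCopyRoutine sketch_json_routine = {json_start, json_one_row, json_end};

/* The generic driver knows nothing about JSON; it only calls the routine. */
void
sketch_copy_to(SketchCopyState *state, const SketchCopyRoutine *routine,
			   const char **rows, size_t nrows)
{
	size_t		i;

	state->routine = routine;
	state->len = 0;
	state->out[0] = '\0';
	state->opaque = NULL;

	routine->start(state);
	for (i = 0; i < nrows; i++)
		routine->one_row(state, rows[i]);
	routine->end(state);
}
```

The point of the opaque pointer is that format-specific state (here, the row counter that decides whether a comma is needed) stays out of the core state struct, so a new format would not require adding fields to it.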

I'd like to hear your opinions.

--
Regards
Junwang Zhao

Attachments:

v8-0004-Add-support-for-implementing-custom-COPY-TO-forma.patch (application/octet-stream)

From 4b177469f3fb8f14f0cd6bff3c7878dcafd9b760 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 15:12:43 +0900
Subject: [PATCH v8 04/10] Add support for implementing custom COPY TO format
 as extension

* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
* Rename CopySendEndOfRow() to CopyToStateFlush() because it's a
  method for CopyToState and it's used for flushing. End-of-row related
  codes were moved to CopyToTextSendEndOfRow().
---
 src/backend/commands/copyto.c  | 15 +++++++--------
 src/include/commands/copyapi.h |  5 +++++
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cfc74ee7b1..b5d8678394 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -69,7 +69,6 @@ static void SendCopyEnd(CopyToState cstate);
 static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
 static void CopySendString(CopyToState cstate, const char *str);
 static void CopySendChar(CopyToState cstate, char c);
-static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
@@ -117,7 +116,7 @@ CopyToTextSendEndOfRow(CopyToState cstate)
 		default:
 			break;
 	}
-	CopySendEndOfRow(cstate);
+	CopyToStateFlush(cstate);
 }
 
 static void
@@ -302,7 +301,7 @@ CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
 		}
 	}
 
-	CopySendEndOfRow(cstate);
+	CopyToStateFlush(cstate);
 }
 
 static void
@@ -311,7 +310,7 @@ CopyToBinaryEnd(CopyToState cstate)
 	/* Generate trailer for a binary copy */
 	CopySendInt16(cstate, -1);
 	/* Need to flush out the trailer */
-	CopySendEndOfRow(cstate);
+	CopyToStateFlush(cstate);
 }
 
 CopyToRoutine CopyToRoutineText = {
@@ -377,8 +376,8 @@ SendCopyEnd(CopyToState cstate)
  * CopySendData sends output data to the destination (file or frontend)
  * CopySendString does the same for null-terminated strings
  * CopySendChar does the same for single characters
- * CopySendEndOfRow does the appropriate thing at end of each data row
- *	(data is not actually flushed except by CopySendEndOfRow)
+ * CopyToStateFlush flushes the buffered data
+ *	(data is not actually flushed except by CopyToStateFlush)
  *
  * NB: no data conversion is applied by these functions
  *----------
@@ -401,8 +400,8 @@ CopySendChar(CopyToState cstate, char c)
 	appendStringInfoCharMacro(cstate->fe_msgbuf, c);
 }
 
-static void
-CopySendEndOfRow(CopyToState cstate)
+void
+CopyToStateFlush(CopyToState cstate)
 {
 	StringInfo	fe_msgbuf = cstate->fe_msgbuf;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index a869d78d72..ffad433a21 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -174,6 +174,11 @@ typedef struct CopyToStateData
 	FmgrInfo   *out_functions;	/* lookup info for output functions */
 	MemoryContext rowcontext;	/* per-row evaluation context */
 	uint64		bytes_processed;	/* number of bytes processed so far */
+
+	/* For custom format implementation */
+	void	   *opaque;			/* private space */
 } CopyToStateData;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 #endif							/* COPYAPI_H */
-- 
2.41.0

v8-0003-Export-CopyToStateData.patch (application/octet-stream)

From c3a59753b1157dc8e47e719263f58677acc33178 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 14:54:10 +0900
Subject: [PATCH v8 03/10] Export CopyToStateData

It's for custom COPY TO format handlers implemented as extension.

This just moves codes. This doesn't change codes except CopyDest enum
values. CopyDest enum values such as COPY_FILE are conflicted
CopySource enum values defined in copyfrom_internal.h. So COPY_DEST_
prefix instead of COPY_ prefix is used. For example, COPY_FILE is
renamed to COPY_DEST_FILE.

Note that this change isn't enough to implement a custom COPY TO
format handler as extension. We'll do the followings in a subsequent
commit:

1. Add an opaque space for custom COPY TO format handler
2. Export CopySendEndOfRow() to flush buffer
---
 src/backend/commands/copyto.c  |  74 +++-----------------
 src/include/commands/copy.h    |  59 ----------------
 src/include/commands/copyapi.h | 120 ++++++++++++++++++++++++++++++++-
 3 files changed, 127 insertions(+), 126 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 6547b7c654..cfc74ee7b1 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -43,64 +43,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-	COPY_FILE,					/* to file (or a piped program) */
-	COPY_FRONTEND,				/* to frontend */
-	COPY_CALLBACK,				/* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-	/* low-level state data */
-	CopyDest	copy_dest;		/* type of copy source/destination */
-	FILE	   *copy_file;		/* used if copy_dest == COPY_FILE */
-	StringInfo	fe_msgbuf;		/* used for all dests during COPY TO */
-
-	int			file_encoding;	/* file or remote side's character encoding */
-	bool		need_transcoding;	/* file encoding diff from server? */
-	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
-
-	/* parameters from the COPY command */
-	Relation	rel;			/* relation to copy to */
-	QueryDesc  *queryDesc;		/* executable query to copy from */
-	List	   *attnumlist;		/* integer list of attnums to copy */
-	char	   *filename;		/* filename, or NULL for STDOUT */
-	bool		is_program;		/* is 'filename' a program to popen? */
-	copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-	CopyFormatOptions opts;
-	Node	   *whereClause;	/* WHERE condition (or NULL) */
-
-	/*
-	 * Working state
-	 */
-	MemoryContext copycontext;	/* per-copy execution context */
-
-	FmgrInfo   *out_functions;	/* lookup info for output functions */
-	MemoryContext rowcontext;	/* per-row evaluation context */
-	uint64		bytes_processed;	/* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -160,7 +102,7 @@ CopyToTextSendEndOfRow(CopyToState cstate)
 {
 	switch (cstate->copy_dest)
 	{
-		case COPY_FILE:
+		case COPY_DEST_FILE:
 			/* Default line termination depends on platform */
 #ifndef WIN32
 			CopySendChar(cstate, '\n');
@@ -168,7 +110,7 @@ CopyToTextSendEndOfRow(CopyToState cstate)
 			CopySendString(cstate, "\r\n");
 #endif
 			break;
-		case COPY_FRONTEND:
+		case COPY_DEST_FRONTEND:
 			/* The FE/BE protocol uses \n as newline for all platforms */
 			CopySendChar(cstate, '\n');
 			break;
@@ -419,7 +361,7 @@ SendCopyBegin(CopyToState cstate)
 	for (i = 0; i < natts; i++)
 		pq_sendint16(&buf, format); /* per-column formats */
 	pq_endmessage(&buf);
-	cstate->copy_dest = COPY_FRONTEND;
+	cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -466,7 +408,7 @@ CopySendEndOfRow(CopyToState cstate)
 
 	switch (cstate->copy_dest)
 	{
-		case COPY_FILE:
+		case COPY_DEST_FILE:
 			if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
 					   cstate->copy_file) != 1 ||
 				ferror(cstate->copy_file))
@@ -500,11 +442,11 @@ CopySendEndOfRow(CopyToState cstate)
 							 errmsg("could not write to COPY file: %m")));
 			}
 			break;
-		case COPY_FRONTEND:
+		case COPY_DEST_FRONTEND:
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
 			break;
-		case COPY_CALLBACK:
+		case COPY_DEST_CALLBACK:
 			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
@@ -877,12 +819,12 @@ BeginCopyTo(ParseState *pstate,
 	/* See Multibyte encoding comment above */
 	cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-	cstate->copy_dest = COPY_FILE;	/* default */
+	cstate->copy_dest = COPY_DEST_FILE; /* default */
 
 	if (data_dest_cb)
 	{
 		progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-		cstate->copy_dest = COPY_CALLBACK;
+		cstate->copy_dest = COPY_DEST_CALLBACK;
 		cstate->data_dest_cb = data_dest_cb;
 	}
 	else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 34bea880ca..b3f4682f95 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,69 +20,10 @@
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
-/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
- */
-typedef enum CopyHeaderChoice
-{
-	COPY_HEADER_FALSE = 0,
-	COPY_HEADER_TRUE,
-	COPY_HEADER_MATCH,
-} CopyHeaderChoice;
-
-/*
- * Represents where to save input processing errors.  More values to be added
- * in the future.
- */
-typedef enum CopyOnErrorChoice
-{
-	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
-	COPY_ON_ERROR_IGNORE,		/* ignore errors */
-} CopyOnErrorChoice;
-
-/*
- * A struct to hold COPY options, in a parsed form. All of these are related
- * to formatting, except for 'freeze', which doesn't really belong here, but
- * it's expedient to parse it along with all the other options.
- */
-typedef struct CopyFormatOptions
-{
-	/* parameters from the COPY command */
-	int			file_encoding;	/* file or remote side's character encoding,
-								 * -1 if not specified */
-	bool		binary;			/* binary format? */
-	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
-	CopyHeaderChoice header_line;	/* header line? */
-	char	   *null_print;		/* NULL marker string (server encoding!) */
-	int			null_print_len; /* length of same */
-	char	   *null_print_client;	/* same converted to file encoding */
-	char	   *default_print;	/* DEFAULT marker string */
-	int			default_print_len;	/* length of same */
-	char	   *delim;			/* column delimiter (must be 1 byte) */
-	char	   *quote;			/* CSV quote char (must be 1 byte) */
-	char	   *escape;			/* CSV escape char (must be 1 byte) */
-	List	   *force_quote;	/* list of column names */
-	bool		force_quote_all;	/* FORCE_QUOTE *? */
-	bool	   *force_quote_flags;	/* per-column CSV FQ flags */
-	List	   *force_notnull;	/* list of column names */
-	bool		force_notnull_all;	/* FORCE_NOT_NULL *? */
-	bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
-	List	   *force_null;		/* list of column names */
-	bool		force_null_all; /* FORCE_NULL *? */
-	bool	   *force_null_flags;	/* per-column CSV FN flags */
-	bool		convert_selectively;	/* do selective binary conversion? */
-	CopyOnErrorChoice on_error; /* what to do when error happened */
-	List	   *convert_select; /* list of column names (can be NIL) */
-	CopyToRoutine *to_routine;	/* callback routines for COPY TO */
-} CopyFormatOptions;
-
 /* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
-typedef void (*copy_data_dest_cb) (void *data, int len);
 
 extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 9c25e1c415..a869d78d72 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -14,10 +14,10 @@
 #ifndef COPYAPI_H
 #define COPYAPI_H
 
+#include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/parsenodes.h"
 
-/* This is private in commands/copyto.c */
 typedef struct CopyToStateData *CopyToState;
 
 typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
@@ -58,4 +58,122 @@ extern CopyToRoutine CopyToRoutineText;
 extern CopyToRoutine CopyToRoutineCSV;
 extern CopyToRoutine CopyToRoutineBinary;
 
+/*
+ * Represents whether a header line should be present, and whether it must
+ * match the actual names (which implies "true").
+ */
+typedef enum CopyHeaderChoice
+{
+	COPY_HEADER_FALSE = 0,
+	COPY_HEADER_TRUE,
+	COPY_HEADER_MATCH,
+} CopyHeaderChoice;
+
+/*
+ * Represents where to save input processing errors.  More values to be added
+ * in the future.
+ */
+typedef enum CopyOnErrorChoice
+{
+	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
+	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+} CopyOnErrorChoice;
+
+/*
+ * A struct to hold COPY options, in a parsed form. All of these are related
+ * to formatting, except for 'freeze', which doesn't really belong here, but
+ * it's expedient to parse it along with all the other options.
+ */
+typedef struct CopyFormatOptions
+{
+	/* parameters from the COPY command */
+	int			file_encoding;	/* file or remote side's character encoding,
+								 * -1 if not specified */
+	bool		binary;			/* binary format? */
+	bool		freeze;			/* freeze rows on loading? */
+	bool		csv_mode;		/* Comma Separated Value format? */
+	CopyHeaderChoice header_line;	/* header line? */
+	char	   *null_print;		/* NULL marker string (server encoding!) */
+	int			null_print_len; /* length of same */
+	char	   *null_print_client;	/* same converted to file encoding */
+	char	   *default_print;	/* DEFAULT marker string */
+	int			default_print_len;	/* length of same */
+	char	   *delim;			/* column delimiter (must be 1 byte) */
+	char	   *quote;			/* CSV quote char (must be 1 byte) */
+	char	   *escape;			/* CSV escape char (must be 1 byte) */
+	List	   *force_quote;	/* list of column names */
+	bool		force_quote_all;	/* FORCE_QUOTE *? */
+	bool	   *force_quote_flags;	/* per-column CSV FQ flags */
+	List	   *force_notnull;	/* list of column names */
+	bool		force_notnull_all;	/* FORCE_NOT_NULL *? */
+	bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
+	List	   *force_null;		/* list of column names */
+	bool		force_null_all; /* FORCE_NULL *? */
+	bool	   *force_null_flags;	/* per-column CSV FN flags */
+	bool		convert_selectively;	/* do selective binary conversion? */
+	CopyOnErrorChoice on_error; /* what to do when error happened */
+	List	   *convert_select; /* list of column names (can be NIL) */
+	CopyToRoutine *to_routine;	/* callback routines for COPY TO */
+} CopyFormatOptions;
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+	COPY_DEST_FILE,				/* to file (or a piped program) */
+	COPY_DEST_FRONTEND,			/* to frontend */
+	COPY_DEST_CALLBACK,			/* to callback function */
+} CopyDest;
+
+typedef void (*copy_data_dest_cb) (void *data, int len);
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+	/* low-level state data */
+	CopyDest	copy_dest;		/* type of copy source/destination */
+	FILE	   *copy_file;		/* used if copy_dest == COPY_DEST_FILE */
+	StringInfo	fe_msgbuf;		/* used for all dests during COPY TO */
+
+	int			file_encoding;	/* file or remote side's character encoding */
+	bool		need_transcoding;	/* file encoding diff from server? */
+	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+
+	/* parameters from the COPY command */
+	Relation	rel;			/* relation to copy to */
+	QueryDesc  *queryDesc;		/* executable query to copy from */
+	List	   *attnumlist;		/* integer list of attnums to copy */
+	char	   *filename;		/* filename, or NULL for STDOUT */
+	bool		is_program;		/* is 'filename' a program to popen? */
+	copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+	CopyFormatOptions opts;
+	Node	   *whereClause;	/* WHERE condition (or NULL) */
+
+	/*
+	 * Working state
+	 */
+	MemoryContext copycontext;	/* per-copy execution context */
+
+	FmgrInfo   *out_functions;	/* lookup info for output functions */
+	MemoryContext rowcontext;	/* per-row evaluation context */
+	uint64		bytes_processed;	/* number of bytes processed so far */
+} CopyToStateData;
+
 #endif							/* COPYAPI_H */
-- 
2.41.0

Attachment: v8-0001-Extract-COPY-TO-format-implementations.patch (application/octet-stream)

From 6e68ba6380dc825a242e7f0d0a53442bba3a4a61 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 4 Dec 2023 12:32:54 +0900
Subject: [PATCH v8 01/10] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. It only introduces
CopyToRoutine, which holds the function pointers of a format
implementation (like TupleTableSlotOps), and uses it for the existing
"text", "csv" and "binary" format implementations.

Note that CopyToRoutine can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.
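
The routine-table idea described above can be illustrated outside
PostgreSQL with a minimal, self-contained sketch. Every name here
(DemoState, DemoRoutine, demo_lookup_routine and so on) is hypothetical
and is not part of the patch; the real CopyToRoutine has more callbacks
and works on CopyToState. The sketch only shows the pattern: the format
name selects a table of function pointers once, and the per-row path
calls through that table instead of branching on the format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct DemoState DemoState;

/* Hypothetical table of per-format callbacks, in the spirit of CopyToRoutine. */
typedef struct DemoRoutine
{
	int			(*get_format) (DemoState *state);	/* 0 = text, 1 = binary */
	size_t		(*one_row) (DemoState *state, int32_t value);
} DemoRoutine;

/* Hypothetical stand-in for CopyToState: selected routine plus a row buffer. */
struct DemoState
{
	const DemoRoutine *routine;
	char		buf[64];
};

static int
demo_text_get_format(DemoState *state)
{
	(void) state;
	return 0;
}

/* Text format: decimal digits followed by a newline. */
static size_t
demo_text_one_row(DemoState *state, int32_t value)
{
	int			n = snprintf(state->buf, sizeof(state->buf), "%d\n", (int) value);

	return (size_t) n;
}

static int
demo_binary_get_format(DemoState *state)
{
	(void) state;
	return 1;
}

/* Binary format: four bytes in network (big-endian) byte order. */
static size_t
demo_binary_one_row(DemoState *state, int32_t value)
{
	uint32_t	v = (uint32_t) value;

	state->buf[0] = (char) (v >> 24);
	state->buf[1] = (char) (v >> 16);
	state->buf[2] = (char) (v >> 8);
	state->buf[3] = (char) v;
	return 4;
}

const DemoRoutine DemoRoutineText = {
	.get_format = demo_text_get_format,
	.one_row = demo_text_one_row,
};

const DemoRoutine DemoRoutineBinary = {
	.get_format = demo_binary_get_format,
	.one_row = demo_binary_one_row,
};

/* Pick the callback table from the format name; NULL if unknown. */
const DemoRoutine *
demo_lookup_routine(const char *fmt)
{
	if (strcmp(fmt, "text") == 0)
		return &DemoRoutineText;
	if (strcmp(fmt, "binary") == 0)
		return &DemoRoutineBinary;
	return NULL;
}

/* The per-row path no longer tests the format; it just dispatches. */
size_t
demo_copy_one_row(DemoState *state, int32_t value)
{
	assert(state->routine != NULL);
	return state->routine->one_row(state, value);
}
```

In this sketch the lookup happens once per statement, so the per-row
cost of the abstraction is one indirect call, which is the cost the
benchmark below is meant to measure for the real code.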

Here are benchmark results with and without this change, because an
earlier discussion asked that we watch for performance regressions:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

    CREATE TABLE data (int32 integer);
    INSERT INTO data
      SELECT random() * 10000
        FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

    format,elapsed time (ms)
    text,11.002
    csv,11.696
    binary,11.352

100K with this change:

    format,elapsed time (ms)
    text,11.562
    csv,11.889
    binary,10.825

1M without this change:

    format,elapsed time (ms)
    text,108.359
    csv,114.233
    binary,111.251

1M with this change:

    format,elapsed time (ms)
    text,111.269
    csv,116.277
    binary,104.765

10M without this change:

    format,elapsed time (ms)
    text,1090.763
    csv,1136.103
    binary,1137.141

10M with this change:

    format,elapsed time (ms)
    text,1082.654
    csv,1196.991
    binary,1069.697
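
To read the tables above as relative differences, a small helper like
the following can be used. This is only an illustrative sketch and not
part of the patch or of the benchmark script; pct_change is a
hypothetical name.

```c
#include <assert.h>

/*
 * Relative change of "after" against "before", in percent.
 * Negative means the run with the change was faster.
 */
double
pct_change(double before_ms, double after_ms)
{
	assert(before_ms > 0.0);
	return (after_ms - before_ms) / before_ms * 100.0;
}
```

For the 10M rows above, pct_change(1090.763, 1082.654) is roughly -0.7
for text, pct_change(1136.103, 1196.991) is roughly +5.4 for csv, and
pct_change(1137.141, 1069.697) is roughly -5.9 for binary. These are
single-run figures, so run-to-run noise is not accounted for.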
---
 contrib/file_fdw/file_fdw.c     |   2 +-
 src/backend/commands/copy.c     |  43 +++-
 src/backend/commands/copyfrom.c |   2 +-
 src/backend/commands/copyto.c   | 428 ++++++++++++++++++++------------
 src/include/commands/copy.h     |   7 +-
 src/include/commands/copyapi.h  |  59 +++++
 6 files changed, 376 insertions(+), 165 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a0..9e4e819858 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -329,7 +329,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	/*
 	 * Now apply the core COPY code's validation logic for more checks.
 	 */
-	ProcessCopyOptions(NULL, NULL, true, other_options);
+	ProcessCopyOptions(NULL, NULL, true, NULL, other_options);
 
 	/*
 	 * Either filename or program option is required for file_fdw foreign
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..5f3697a5f9 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -442,6 +442,9 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
  * a list of options.  In that usage, 'opts_out' can be passed as NULL and
  * the collected data is just leaked until CurrentMemoryContext is reset.
  *
+ * 'cstate' is CopyToState* for !is_from, CopyFromState* for is_from. 'cstate'
+ * may be NULL. For example, file_fdw uses NULL.
+ *
  * Note that additional checking, such as whether column names listed in FORCE
  * QUOTE actually exist, has to be applied later.  This just checks for
  * self-consistency of the options list.
@@ -450,6 +453,7 @@ void
 ProcessCopyOptions(ParseState *pstate,
 				   CopyFormatOptions *opts_out,
 				   bool is_from,
+				   void *cstate,
 				   List *options)
 {
 	bool		format_specified = false;
@@ -464,7 +468,13 @@ ProcessCopyOptions(ParseState *pstate,
 
 	opts_out->file_encoding = -1;
 
-	/* Extract options from the statement node tree */
+	/* Text is the default format. */
+	opts_out->to_routine = &CopyToRoutineText;
+
+	/*
+	 * Extract only the "format" option to detect target routine as the first
+	 * step
+	 */
 	foreach(option, options)
 	{
 		DefElem    *defel = lfirst_node(DefElem, option);
@@ -479,15 +489,29 @@ ProcessCopyOptions(ParseState *pstate,
 			if (strcmp(fmt, "text") == 0)
 				 /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
+			{
 				opts_out->csv_mode = true;
+				opts_out->to_routine = &CopyToRoutineCSV;
+			}
 			else if (strcmp(fmt, "binary") == 0)
+			{
 				opts_out->binary = true;
+				opts_out->to_routine = &CopyToRoutineBinary;
+			}
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("COPY format \"%s\" not recognized", fmt),
 						 parser_errposition(pstate, defel->location)));
 		}
+	}
+	/* Extract options except "format" from the statement node tree */
+	foreach(option, options)
+	{
+		DefElem    *defel = lfirst_node(DefElem, option);
+
+		if (strcmp(defel->defname, "format") == 0)
+			continue;
 		else if (strcmp(defel->defname, "freeze") == 0)
 		{
 			if (freeze_specified)
@@ -616,11 +640,18 @@ ProcessCopyOptions(ParseState *pstate,
 			opts_out->on_error = defGetCopyOnErrorChoice(defel, pstate, is_from);
 		}
 		else
-			ereport(ERROR,
-					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("option \"%s\" not recognized",
-							defel->defname),
-					 parser_errposition(pstate, defel->location)));
+		{
+			bool		processed = false;
+
+			if (!is_from)
+				processed = opts_out->to_routine->CopyToProcessOption(cstate, defel);
+			if (!processed)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("option \"%s\" not recognized",
+								defel->defname),
+						 parser_errposition(pstate, defel->location)));
+		}
 	}
 
 	/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..fb3d4d9296 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1416,7 +1416,7 @@ BeginCopyFrom(ParseState *pstate,
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Extract options from the statement node tree */
-	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
+	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , cstate, options);
 
 	/* Process the target relation */
 	cstate->rel = rel;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d3dc3fc854..6547b7c654 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -131,6 +131,275 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToRoutine implementation for "text" and "csv". CopyToText*()
+ * refer to cstate->opts.csv_mode and change their behavior. We can split
+ * this implementation and stop referring to cstate->opts.csv_mode later.
+ */
+
+/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
+ * move the code to here later. */
+static bool
+CopyToTextProcessOption(CopyToState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static int16
+CopyToTextGetFormat(CopyToState cstate)
+{
+	return 0;
+}
+
+static void
+CopyToTextSendEndOfRow(CopyToState cstate)
+{
+	switch (cstate->copy_dest)
+	{
+		case COPY_FILE:
+			/* Default line termination depends on platform */
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			break;
+		case COPY_FRONTEND:
+			/* The FE/BE protocol uses \n as newline for all platforms */
+			CopySendChar(cstate, '\n');
+			break;
+		default:
+			break;
+	}
+	CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	int			num_phys_attrs;
+	ListCell   *cur;
+
+	num_phys_attrs = tupDesc->natts;
+	/* Get info about the columns we need to process. */
+	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		Oid			out_func_oid;
+		bool		isvarlena;
+		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+		getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+	}
+
+	/*
+	 * For non-binary copy, we need to convert null_print to file encoding,
+	 * because it will be sent directly with CopySendString.
+	 */
+	if (cstate->need_transcoding)
+		cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+														  cstate->opts.null_print_len,
+														  cstate->file_encoding);
+
+	/* if a header has been requested send the line */
+	if (cstate->opts.header_line)
+	{
+		bool		hdr_delim = false;
+
+		foreach(cur, cstate->attnumlist)
+		{
+			int			attnum = lfirst_int(cur);
+			char	   *colname;
+
+			if (hdr_delim)
+				CopySendChar(cstate, cstate->opts.delim[0]);
+			hdr_delim = true;
+
+			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+			if (cstate->opts.csv_mode)
+				CopyAttributeOutCSV(cstate, colname, false,
+									list_length(cstate->attnumlist) == 1);
+			else
+				CopyAttributeOutText(cstate, colname);
+		}
+
+		CopyToTextSendEndOfRow(cstate);
+	}
+}
+
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	bool		need_delim = false;
+	FmgrInfo   *out_functions = cstate->out_functions;
+	ListCell   *cur;
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		Datum		value = slot->tts_values[attnum - 1];
+		bool		isnull = slot->tts_isnull[attnum - 1];
+
+		if (need_delim)
+			CopySendChar(cstate, cstate->opts.delim[0]);
+		need_delim = true;
+
+		if (isnull)
+		{
+			CopySendString(cstate, cstate->opts.null_print_client);
+		}
+		else
+		{
+			char	   *string;
+
+			string = OutputFunctionCall(&out_functions[attnum - 1], value);
+			if (cstate->opts.csv_mode)
+				CopyAttributeOutCSV(cstate, string,
+									cstate->opts.force_quote_flags[attnum - 1],
+									list_length(cstate->attnumlist) == 1);
+			else
+				CopyAttributeOutText(cstate, string);
+		}
+	}
+
+	CopyToTextSendEndOfRow(cstate);
+}
+
+static void
+CopyToTextEnd(CopyToState cstate)
+{
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/* All "binary" options are parsed in ProcessCopyOptions(). We may move the
+ * code to here later. */
+static bool
+CopyToBinaryProcessOption(CopyToState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static int16
+CopyToBinaryGetFormat(CopyToState cstate)
+{
+	return 1;
+}
+
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	int			num_phys_attrs;
+	ListCell   *cur;
+
+	num_phys_attrs = tupDesc->natts;
+	/* Get info about the columns we need to process. */
+	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		Oid			out_func_oid;
+		bool		isvarlena;
+		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+		getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+	}
+
+	{
+		/* Generate header for a binary copy */
+		int32		tmp;
+
+		/* Signature */
+		CopySendData(cstate, BinarySignature, 11);
+		/* Flags field */
+		tmp = 0;
+		CopySendInt32(cstate, tmp);
+		/* No header extension */
+		tmp = 0;
+		CopySendInt32(cstate, tmp);
+	}
+}
+
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	FmgrInfo   *out_functions = cstate->out_functions;
+	ListCell   *cur;
+
+	/* Binary per-tuple header */
+	CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		Datum		value = slot->tts_values[attnum - 1];
+		bool		isnull = slot->tts_isnull[attnum - 1];
+
+		if (isnull)
+		{
+			CopySendInt32(cstate, -1);
+		}
+		else
+		{
+			bytea	   *outputbytes;
+
+			outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
+			CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+			CopySendData(cstate, VARDATA(outputbytes),
+						 VARSIZE(outputbytes) - VARHDRSZ);
+		}
+	}
+
+	CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+	/* Generate trailer for a binary copy */
+	CopySendInt16(cstate, -1);
+	/* Need to flush out the trailer */
+	CopySendEndOfRow(cstate);
+}
+
+CopyToRoutine CopyToRoutineText = {
+	.CopyToProcessOption = CopyToTextProcessOption,
+	.CopyToGetFormat = CopyToTextGetFormat,
+	.CopyToStart = CopyToTextStart,
+	.CopyToOneRow = CopyToTextOneRow,
+	.CopyToEnd = CopyToTextEnd,
+};
+
+/*
+ * We can use the same CopyToRoutine for both "text" and "csv" because
+ * CopyToText*() refer to cstate->opts.csv_mode and change their behavior. We
+ * can split the implementations and stop referring to it later.
+ */
+CopyToRoutine CopyToRoutineCSV = {
+	.CopyToProcessOption = CopyToTextProcessOption,
+	.CopyToGetFormat = CopyToTextGetFormat,
+	.CopyToStart = CopyToTextStart,
+	.CopyToOneRow = CopyToTextOneRow,
+	.CopyToEnd = CopyToTextEnd,
+};
+
+CopyToRoutine CopyToRoutineBinary = {
+	.CopyToProcessOption = CopyToBinaryProcessOption,
+	.CopyToGetFormat = CopyToBinaryGetFormat,
+	.CopyToStart = CopyToBinaryStart,
+	.CopyToOneRow = CopyToBinaryOneRow,
+	.CopyToEnd = CopyToBinaryEnd,
+};
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -141,7 +410,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = cstate->opts.to_routine->CopyToGetFormat(cstate);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -198,16 +467,6 @@ CopySendEndOfRow(CopyToState cstate)
 	switch (cstate->copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate->opts.binary)
-			{
-				/* Default line termination depends on platform */
-#ifndef WIN32
-				CopySendChar(cstate, '\n');
-#else
-				CopySendString(cstate, "\r\n");
-#endif
-			}
-
 			if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
 					   cstate->copy_file) != 1 ||
 				ferror(cstate->copy_file))
@@ -242,10 +501,6 @@ CopySendEndOfRow(CopyToState cstate)
 			}
 			break;
 		case COPY_FRONTEND:
-			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate->opts.binary)
-				CopySendChar(cstate, '\n');
-
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
 			break;
@@ -431,7 +686,7 @@ BeginCopyTo(ParseState *pstate,
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Extract options from the statement node tree */
-	ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
+	ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , cstate, options);
 
 	/* Process the source/target relation or query */
 	if (rel)
@@ -748,8 +1003,6 @@ DoCopyTo(CopyToState cstate)
 	bool		pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	TupleDesc	tupDesc;
-	int			num_phys_attrs;
-	ListCell   *cur;
 	uint64		processed;
 
 	if (fe_copy)
@@ -759,32 +1012,11 @@ DoCopyTo(CopyToState cstate)
 		tupDesc = RelationGetDescr(cstate->rel);
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
-	num_phys_attrs = tupDesc->natts;
 	cstate->opts.null_print_client = cstate->opts.null_print;	/* default */
 
 	/* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
 	cstate->fe_msgbuf = makeStringInfo();
 
-	/* Get info about the columns we need to process. */
-	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-	foreach(cur, cstate->attnumlist)
-	{
-		int			attnum = lfirst_int(cur);
-		Oid			out_func_oid;
-		bool		isvarlena;
-		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
-		if (cstate->opts.binary)
-			getTypeBinaryOutputInfo(attr->atttypid,
-									&out_func_oid,
-									&isvarlena);
-		else
-			getTypeOutputInfo(attr->atttypid,
-							  &out_func_oid,
-							  &isvarlena);
-		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-	}
-
 	/*
 	 * Create a temporary memory context that we can reset once per row to
 	 * recover palloc'd memory.  This avoids any problems with leaks inside
@@ -795,57 +1027,7 @@ DoCopyTo(CopyToState cstate)
 											   "COPY TO",
 											   ALLOCSET_DEFAULT_SIZES);
 
-	if (cstate->opts.binary)
-	{
-		/* Generate header for a binary copy */
-		int32		tmp;
-
-		/* Signature */
-		CopySendData(cstate, BinarySignature, 11);
-		/* Flags field */
-		tmp = 0;
-		CopySendInt32(cstate, tmp);
-		/* No header extension */
-		tmp = 0;
-		CopySendInt32(cstate, tmp);
-	}
-	else
-	{
-		/*
-		 * For non-binary copy, we need to convert null_print to file
-		 * encoding, because it will be sent directly with CopySendString.
-		 */
-		if (cstate->need_transcoding)
-			cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-															  cstate->opts.null_print_len,
-															  cstate->file_encoding);
-
-		/* if a header has been requested send the line */
-		if (cstate->opts.header_line)
-		{
-			bool		hdr_delim = false;
-
-			foreach(cur, cstate->attnumlist)
-			{
-				int			attnum = lfirst_int(cur);
-				char	   *colname;
-
-				if (hdr_delim)
-					CopySendChar(cstate, cstate->opts.delim[0]);
-				hdr_delim = true;
-
-				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-				if (cstate->opts.csv_mode)
-					CopyAttributeOutCSV(cstate, colname, false,
-										list_length(cstate->attnumlist) == 1);
-				else
-					CopyAttributeOutText(cstate, colname);
-			}
-
-			CopySendEndOfRow(cstate);
-		}
-	}
+	cstate->opts.to_routine->CopyToStart(cstate, tupDesc);
 
 	if (cstate->rel)
 	{
@@ -884,13 +1066,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
+	cstate->opts.to_routine->CopyToEnd(cstate);
 
 	MemoryContextDelete(cstate->rowcontext);
 
@@ -906,71 +1082,15 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-	bool		need_delim = false;
-	FmgrInfo   *out_functions = cstate->out_functions;
 	MemoryContext oldcontext;
-	ListCell   *cur;
-	char	   *string;
 
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-	if (cstate->opts.binary)
-	{
-		/* Binary per-tuple header */
-		CopySendInt16(cstate, list_length(cstate->attnumlist));
-	}
-
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	foreach(cur, cstate->attnumlist)
-	{
-		int			attnum = lfirst_int(cur);
-		Datum		value = slot->tts_values[attnum - 1];
-		bool		isnull = slot->tts_isnull[attnum - 1];
-
-		if (!cstate->opts.binary)
-		{
-			if (need_delim)
-				CopySendChar(cstate, cstate->opts.delim[0]);
-			need_delim = true;
-		}
-
-		if (isnull)
-		{
-			if (!cstate->opts.binary)
-				CopySendString(cstate, cstate->opts.null_print_client);
-			else
-				CopySendInt32(cstate, -1);
-		}
-		else
-		{
-			if (!cstate->opts.binary)
-			{
-				string = OutputFunctionCall(&out_functions[attnum - 1],
-											value);
-				if (cstate->opts.csv_mode)
-					CopyAttributeOutCSV(cstate, string,
-										cstate->opts.force_quote_flags[attnum - 1],
-										list_length(cstate->attnumlist) == 1);
-				else
-					CopyAttributeOutText(cstate, string);
-			}
-			else
-			{
-				bytea	   *outputbytes;
-
-				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-											   value);
-				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-				CopySendData(cstate, VARDATA(outputbytes),
-							 VARSIZE(outputbytes) - VARHDRSZ);
-			}
-		}
-	}
-
-	CopySendEndOfRow(cstate);
+	cstate->opts.to_routine->CopyToOneRow(cstate, slot);
 
 	MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0be..34bea880ca 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
 #ifndef COPY_H
 #define COPY_H
 
+#include "commands/copyapi.h"
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
@@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
 	bool		convert_selectively;	/* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
 	List	   *convert_select; /* list of column names (can be NIL) */
+	CopyToRoutine *to_routine;	/* callback routines for COPY TO */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 typedef void (*copy_data_dest_cb) (void *data, int len);
@@ -87,7 +88,7 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
 				   uint64 *processed);
 
-extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, List *options);
+extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, void *cstate, List *options);
 extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause,
 								   const char *filename,
 								   bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 0000000000..eb68f2fb7b
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,59 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *	  API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/parsenodes.h"
+
+/* This is private in commands/copyto.c */
+typedef struct CopyToStateData *CopyToState;
+
+typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
+typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
+typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
+typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
+typedef void (*CopyToEnd_function) (CopyToState cstate);
+
+/* Routines for a COPY TO format implementation. */
+typedef struct CopyToRoutine
+{
+	/*
+	 * Called for processing one COPY TO option. This will return false when
+	 * the given option is invalid.
+	 */
+	CopyToProcessOption_function CopyToProcessOption;
+
+	/*
+	 * Called when COPY TO is started. This will return a format as int16
+	 * value. It's used for the CopyOutResponse message.
+	 */
+	CopyToGetFormat_function CopyToGetFormat;
+
+	/* Called when COPY TO is started. This will send a header. */
+	CopyToStart_function CopyToStart;
+
+	/* Copy one row for COPY TO. */
+	CopyToOneRow_function CopyToOneRow;
+
+	/* Called when COPY TO is ended. This will send a trailer. */
+	CopyToEnd_function CopyToEnd;
+}			CopyToRoutine;
+
+/* Built-in CopyToRoutine for "text", "csv" and "binary". */
+extern CopyToRoutine CopyToRoutineText;
+extern CopyToRoutine CopyToRoutineCSV;
+extern CopyToRoutine CopyToRoutineBinary;
+
+#endif							/* COPYAPI_H */
-- 
2.41.0
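
The callback-table design introduced by v8-0001 (a format supplies CopyToStart, CopyToOneRow and CopyToEnd, and a format-agnostic driver calls them) can be illustrated with a minimal standalone sketch. This is not PostgreSQL code: DemoCopyState, DemoCopyToRoutine, demo_send() and the toy JSON-array format are hypothetical stand-ins invented for illustration, and rows are assumed to be plain ints.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState: an in-memory output buffer. */
typedef struct DemoCopyState
{
	char		buf[256];
	size_t		len;
	int			rows_sent;
} DemoCopyState;

/* Hypothetical, simplified analogue of CopyToRoutine. */
typedef struct DemoCopyToRoutine
{
	void		(*start) (DemoCopyState *st);
	void		(*one_row) (DemoCopyState *st, const int *values, int natts);
	void		(*end) (DemoCopyState *st);
} DemoCopyToRoutine;

/* Append a string to the buffer (the role CopySendString() plays). */
static void
demo_send(DemoCopyState *st, const char *s)
{
	size_t		n = strlen(s);

	assert(st->len + n < sizeof(st->buf));
	memcpy(st->buf + st->len, s, n + 1);	/* copy the terminating NUL too */
	st->len += n;
}

/* Toy format: header callback opens the outer JSON array. */
static void
demo_json_start(DemoCopyState *st)
{
	demo_send(st, "[");
}

/* Toy format: each row becomes a nested JSON array of ints. */
static void
demo_json_one_row(DemoCopyState *st, const int *values, int natts)
{
	char		tmp[32];

	if (st->rows_sent > 0)
		demo_send(st, ",");
	demo_send(st, "[");
	for (int i = 0; i < natts; i++)
	{
		snprintf(tmp, sizeof(tmp), "%s%d", i > 0 ? "," : "", values[i]);
		demo_send(st, tmp);
	}
	demo_send(st, "]");
	st->rows_sent++;
}

/* Toy format: trailer callback closes the outer JSON array. */
static void
demo_json_end(DemoCopyState *st)
{
	demo_send(st, "]");
}

static const DemoCopyToRoutine demo_json_routine = {
	.start = demo_json_start,
	.one_row = demo_json_one_row,
	.end = demo_json_end,
};

/* Format-agnostic driver: it only ever calls through the routine. */
static void
demo_copy_to(DemoCopyState *st, const DemoCopyToRoutine *routine,
			 const int *rows, int nrows, int natts)
{
	assert(routine->start && routine->one_row && routine->end);

	st->len = 0;
	st->rows_sent = 0;
	st->buf[0] = '\0';

	routine->start(st);
	for (int r = 0; r < nrows; r++)
		routine->one_row(st, rows + r * natts, natts);
	routine->end(st);
}
```

Calling demo_copy_to(&st, &demo_json_routine, rows, 2, 3) with the flat array {1, 2, 3, 12, 34, 56} leaves [[1,2,3],[12,34,56]] in st.buf. Adding another format means adding another DemoCopyToRoutine without touching the driver, which is the property the refactoring is after.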

Attachment: v8-0002-Add-support-for-adding-custom-COPY-TO-format.patch (application/octet-stream)

From a597f8a2beec12971d77419f08b5722f531774f3 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 13:58:38 +0900
Subject: [PATCH v8 02/10] Add support for adding custom COPY TO format

This uses the handler approach, as tablesample does. With this
approach, an internal function returns an internal struct. In this
case, a COPY TO handler returns a CopyToRoutine.

Support for custom COPY FROM formats will be added later. The same
handler will serve both COPY TO and COPY FROM. PostgreSQL calls a COPY
TO/FROM handler with an "is_from" argument, which is true for COPY FROM
and false for COPY TO:

    copy_handler(true) returns CopyFromRoutine (doesn't exist yet)
    copy_handler(false) returns CopyToRoutine

We discussed introducing a wrapper struct for it:

    typedef struct CopyRoutine
    {
        NodeTag type;
        /* either CopyToRoutine or CopyFromRoutine */
        Node *routine;
    } CopyRoutine;

    copy_handler(true) returns CopyRoutine with CopyFromRoutine
    copy_handler(false) returns CopyRoutine with CopyToRoutine

See also: https://www.postgresql.org/message-id/flat/CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA%40mail.gmail.com

But I noticed that we don't need the wrapper struct. The handler can
just return a CopyToRoutine or a CopyFromRoutine, because we can
distinguish the returned struct by checking its NodeTag. So I don't use
the wrapper struct approach.
---
 src/backend/commands/copy.c                   | 84 ++++++++++++++-----
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copyapi.h                |  2 +
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile    | 23 +++++
 .../expected/test_copy_format.out             | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 ++++++++
 .../test_copy_format/sql/test_copy_format.sql |  8 ++
 .../test_copy_format--1.0.sql                 |  8 ++
 .../test_copy_format/test_copy_format.c       | 77 +++++++++++++++++
 .../test_copy_format/test_copy_format.control |  4 +
 18 files changed, 260 insertions(+), 19 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5f3697a5f9..6f0db0ae7c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteHandler.h"
 #include "utils/acl.h"
@@ -430,6 +431,69 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 	return COPY_ON_ERROR_STOP;	/* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine. If no
+ * COPY format handler is found, this function reports an error.
+ */
+static void
+ProcessCopyOptionCustomFormat(ParseState *pstate,
+							  CopyFormatOptions *opts_out,
+							  bool is_from,
+							  DefElem *defel)
+{
+	char	   *format;
+	Oid			funcargtypes[1];
+	Oid			handlerOid = InvalidOid;
+	Datum		datum;
+	void	   *routine;
+
+	format = defGetString(defel);
+
+	/* built-in formats */
+	if (strcmp(format, "text") == 0)
+		 /* default format */ return;
+	else if (strcmp(format, "csv") == 0)
+	{
+		opts_out->csv_mode = true;
+		opts_out->to_routine = &CopyToRoutineCSV;
+		return;
+	}
+	else if (strcmp(format, "binary") == 0)
+	{
+		opts_out->binary = true;
+		opts_out->to_routine = &CopyToRoutineBinary;
+		return;
+	}
+
+	/* custom format */
+	if (!is_from)
+	{
+		funcargtypes[0] = INTERNALOID;
+		handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+									funcargtypes, true);
+	}
+	if (!OidIsValid(handlerOid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY format \"%s\" not recognized", format),
+				 parser_errposition(pstate, defel->location)));
+
+	datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+	routine = DatumGetPointer(datum);
+	if (routine == NULL || !IsA(routine, CopyToRoutine))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY handler function %s(%u) did not return a CopyToRoutine struct",
+						format, handlerOid),
+				 parser_errposition(pstate, defel->location)));
+
+	opts_out->to_routine = routine;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -481,28 +545,10 @@ ProcessCopyOptions(ParseState *pstate,
 
 		if (strcmp(defel->defname, "format") == 0)
 		{
-			char	   *fmt = defGetString(defel);
-
 			if (format_specified)
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
-			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
-			else if (strcmp(fmt, "csv") == 0)
-			{
-				opts_out->csv_mode = true;
-				opts_out->to_routine = &CopyToRoutineCSV;
-			}
-			else if (strcmp(fmt, "binary") == 0)
-			{
-				opts_out->binary = true;
-				opts_out->to_routine = &CopyToRoutineBinary;
-			}
-			else
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("COPY format \"%s\" not recognized", fmt),
-						 parser_errposition(pstate, defel->location)));
+			ProcessCopyOptionCustomFormat(pstate, opts_out, is_from, defel);
 		}
 	}
 	/* Extract options except "format" from the statement node tree */
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e..173ee11811 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
 	access/sdir.h \
 	access/tableam.h \
 	access/tsmapi.h \
+	commands/copyapi.h \
 	commands/event_trigger.h \
 	commands/trigger.h \
 	executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 2f0a59bc87..bd397f45ac
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index a3a991f634..d308780c43 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -373,6 +373,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d..d4e426687c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7617,6 +7617,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index d29194da31..2040d5da83 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -632,6 +632,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to/from method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   typname => 'table_am_handler',
   descr => 'pseudo-type for the result of a table AM handler function',
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index eb68f2fb7b..9c25e1c415 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -29,6 +29,8 @@ typedef void (*CopyToEnd_function) (CopyToState cstate);
 /* Routines for a COPY TO format implementation. */
 typedef struct CopyToRoutine
 {
+	NodeTag		type;
+
 	/*
 	 * Called for processing one COPY TO option. This will return false when
 	 * the given option is invalid.
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index b665e55b65..103df1a787 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index e32c8925f6..9d57b868d5 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  spgist_name_ops \
 		  test_bloomfilter \
 		  test_copy_callbacks \
+		  test_copy_format \
 		  test_custom_rmgrs \
 		  test_ddl_deparse \
 		  test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 397e0906e6..d76f2a6003 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -13,6 +13,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 0000000000..8497f91624
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+	$(WIN32RES) \
+	test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 0000000000..3a24ae7b97
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a INT, b INT, c INT);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test TO stdout WITH (
+	option_before 'before',
+	format 'test_copy_format',
+	option_after 'after'
+);
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToProcessOption: "option_before"="before"
+NOTICE:  CopyToProcessOption: "option_after"="after"
+NOTICE:  CopyToGetFormat
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 0000000000..4cefe7b709
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 0000000000..0eb7ed2e11
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,8 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a INT, b INT, c INT);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test TO stdout WITH (
+	option_before 'before',
+	format 'test_copy_format',
+	option_after 'after'
+);
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 0000000000..d24ea03ce9
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+	RETURNS copy_handler
+	AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 0000000000..a2219afcde
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,77 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *		Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copy.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static bool
+CopyToProcessOption(CopyToState cstate, DefElem *defel)
+{
+	ereport(NOTICE,
+			(errmsg("CopyToProcessOption: \"%s\"=\"%s\"",
+					defel->defname, defGetString(defel))));
+	return true;
+}
+
+static int16
+CopyToGetFormat(CopyToState cstate)
+{
+	ereport(NOTICE, (errmsg("CopyToGetFormat")));
+	return 0;
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+	ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+	.type = T_CopyToRoutine,
+	.CopyToProcessOption = CopyToProcessOption,
+	.CopyToGetFormat = CopyToGetFormat,
+	.CopyToStart = CopyToStart,
+	.CopyToOneRow = CopyToOneRow,
+	.CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+	bool		is_from = PG_GETARG_BOOL(0);
+
+	ereport(NOTICE,
+			(errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+	if (is_from)
+		elog(ERROR, "COPY FROM isn't supported yet");
+
+	PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 0000000000..f05a636235
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.41.0
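
The format lookup done by ProcessCopyOptionCustomFormat() in the patch above (built-in names first, then a handler found by name, then a check of the tag on the struct the handler returns) can be modelled in standalone C. This is a simplified sketch, not the patch's code: the Demo* types are hypothetical, the demo_handlers table stands in for LookupFuncName(), and returning NULL stands in for ereport(ERROR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tags, playing the role of NodeTag values. */
typedef enum DemoTag
{
	DEMO_T_CopyToRoutine = 1,
	DEMO_T_CopyFromRoutine
} DemoTag;

/* Both structs start with the tag, so one pointer can carry either. */
typedef struct DemoCopyToRoutine
{
	DemoTag		type;
	const char *label;
} DemoCopyToRoutine;

typedef struct DemoCopyFromRoutine
{
	DemoTag		type;
} DemoCopyFromRoutine;

typedef const void *(*demo_handler_fn) (bool is_from);

static const DemoCopyToRoutine demo_to_text = {DEMO_T_CopyToRoutine, "text"};
static const DemoCopyToRoutine demo_to_csv = {DEMO_T_CopyToRoutine, "csv"};
static const DemoCopyToRoutine demo_to_binary = {DEMO_T_CopyToRoutine, "binary"};
static const DemoCopyToRoutine demo_to_custom = {DEMO_T_CopyToRoutine, "custom"};
static const DemoCopyFromRoutine demo_from_custom = {DEMO_T_CopyFromRoutine};

/* A well-behaved handler: returns the struct matching the direction. */
static const void *
demo_format_handler(bool is_from)
{
	if (is_from)
		return &demo_from_custom;
	return &demo_to_custom;
}

/* A buggy handler that ignores is_from; the tag check must reject it. */
static const void *
demo_bad_handler(bool is_from)
{
	(void) is_from;
	return &demo_from_custom;
}

/* Stand-in for the catalog lookup by function name. */
static const struct
{
	const char *name;
	demo_handler_fn fn;
}			demo_handlers[] = {
	{"demo_format", demo_format_handler},
	{"demo_bad", demo_bad_handler},
};

/* Resolve a format name for COPY TO; NULL means "not recognized". */
static const DemoCopyToRoutine *
demo_resolve_copy_to(const char *format)
{
	assert(format != NULL);

	/* built-in formats */
	if (strcmp(format, "text") == 0)
		return &demo_to_text;
	if (strcmp(format, "csv") == 0)
		return &demo_to_csv;
	if (strcmp(format, "binary") == 0)
		return &demo_to_binary;

	/* custom formats */
	for (size_t i = 0; i < sizeof(demo_handlers) / sizeof(demo_handlers[0]); i++)
	{
		if (strcmp(format, demo_handlers[i].name) == 0)
		{
			const void *routine = demo_handlers[i].fn(false);

			/* The tag is the first member, so it can be read directly. */
			if (routine == NULL ||
				*(const DemoTag *) routine != DEMO_T_CopyToRoutine)
				return NULL;
			return routine;
		}
	}
	return NULL;
}
```

In this model, demo_resolve_copy_to("demo_format") yields the custom routine, while "demo_bad" and any unknown name yield NULL. Because the tag check alone tells the two struct kinds apart, no wrapper struct is needed, which matches the reasoning in the commit message.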

Attachment: v8-0005-Extract-COPY-FROM-format-implementations.patch (application/octet-stream)

From 781955f19ad27cdd66748be539bf45cf1b925856 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 17:21:23 +0900
Subject: [PATCH v8 05/10] Extract COPY FROM format implementations

This doesn't change the current behavior. It just introduces
CopyFromRoutine, which holds the function pointers of a format
implementation, like TupleTableSlotOps, and uses it for the existing
"text", "csv" and "binary" format implementations.

Note that CopyFromRoutine can't be used from extensions yet because
the CopyRead*() functions aren't exported yet. Extensions can't read
data from a source without CopyRead*(). They will be exported by
subsequent patches.
---
 src/backend/commands/copy.c              |   3 +
 src/backend/commands/copyfrom.c          | 216 +++++++++++----
 src/backend/commands/copyfromparse.c     | 326 ++++++++++++-----------
 src/include/commands/copy.h              |   3 -
 src/include/commands/copyapi.h           |  44 +++
 src/include/commands/copyfrom_internal.h |   4 +
 6 files changed, 391 insertions(+), 205 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6f0db0ae7c..ec6dfff8ab 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -459,12 +459,14 @@ ProcessCopyOptionCustomFormat(ParseState *pstate,
 	else if (strcmp(format, "csv") == 0)
 	{
 		opts_out->csv_mode = true;
+		opts_out->from_routine = &CopyFromRoutineCSV;
 		opts_out->to_routine = &CopyToRoutineCSV;
 		return;
 	}
 	else if (strcmp(format, "binary") == 0)
 	{
 		opts_out->binary = true;
+		opts_out->from_routine = &CopyFromRoutineBinary;
 		opts_out->to_routine = &CopyToRoutineBinary;
 		return;
 	}
@@ -533,6 +535,7 @@ ProcessCopyOptions(ParseState *pstate,
 	opts_out->file_encoding = -1;
 
 	/* Text is the default format. */
+	opts_out->from_routine = &CopyFromRoutineText;
 	opts_out->to_routine = &CopyToRoutineText;
 
 	/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fb3d4d9296..d556ebb5d6 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -108,6 +108,170 @@ static char *limit_printout_length(const char *str);
 
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+
+/*
+ * CopyFromRoutine implementations.
+ */
+
+/*
+ * CopyFromRoutine implementation for "text" and "csv". CopyFromText*()
+ * refer to cstate->opts.csv_mode and change their behavior. We can split
+ * this implementation and stop referring to cstate->opts.csv_mode later.
+ */
+
+/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
+ * move the code to here later. */
+static bool
+CopyFromTextProcessOption(CopyFromState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static int16
+CopyFromTextGetFormat(CopyFromState cstate)
+{
+	return 0;
+}
+
+static void
+CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+	AttrNumber	num_phys_attrs = tupDesc->natts;
+	AttrNumber	attr_count;
+
+	/*
+	 * If encoding conversion is needed, we need another buffer to hold the
+	 * converted input data.  Otherwise, we can just point input_buf to the
+	 * same buffer as raw_buf.
+	 */
+	if (cstate->need_transcoding)
+	{
+		cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+		cstate->input_buf_index = cstate->input_buf_len = 0;
+	}
+	else
+		cstate->input_buf = cstate->raw_buf;
+	cstate->input_reached_eof = false;
+
+	initStringInfo(&cstate->line_buf);
+
+	/*
+	 * Pick up the required catalog information for each attribute in the
+	 * relation, including the input function, the element type (to pass to
+	 * the input function).
+	 */
+	cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
+	for (int attnum = 1; attnum <= num_phys_attrs; attnum++)
+	{
+		Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1);
+		Oid			in_func_oid;
+
+		/* We don't need info for dropped attributes */
+		if (att->attisdropped)
+			continue;
+
+		/* Fetch the input function and typioparam info */
+		getTypeInputInfo(att->atttypid,
+						 &in_func_oid, &cstate->typioparams[attnum - 1]);
+		fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]);
+	}
+
+	/* create workspace for CopyReadAttributes results */
+	attr_count = list_length(cstate->attnumlist);
+	cstate->max_fields = attr_count;
+	cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+static void
+CopyFromTextEnd(CopyFromState cstate)
+{
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */
+
+/* All "binary" options are parsed in ProcessCopyOptions(). We may move the
+ * code to here later. */
+static bool
+CopyFromBinaryProcessOption(CopyFromState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static int16
+CopyFromBinaryGetFormat(CopyFromState cstate)
+{
+	return 1;
+}
+
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+	AttrNumber	num_phys_attrs = tupDesc->natts;
+
+	/*
+	 * Pick up the required catalog information for each attribute in the
+	 * relation, including the input function, the element type (to pass to
+	 * the input function).
+	 */
+	cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
+	for (int attnum = 1; attnum <= num_phys_attrs; attnum++)
+	{
+		Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1);
+		Oid			in_func_oid;
+
+		/* We don't need info for dropped attributes */
+		if (att->attisdropped)
+			continue;
+
+		/* Fetch the input function and typioparam info */
+		getTypeBinaryInputInfo(att->atttypid,
+							   &in_func_oid, &cstate->typioparams[attnum - 1]);
+		fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]);
+	}
+
+	/* Read and verify binary header */
+	ReceiveCopyBinaryHeader(cstate);
+}
+
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+}
+
+CopyFromRoutine CopyFromRoutineText = {
+	.CopyFromProcessOption = CopyFromTextProcessOption,
+	.CopyFromGetFormat = CopyFromTextGetFormat,
+	.CopyFromStart = CopyFromTextStart,
+	.CopyFromOneRow = CopyFromTextOneRow,
+	.CopyFromEnd = CopyFromTextEnd,
+};
+
+/*
+ * We can use the same CopyFromRoutine for both "text" and "csv" because
+ * CopyFromText*() refer to cstate->opts.csv_mode and change their behavior.
+ * We can split the implementations and stop referring to it later.
+ */
+CopyFromRoutine CopyFromRoutineCSV = {
+	.CopyFromProcessOption = CopyFromTextProcessOption,
+	.CopyFromGetFormat = CopyFromTextGetFormat,
+	.CopyFromStart = CopyFromTextStart,
+	.CopyFromOneRow = CopyFromTextOneRow,
+	.CopyFromEnd = CopyFromTextEnd,
+};
+
+CopyFromRoutine CopyFromRoutineBinary = {
+	.CopyFromProcessOption = CopyFromBinaryProcessOption,
+	.CopyFromGetFormat = CopyFromBinaryGetFormat,
+	.CopyFromStart = CopyFromBinaryStart,
+	.CopyFromOneRow = CopyFromBinaryOneRow,
+	.CopyFromEnd = CopyFromBinaryEnd,
+};
+
+
 /*
  * error context callback for COPY FROM
  *
@@ -1384,9 +1548,6 @@ BeginCopyFrom(ParseState *pstate,
 	TupleDesc	tupDesc;
 	AttrNumber	num_phys_attrs,
 				num_defaults;
-	FmgrInfo   *in_functions;
-	Oid		   *typioparams;
-	Oid			in_func_oid;
 	int		   *defmap;
 	ExprState **defexprs;
 	MemoryContext oldcontext;
@@ -1571,25 +1732,6 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 	cstate->raw_reached_eof = false;
 
-	if (!cstate->opts.binary)
-	{
-		/*
-		 * If encoding conversion is needed, we need another buffer to hold
-		 * the converted input data.  Otherwise, we can just point input_buf
-		 * to the same buffer as raw_buf.
-		 */
-		if (cstate->need_transcoding)
-		{
-			cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-			cstate->input_buf_index = cstate->input_buf_len = 0;
-		}
-		else
-			cstate->input_buf = cstate->raw_buf;
-		cstate->input_reached_eof = false;
-
-		initStringInfo(&cstate->line_buf);
-	}
-
 	initStringInfo(&cstate->attribute_buf);
 
 	/* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1608,8 +1750,6 @@ BeginCopyFrom(ParseState *pstate,
 	 * the input function), and info about defaults and constraints. (Which
 	 * input function we use depends on text/binary format choice.)
 	 */
-	in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-	typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
 	defmap = (int *) palloc(num_phys_attrs * sizeof(int));
 	defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *));
 
@@ -1621,15 +1761,6 @@ BeginCopyFrom(ParseState *pstate,
 		if (att->attisdropped)
 			continue;
 
-		/* Fetch the input function and typioparam info */
-		if (cstate->opts.binary)
-			getTypeBinaryInputInfo(att->atttypid,
-								   &in_func_oid, &typioparams[attnum - 1]);
-		else
-			getTypeInputInfo(att->atttypid,
-							 &in_func_oid, &typioparams[attnum - 1]);
-		fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
 
@@ -1689,8 +1820,6 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->bytes_processed = 0;
 
 	/* We keep those variables in cstate. */
-	cstate->in_functions = in_functions;
-	cstate->typioparams = typioparams;
 	cstate->defmap = defmap;
 	cstate->defexprs = defexprs;
 	cstate->volatile_defexprs = volatile_defexprs;
@@ -1763,20 +1892,7 @@ BeginCopyFrom(ParseState *pstate,
 
 	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-	if (cstate->opts.binary)
-	{
-		/* Read and verify binary header */
-		ReceiveCopyBinaryHeader(cstate);
-	}
-
-	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
-	{
-		AttrNumber	attr_count = list_length(cstate->attnumlist);
-
-		cstate->max_fields = attr_count;
-		cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-	}
+	cstate->opts.from_routine->CopyFromStart(cstate, tupDesc);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1789,6 +1905,8 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+	cstate->opts.from_routine->CopyFromEnd(cstate);
+
 	/* No COPY FROM related resources except memory. */
 	if (cstate->is_program)
 	{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..49632f75e4 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -172,7 +172,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = cstate->opts.from_routine->CopyFromGetFormat(cstate);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -840,199 +840,219 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	return true;
 }
 
-/*
- * Read next tuple from file for COPY FROM. Return false if no more tuples.
- *
- * 'econtext' is used to evaluate default expression for each column that is
- * either not read from the file or is using the DEFAULT option of COPY FROM.
- * It can be NULL when no default values are used, i.e. when all columns are
- * read from the file, and DEFAULT option is unset.
- *
- * 'values' and 'nulls' arrays must be the same length as columns of the
- * relation passed to BeginCopyFrom. This function fills the arrays.
- */
 bool
-NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
-			 Datum *values, bool *nulls)
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
 {
 	TupleDesc	tupDesc;
-	AttrNumber	num_phys_attrs,
-				attr_count,
-				num_defaults = cstate->num_defaults;
+	AttrNumber	attr_count;
 	FmgrInfo   *in_functions = cstate->in_functions;
 	Oid		   *typioparams = cstate->typioparams;
-	int			i;
-	int		   *defmap = cstate->defmap;
 	ExprState **defexprs = cstate->defexprs;
+	char	  **field_strings;
+	ListCell   *cur;
+	int			fldct;
+	int			fieldno;
+	char	   *string;
 
 	tupDesc = RelationGetDescr(cstate->rel);
-	num_phys_attrs = tupDesc->natts;
 	attr_count = list_length(cstate->attnumlist);
 
-	/* Initialize all values for row to NULL */
-	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
-	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
+	/* read raw fields in the next line */
+	if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
+		return false;
 
-	if (!cstate->opts.binary)
-	{
-		char	  **field_strings;
-		ListCell   *cur;
-		int			fldct;
-		int			fieldno;
-		char	   *string;
+	/* check for overflowing fields */
+	if (attr_count > 0 && fldct > attr_count)
+		ereport(ERROR,
+				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+				 errmsg("extra data after last expected column")));
 
-		/* read raw fields in the next line */
-		if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-			return false;
+	fieldno = 0;
 
-		/* check for overflowing fields */
-		if (attr_count > 0 && fldct > attr_count)
+	/* Loop to read the user attributes on the line. */
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		int			m = attnum - 1;
+		Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+		if (fieldno >= fldct)
 			ereport(ERROR,
 					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+					 errmsg("missing data for column \"%s\"",
+							NameStr(att->attname))));
+		string = field_strings[fieldno++];
 
-		fieldno = 0;
-
-		/* Loop to read the user attributes on the line. */
-		foreach(cur, cstate->attnumlist)
+		if (cstate->convert_select_flags &&
+			!cstate->convert_select_flags[m])
 		{
-			int			attnum = lfirst_int(cur);
-			int			m = attnum - 1;
-			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
-			string = field_strings[fieldno++];
-
-			if (cstate->convert_select_flags &&
-				!cstate->convert_select_flags[m])
-			{
-				/* ignore input field, leaving column as NULL */
-				continue;
-			}
+			/* ignore input field, leaving column as NULL */
+			continue;
+		}
 
-			if (cstate->opts.csv_mode)
+		if (cstate->opts.csv_mode)
+		{
+			if (string == NULL &&
+				cstate->opts.force_notnull_flags[m])
 			{
-				if (string == NULL &&
-					cstate->opts.force_notnull_flags[m])
-				{
-					/*
-					 * FORCE_NOT_NULL option is set and column is NULL -
-					 * convert it to the NULL string.
-					 */
-					string = cstate->opts.null_print;
-				}
-				else if (string != NULL && cstate->opts.force_null_flags[m]
-						 && strcmp(string, cstate->opts.null_print) == 0)
-				{
-					/*
-					 * FORCE_NULL option is set and column matches the NULL
-					 * string. It must have been quoted, or otherwise the
-					 * string would already have been set to NULL. Convert it
-					 * to NULL as specified.
-					 */
-					string = NULL;
-				}
+				/*
+				 * FORCE_NOT_NULL option is set and column is NULL - convert
+				 * it to the NULL string.
+				 */
+				string = cstate->opts.null_print;
 			}
-
-			cstate->cur_attname = NameStr(att->attname);
-			cstate->cur_attval = string;
-
-			if (string != NULL)
-				nulls[m] = false;
-
-			if (cstate->defaults[m])
+			else if (string != NULL && cstate->opts.force_null_flags[m]
+					 && strcmp(string, cstate->opts.null_print) == 0)
 			{
 				/*
-				 * The caller must supply econtext and have switched into the
-				 * per-tuple memory context in it.
+				 * FORCE_NULL option is set and column matches the NULL
+				 * string. It must have been quoted, or otherwise the string
+				 * would already have been set to NULL. Convert it to NULL as
+				 * specified.
 				 */
-				Assert(econtext != NULL);
-				Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-				values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+				string = NULL;
 			}
+		}
+
+		cstate->cur_attname = NameStr(att->attname);
+		cstate->cur_attval = string;
+
+		if (string != NULL)
+			nulls[m] = false;
 
+		if (cstate->defaults[m])
+		{
 			/*
-			 * If ON_ERROR is specified with IGNORE, skip rows with soft
-			 * errors
+			 * The caller must supply econtext and have switched into the
+			 * per-tuple memory context in it.
 			 */
-			else if (!InputFunctionCallSafe(&in_functions[m],
-											string,
-											typioparams[m],
-											att->atttypmod,
-											(Node *) cstate->escontext,
-											&values[m]))
-			{
-				cstate->num_errors++;
-				return true;
-			}
+			Assert(econtext != NULL);
+			Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
 
-			cstate->cur_attname = NULL;
-			cstate->cur_attval = NULL;
+			values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
 		}
 
-		Assert(fieldno == attr_count);
+		/*
+		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 */
+		else if (!InputFunctionCallSafe(&in_functions[m],
+										string,
+										typioparams[m],
+										att->atttypmod,
+										(Node *) cstate->escontext,
+										&values[m]))
+		{
+			cstate->num_errors++;
+			return true;
+		}
+
+		cstate->cur_attname = NULL;
+		cstate->cur_attval = NULL;
 	}
-	else
-	{
-		/* binary */
-		int16		fld_count;
-		ListCell   *cur;
 
-		cstate->cur_lineno++;
+	Assert(fieldno == attr_count);
 
-		if (!CopyGetInt16(cstate, &fld_count))
-		{
-			/* EOF detected (end of file, or protocol-level EOF) */
-			return false;
-		}
+	return true;
+}
 
-		if (fld_count == -1)
-		{
-			/*
-			 * Received EOF marker.  Wait for the protocol-level EOF, and
-			 * complain if it doesn't come immediately.  In COPY FROM STDIN,
-			 * this ensures that we correctly handle CopyFail, if client
-			 * chooses to send that now.  When copying from file, we could
-			 * ignore the rest of the file like in text mode, but we choose to
-			 * be consistent with the COPY FROM STDIN case.
-			 */
-			char		dummy;
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+	TupleDesc	tupDesc;
+	AttrNumber	attr_count;
+	FmgrInfo   *in_functions = cstate->in_functions;
+	Oid		   *typioparams = cstate->typioparams;
+	int16		fld_count;
+	ListCell   *cur;
 
-			if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("received copy data after EOF marker")));
-			return false;
-		}
+	tupDesc = RelationGetDescr(cstate->rel);
+	attr_count = list_length(cstate->attnumlist);
 
-		if (fld_count != attr_count)
+	cstate->cur_lineno++;
+
+	if (!CopyGetInt16(cstate, &fld_count))
+	{
+		/* EOF detected (end of file, or protocol-level EOF) */
+		return false;
+	}
+
+	if (fld_count == -1)
+	{
+		/*
+		 * Received EOF marker.  Wait for the protocol-level EOF, and complain
+		 * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+		 * that we correctly handle CopyFail, if client chooses to send that
+		 * now.  When copying from file, we could ignore the rest of the file
+		 * like in text mode, but we choose to be consistent with the COPY
+		 * FROM STDIN case.
+		 */
+		char		dummy;
+
+		if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
 			ereport(ERROR,
 					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+					 errmsg("received copy data after EOF marker")));
+		return false;
+	}
 
-		foreach(cur, cstate->attnumlist)
-		{
-			int			attnum = lfirst_int(cur);
-			int			m = attnum - 1;
-			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-			cstate->cur_attname = NameStr(att->attname);
-			values[m] = CopyReadBinaryAttribute(cstate,
-												&in_functions[m],
-												typioparams[m],
-												att->atttypmod,
-												&nulls[m]);
-			cstate->cur_attname = NULL;
-		}
+	if (fld_count != attr_count)
+		ereport(ERROR,
+				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+				 errmsg("row field count is %d, expected %d",
+						(int) fld_count, attr_count)));
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		int			m = attnum - 1;
+		Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+		cstate->cur_attname = NameStr(att->attname);
+		values[m] = CopyReadBinaryAttribute(cstate,
+											&in_functions[m],
+											typioparams[m],
+											att->atttypmod,
+											&nulls[m]);
+		cstate->cur_attname = NULL;
 	}
 
+	return true;
+}
+
+/*
+ * Read next tuple from file for COPY FROM. Return false if no more tuples.
+ *
+ * 'econtext' is used to evaluate default expression for each column that is
+ * either not read from the file or is using the DEFAULT option of COPY FROM.
+ * It can be NULL when no default values are used, i.e. when all columns are
+ * read from the file, and DEFAULT option is unset.
+ *
+ * 'values' and 'nulls' arrays must be the same length as columns of the
+ * relation passed to BeginCopyFrom. This function fills the arrays.
+ */
+bool
+NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
+			 Datum *values, bool *nulls)
+{
+	TupleDesc	tupDesc;
+	AttrNumber	num_phys_attrs,
+				num_defaults = cstate->num_defaults;
+	int			i;
+	int		   *defmap = cstate->defmap;
+	ExprState **defexprs = cstate->defexprs;
+
+	tupDesc = RelationGetDescr(cstate->rel);
+	num_phys_attrs = tupDesc->natts;
+
+	/* Initialize all values for row to NULL */
+	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
+	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
+	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
+
+	if (!cstate->opts.from_routine->CopyFromOneRow(cstate, econtext, values,
+												   nulls))
+		return false;
+
 	/*
 	 * Now compute and insert any defaults available for the columns not
 	 * provided by the input data.  Anything not processed here or above will
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3f4682f95..df29d42555 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,9 +20,6 @@
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
-/* This is private in commands/copyfrom.c */
-typedef struct CopyFromStateData *CopyFromState;
-
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 
 extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index ffad433a21..323e4705d2 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -18,6 +18,49 @@
 #include "executor/tuptable.h"
 #include "nodes/parsenodes.h"
 
+/* This is private in commands/copyfrom.c */
+typedef struct CopyFromStateData *CopyFromState;
+
+typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel);
+typedef int16 (*CopyFromGetFormat_function) (CopyFromState cstate);
+typedef void (*CopyFromStart_function) (CopyFromState cstate, TupleDesc tupDesc);
+typedef bool (*CopyFromOneRow_function) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+typedef void (*CopyFromEnd_function) (CopyFromState cstate);
+
+/* Routines for a COPY FROM format implementation. */
+typedef struct CopyFromRoutine
+{
+	/*
+	 * Called for processing one COPY FROM option. This will return false when
+	 * the given option is invalid.
+	 */
+	CopyFromProcessOption_function CopyFromProcessOption;
+
+	/*
+	 * Called when COPY FROM is started. This will return a format as int16
+	 * value. It's used for the CopyInResponse message.
+	 */
+	CopyFromGetFormat_function CopyFromGetFormat;
+
+	/*
+	 * Called when COPY FROM is started. This should initialize format-specific
+	 * state and, if the format has one, read its header.
+	 */
+	CopyFromStart_function CopyFromStart;
+
+	/* Copy one row. It returns false if no more tuples. */
+	CopyFromOneRow_function CopyFromOneRow;
+
+	/* Called when COPY FROM is ended. This should clean up format state. */
+	CopyFromEnd_function CopyFromEnd;
+}			CopyFromRoutine;
+
+/* Built-in CopyFromRoutine for "text", "csv" and "binary". */
+extern CopyFromRoutine CopyFromRoutineText;
+extern CopyFromRoutine CopyFromRoutineCSV;
+extern CopyFromRoutine CopyFromRoutineBinary;
+
+
 typedef struct CopyToStateData *CopyToState;
 
 typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
@@ -113,6 +156,7 @@ typedef struct CopyFormatOptions
 	bool		convert_selectively;	/* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
 	List	   *convert_select; /* list of column names (can be NIL) */
+	CopyFromRoutine *from_routine;	/* callback routines for COPY FROM */
 	CopyToRoutine *to_routine;	/* callback routines for COPY TO */
 } CopyFormatOptions;
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..921c1513f7 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -183,4 +183,8 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+
+
 #endif							/* COPYFROM_INTERNAL_H */
-- 
2.41.0
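
The patch above replaces the "if (cstate->opts.binary)" branches with calls through a per-format table of callbacks (CopyFromGetFormat, CopyFromStart, CopyFromOneRow, CopyFromEnd). The following is a minimal standalone sketch of that dispatch pattern, not PostgreSQL code: every Mock* type and mock_* function is a hypothetical stand-in, and rows are plain ints rather than Datum arrays. The only detail taken from the patch is the format code, 0 for text and 1 for binary, as sent in the CopyInResponse message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CopyFromStateData; not the PostgreSQL struct. */
typedef struct MockCopyState MockCopyState;

/* Hypothetical stand-in for CopyFromRoutine: one callback per phase. */
typedef struct MockCopyFromRoutine
{
	short		(*get_format) (MockCopyState *cstate);
	void		(*start) (MockCopyState *cstate);
	bool		(*one_row) (MockCopyState *cstate, int *value);
	void		(*end) (MockCopyState *cstate);
} MockCopyFromRoutine;

struct MockCopyState
{
	const MockCopyFromRoutine *routine; /* chosen once, from the FORMAT option */
	const int  *rows;			/* pretend input: one int per row */
	size_t		nrows;
	size_t		next;
	bool		started;
	bool		ended;
};

static short
mock_text_get_format(MockCopyState *cstate)
{
	(void) cstate;
	return 0;					/* text */
}

static short
mock_binary_get_format(MockCopyState *cstate)
{
	(void) cstate;
	return 1;					/* binary */
}

static void
mock_start(MockCopyState *cstate)
{
	cstate->next = 0;
	cstate->started = true;
}

/* Returns false when there are no more rows, like CopyFromOneRow. */
static bool
mock_one_row(MockCopyState *cstate, int *value)
{
	if (cstate->next >= cstate->nrows)
		return false;
	*value = cstate->rows[cstate->next++];
	return true;
}

static void
mock_end(MockCopyState *cstate)
{
	cstate->ended = true;
}

static const MockCopyFromRoutine mock_text_routine = {
	mock_text_get_format, mock_start, mock_one_row, mock_end
};

static const MockCopyFromRoutine mock_binary_routine = {
	mock_binary_get_format, mock_start, mock_one_row, mock_end
};

/* Generic driver: start, loop over rows, end. No format checks here. */
static long
mock_copy_from(MockCopyState *cstate)
{
	long		sum = 0;
	int			value;

	assert(cstate->routine != NULL);
	cstate->routine->start(cstate);
	while (cstate->routine->one_row(cstate, &value))
		sum += value;
	cstate->routine->end(cstate);
	return sum;
}
```

The driver never inspects which format is in use. Supporting another format in this model means supplying another routine table, which is the property the patch series relies on for extensions.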

Attachment: v8-0008-Add-support-for-implementing-custom-COPY-FROM-for.patch (application/octet-stream)

From 3e847de1acb2fd6966ef01192204448711ca3d5e Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 24 Jan 2024 14:19:08 +0900
Subject: [PATCH v8 08/10] Add support for implementing custom COPY FROM format
 as extension

* Add CopyFromStateData::opaque, which a custom COPY FROM format
  implementation can use to keep its private data
* Export CopyReadBinaryData() so that extensions can read the next data
* Rename CopyReadBinaryData() to CopyFromStateRead() because it's a
  method for CopyFromState and "BinaryData" is redundant.
---
 src/backend/commands/copyfromparse.c | 21 ++++++++++-----------
 src/include/commands/copyapi.h       |  5 +++++
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index a78a790060..f8a194635d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -165,7 +165,6 @@ static int	CopyGetData(CopyFromState cstate, void *databuf,
 static inline bool CopyGetInt32(CopyFromState cstate, int32 *val);
 static inline bool CopyGetInt16(CopyFromState cstate, int16 *val);
 static void CopyLoadInputBuf(CopyFromState cstate);
-static int	CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes);
 
 void
 ReceiveCopyBegin(CopyFromState cstate)
@@ -194,7 +193,7 @@ ReceiveCopyBinaryHeader(CopyFromState cstate)
 	int32		tmp;
 
 	/* Signature */
-	if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
+	if (CopyFromStateRead(cstate, readSig, 11) != 11 ||
 		memcmp(readSig, BinarySignature, 11) != 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -222,7 +221,7 @@ ReceiveCopyBinaryHeader(CopyFromState cstate)
 	/* Skip extension header, if present */
 	while (tmp-- > 0)
 	{
-		if (CopyReadBinaryData(cstate, readSig, 1) != 1)
+		if (CopyFromStateRead(cstate, readSig, 1) != 1)
 			ereport(ERROR,
 					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
 					 errmsg("invalid COPY file header (wrong length)")));
@@ -364,7 +363,7 @@ CopyGetInt32(CopyFromState cstate, int32 *val)
 {
 	uint32		buf;
 
-	if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+	if (CopyFromStateRead(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
 	{
 		*val = 0;				/* suppress compiler warning */
 		return false;
@@ -381,7 +380,7 @@ CopyGetInt16(CopyFromState cstate, int16 *val)
 {
 	uint16		buf;
 
-	if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+	if (CopyFromStateRead(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
 	{
 		*val = 0;				/* suppress compiler warning */
 		return false;
@@ -692,14 +691,14 @@ CopyLoadInputBuf(CopyFromState cstate)
 }
 
 /*
- * CopyReadBinaryData
+ * CopyFromStateRead
  *
  * Reads up to 'nbytes' bytes from cstate->copy_file via cstate->raw_buf
  * and writes them to 'dest'.  Returns the number of bytes read (which
  * would be less than 'nbytes' only if we reach EOF).
  */
-static int
-CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
+int
+CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes)
 {
 	int			copied_bytes = 0;
 
@@ -988,7 +987,7 @@ CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
 		 */
 		char		dummy;
 
-		if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+		if (CopyFromStateRead(cstate, &dummy, 1) > 0)
 			ereport(ERROR,
 					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
 					 errmsg("received copy data after EOF marker")));
@@ -1997,8 +1996,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	resetStringInfo(&cstate->attribute_buf);
 
 	enlargeStringInfo(&cstate->attribute_buf, fld_size);
-	if (CopyReadBinaryData(cstate, cstate->attribute_buf.data,
-						   fld_size) != fld_size)
+	if (CopyFromStateRead(cstate, cstate->attribute_buf.data,
+						  fld_size) != fld_size)
 		ereport(ERROR,
 				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
 				 errmsg("unexpected EOF in COPY data")));
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index b7e8f627bf..22accc83ab 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -314,8 +314,13 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
 	uint64		bytes_processed;	/* number of bytes processed so far */
+
+	/* For custom format implementation */
+	void	   *opaque;			/* private space */
 } CopyFromStateData;
 
+extern int	CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes);
+
 /*
  * Represents the different dest cases we need to worry about at
  * the bottom level
-- 
2.41.0
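
The patch above gives a custom format handler two things: an opaque pointer on the state for its private data, and an exported read function that returns fewer bytes than requested only at EOF. The following is a minimal standalone sketch of how a handler might combine them, assuming a made-up fixed-width record format. It does not use PostgreSQL headers: MockCopyFromState, mock_state_read(), FixedWidthPrivate and fixed_width_one_row() are all hypothetical names, and mock_state_read() only models the contract stated in the CopyFromStateRead() comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyFromStateData with only what the sketch needs. */
typedef struct MockCopyFromState
{
	const char *src;			/* pretend input stream */
	size_t		src_len;
	size_t		src_pos;
	void	   *opaque;			/* private space for the format handler */
} MockCopyFromState;

/*
 * Models the documented contract of CopyFromStateRead(): copy up to
 * 'nbytes' bytes into 'dest' and return the number copied, which is less
 * than 'nbytes' only when the input is exhausted.
 */
static int
mock_state_read(MockCopyFromState *cstate, char *dest, int nbytes)
{
	size_t		avail = cstate->src_len - cstate->src_pos;
	size_t		n;

	assert(nbytes >= 0);
	n = ((size_t) nbytes < avail) ? (size_t) nbytes : avail;
	memcpy(dest, cstate->src + cstate->src_pos, n);
	cstate->src_pos += n;
	return (int) n;
}

/* Hypothetical private data that a handler would hang off 'opaque'. */
typedef struct FixedWidthPrivate
{
	int			record_size;	/* bytes per record */
	int			rows_read;		/* complete records seen so far */
} FixedWidthPrivate;

typedef enum RowStatus
{
	ROW_OK,						/* one complete record was read */
	ROW_EOF,					/* clean end of input */
	ROW_TRUNCATED				/* input ended in the middle of a record */
} RowStatus;

/*
 * One-row callback of the made-up format. A real handler would report
 * ROW_TRUNCATED with ereport(ERROR, ...) instead of returning a status.
 */
static RowStatus
fixed_width_one_row(MockCopyFromState *cstate, char *record)
{
	FixedWidthPrivate *priv = cstate->opaque;
	int			got;

	assert(priv != NULL);
	got = mock_state_read(cstate, record, priv->record_size);
	if (got == 0)
		return ROW_EOF;
	if (got != priv->record_size)
		return ROW_TRUNCATED;
	priv->rows_read++;
	return ROW_OK;
}
```

Keeping the record size and row counter behind the opaque pointer means the shared state struct needs no format-specific fields, which appears to be the motivation for adding the field.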

Attachment: v8-0007-Export-CopyFromStateData.patch (application/octet-stream)

From 1ed575fda7f196ea411e9e53dd9c0739f160fb78 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 24 Jan 2024 14:16:29 +0900
Subject: [PATCH v8 07/10] Export CopyFromStateData

This is for custom COPY FROM format handlers implemented as extensions.

This only moves code. It doesn't change any code except the CopySource
enum values. Changing the CopySource enum values isn't required, but I
did it for consistency with the CopyDest enum values: the COPY_ prefix
becomes the COPY_SOURCE_ prefix. For example, COPY_FILE becomes
COPY_SOURCE_FILE.

Note that this change isn't enough to implement a custom COPY FROM
format handler as an extension. We'll do the following in a subsequent
commit:

1. Add an opaque space for custom COPY FROM format handler
2. Export CopyReadBinaryData() to read the next data
---
 src/backend/commands/copyfrom.c          |   4 +-
 src/backend/commands/copyfromparse.c     |  10 +-
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyapi.h           | 156 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h | 150 ----------------------
 5 files changed, 162 insertions(+), 160 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index d556ebb5d6..b4ac7cbd2c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1710,7 +1710,7 @@ BeginCopyFrom(ParseState *pstate,
 							pg_encoding_to_char(GetDatabaseEncoding()))));
 	}
 
-	cstate->copy_src = COPY_FILE;	/* default */
+	cstate->copy_src = COPY_SOURCE_FILE;	/* default */
 
 	cstate->whereClause = whereClause;
 
@@ -1829,7 +1829,7 @@ BeginCopyFrom(ParseState *pstate,
 	if (data_source_cb)
 	{
 		progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-		cstate->copy_src = COPY_CALLBACK;
+		cstate->copy_src = COPY_SOURCE_CALLBACK;
 		cstate->data_source_cb = data_source_cb;
 	}
 	else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 49632f75e4..a78a790060 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -181,7 +181,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 	for (i = 0; i < natts; i++)
 		pq_sendint16(&buf, format); /* per-column formats */
 	pq_endmessage(&buf);
-	cstate->copy_src = COPY_FRONTEND;
+	cstate->copy_src = COPY_SOURCE_FRONTEND;
 	cstate->fe_msgbuf = makeStringInfo();
 	/* We *must* flush here to ensure FE knows it can send. */
 	pq_flush();
@@ -249,7 +249,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
 	switch (cstate->copy_src)
 	{
-		case COPY_FILE:
+		case COPY_SOURCE_FILE:
 			bytesread = fread(databuf, 1, maxread, cstate->copy_file);
 			if (ferror(cstate->copy_file))
 				ereport(ERROR,
@@ -258,7 +258,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 			if (bytesread == 0)
 				cstate->raw_reached_eof = true;
 			break;
-		case COPY_FRONTEND:
+		case COPY_SOURCE_FRONTEND:
 			while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
 			{
 				int			avail;
@@ -341,7 +341,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 				bytesread += avail;
 			}
 			break;
-		case COPY_CALLBACK:
+		case COPY_SOURCE_CALLBACK:
 			bytesread = cstate->data_source_cb(databuf, minread, maxread);
 			break;
 	}
@@ -1099,7 +1099,7 @@ CopyReadLine(CopyFromState cstate)
 		 * after \. up to the protocol end of copy data.  (XXX maybe better
 		 * not to treat \. as special?)
 		 */
-		if (cstate->copy_src == COPY_FRONTEND)
+		if (cstate->copy_src == COPY_SOURCE_FRONTEND)
 		{
 			int			inbytes;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index df29d42555..cd41d32074 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,8 +20,6 @@
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
-typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
-
 extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
 				   uint64 *processed);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index ef1bb201c2..b7e8f627bf 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -14,11 +14,12 @@
 #ifndef COPYAPI_H
 #define COPYAPI_H
 
+#include "commands/trigger.h"
 #include "executor/execdesc.h"
 #include "executor/tuptable.h"
+#include "nodes/miscnodes.h"
 #include "nodes/parsenodes.h"
 
-/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
 
 typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel);
@@ -162,6 +163,159 @@ typedef struct CopyFormatOptions
 	CopyToRoutine *to_routine;	/* callback routines for COPY TO */
 } CopyFormatOptions;
 
+
+/*
+ * Represents the different source cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopySource
+{
+	COPY_SOURCE_FILE,			/* from file (or a piped program) */
+	COPY_SOURCE_FRONTEND,		/* from frontend */
+	COPY_SOURCE_CALLBACK,		/* from callback function */
+} CopySource;
+
+/*
+ *	Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+	EOL_UNKNOWN,
+	EOL_NL,
+	EOL_CR,
+	EOL_CRNL,
+} EolType;
+
+typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+
+/*
+ * This struct contains all the state variables used throughout a COPY FROM
+ * operation.
+ */
+typedef struct CopyFromStateData
+{
+	/* low-level state data */
+	CopySource	copy_src;		/* type of copy source */
+	FILE	   *copy_file;		/* used if copy_src == COPY_SOURCE_FILE */
+	StringInfo	fe_msgbuf;		/* used if copy_src == COPY_SOURCE_FRONTEND */
+
+	EolType		eol_type;		/* EOL type of input */
+	int			file_encoding;	/* file or remote side's character encoding */
+	bool		need_transcoding;	/* file encoding diff from server? */
+	Oid			conversion_proc;	/* encoding conversion function */
+
+	/* parameters from the COPY command */
+	Relation	rel;			/* relation to copy from */
+	List	   *attnumlist;		/* integer list of attnums to copy */
+	char	   *filename;		/* filename, or NULL for STDIN */
+	bool		is_program;		/* is 'filename' a program to popen? */
+	copy_data_source_cb data_source_cb; /* function for reading data */
+
+	CopyFormatOptions opts;
+	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
+	Node	   *whereClause;	/* WHERE condition (or NULL) */
+
+	/* these are just for error messages, see CopyFromErrorCallback */
+	const char *cur_relname;	/* table name for error messages */
+	uint64		cur_lineno;		/* line number for error messages */
+	const char *cur_attname;	/* current att for error messages */
+	const char *cur_attval;		/* current att value for error messages */
+	bool		relname_only;	/* don't output line number, att, etc. */
+
+	/*
+	 * Working state
+	 */
+	MemoryContext copycontext;	/* per-copy execution context */
+
+	AttrNumber	num_defaults;	/* count of att that are missing and have
+								 * default value */
+	FmgrInfo   *in_functions;	/* array of input functions for each attrs */
+	Oid		   *typioparams;	/* array of element types for in_functions */
+	ErrorSaveContext *escontext;	/* soft error trapper during in_functions
+									 * execution */
+	uint64		num_errors;		/* total number of rows which contained soft
+								 * errors */
+	int		   *defmap;			/* array of default att numbers related to
+								 * missing att */
+	ExprState **defexprs;		/* array of default att expressions for all
+								 * att */
+	bool	   *defaults;		/* if DEFAULT marker was found for
+								 * corresponding att */
+	bool		volatile_defexprs;	/* is any of defexprs volatile? */
+	List	   *range_table;	/* single element list of RangeTblEntry */
+	List	   *rteperminfos;	/* single element list of RTEPermissionInfo */
+	ExprState  *qualexpr;
+
+	TransitionCaptureState *transition_capture;
+
+	/*
+	 * These variables are used to reduce overhead in COPY FROM.
+	 *
+	 * attribute_buf holds the separated, de-escaped text for each field of
+	 * the current line.  The CopyReadAttributes functions return arrays of
+	 * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
+	 * the buffer on each cycle.
+	 *
+	 * In binary COPY FROM, attribute_buf holds the binary data for the
+	 * current field, but the usage is otherwise similar.
+	 */
+	StringInfoData attribute_buf;
+
+	/* field raw data pointers found by COPY FROM */
+
+	int			max_fields;
+	char	  **raw_fields;
+
+	/*
+	 * Similarly, line_buf holds the whole input line being processed. The
+	 * input cycle is first to read the whole line into line_buf, and then
+	 * extract the individual attribute fields into attribute_buf.  line_buf
+	 * is preserved unmodified so that we can display it in error messages if
+	 * appropriate.  (In binary mode, line_buf is not used.)
+	 */
+	StringInfoData line_buf;
+	bool		line_buf_valid; /* contains the row being processed? */
+
+	/*
+	 * input_buf holds input data, already converted to database encoding.
+	 *
+	 * In text mode, CopyReadLine parses this data sufficiently to locate line
+	 * boundaries, then transfers the data to line_buf. We guarantee that
+	 * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
+	 * mode, input_buf is not used.)
+	 *
+	 * If encoding conversion is not required, input_buf is not a separate
+	 * buffer but points directly to raw_buf.  In that case, input_buf_len
+	 * tracks the number of bytes that have been verified as valid in the
+	 * database encoding, and raw_buf_len is the total number of bytes stored
+	 * in the buffer.
+	 */
+#define INPUT_BUF_SIZE 65536	/* we palloc INPUT_BUF_SIZE+1 bytes */
+	char	   *input_buf;
+	int			input_buf_index;	/* next byte to process */
+	int			input_buf_len;	/* total # of bytes stored */
+	bool		input_reached_eof;	/* true if we reached EOF */
+	bool		input_reached_error;	/* true if a conversion error happened */
+	/* Shorthand for number of unconsumed bytes available in input_buf */
+#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
+
+	/*
+	 * raw_buf holds raw input data read from the data source (file or client
+	 * connection), not yet converted to the database encoding.  Like with
+	 * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
+	 */
+#define RAW_BUF_SIZE 65536		/* we palloc RAW_BUF_SIZE+1 bytes */
+	char	   *raw_buf;
+	int			raw_buf_index;	/* next byte to process */
+	int			raw_buf_len;	/* total # of bytes stored */
+	bool		raw_reached_eof;	/* true if we reached EOF */
+
+	/* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+
+	uint64		bytes_processed;	/* number of bytes processed so far */
+} CopyFromStateData;
+
 /*
  * Represents the different dest cases we need to worry about at
  * the bottom level
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 921c1513f7..f8f6120255 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -18,28 +18,6 @@
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
-/*
- * Represents the different source cases we need to worry about at
- * the bottom level
- */
-typedef enum CopySource
-{
-	COPY_FILE,					/* from file (or a piped program) */
-	COPY_FRONTEND,				/* from frontend */
-	COPY_CALLBACK,				/* from callback function */
-} CopySource;
-
-/*
- *	Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
-	EOL_UNKNOWN,
-	EOL_NL,
-	EOL_CR,
-	EOL_CRNL,
-} EolType;
-
 /*
  * Represents the insert method to be used during COPY FROM.
  */
@@ -52,134 +30,6 @@ typedef enum CopyInsertMethod
 								 * ExecForeignBatchInsert only if valid */
 } CopyInsertMethod;
 
-/*
- * This struct contains all the state variables used throughout a COPY FROM
- * operation.
- */
-typedef struct CopyFromStateData
-{
-	/* low-level state data */
-	CopySource	copy_src;		/* type of copy source */
-	FILE	   *copy_file;		/* used if copy_src == COPY_FILE */
-	StringInfo	fe_msgbuf;		/* used if copy_src == COPY_FRONTEND */
-
-	EolType		eol_type;		/* EOL type of input */
-	int			file_encoding;	/* file or remote side's character encoding */
-	bool		need_transcoding;	/* file encoding diff from server? */
-	Oid			conversion_proc;	/* encoding conversion function */
-
-	/* parameters from the COPY command */
-	Relation	rel;			/* relation to copy from */
-	List	   *attnumlist;		/* integer list of attnums to copy */
-	char	   *filename;		/* filename, or NULL for STDIN */
-	bool		is_program;		/* is 'filename' a program to popen? */
-	copy_data_source_cb data_source_cb; /* function for reading data */
-
-	CopyFormatOptions opts;
-	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
-	Node	   *whereClause;	/* WHERE condition (or NULL) */
-
-	/* these are just for error messages, see CopyFromErrorCallback */
-	const char *cur_relname;	/* table name for error messages */
-	uint64		cur_lineno;		/* line number for error messages */
-	const char *cur_attname;	/* current att for error messages */
-	const char *cur_attval;		/* current att value for error messages */
-	bool		relname_only;	/* don't output line number, att, etc. */
-
-	/*
-	 * Working state
-	 */
-	MemoryContext copycontext;	/* per-copy execution context */
-
-	AttrNumber	num_defaults;	/* count of att that are missing and have
-								 * default value */
-	FmgrInfo   *in_functions;	/* array of input functions for each attrs */
-	Oid		   *typioparams;	/* array of element types for in_functions */
-	ErrorSaveContext *escontext;	/* soft error trapper during in_functions
-									 * execution */
-	uint64		num_errors;		/* total number of rows which contained soft
-								 * errors */
-	int		   *defmap;			/* array of default att numbers related to
-								 * missing att */
-	ExprState **defexprs;		/* array of default att expressions for all
-								 * att */
-	bool	   *defaults;		/* if DEFAULT marker was found for
-								 * corresponding att */
-	bool		volatile_defexprs;	/* is any of defexprs volatile? */
-	List	   *range_table;	/* single element list of RangeTblEntry */
-	List	   *rteperminfos;	/* single element list of RTEPermissionInfo */
-	ExprState  *qualexpr;
-
-	TransitionCaptureState *transition_capture;
-
-	/*
-	 * These variables are used to reduce overhead in COPY FROM.
-	 *
-	 * attribute_buf holds the separated, de-escaped text for each field of
-	 * the current line.  The CopyReadAttributes functions return arrays of
-	 * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
-	 * the buffer on each cycle.
-	 *
-	 * In binary COPY FROM, attribute_buf holds the binary data for the
-	 * current field, but the usage is otherwise similar.
-	 */
-	StringInfoData attribute_buf;
-
-	/* field raw data pointers found by COPY FROM */
-
-	int			max_fields;
-	char	  **raw_fields;
-
-	/*
-	 * Similarly, line_buf holds the whole input line being processed. The
-	 * input cycle is first to read the whole line into line_buf, and then
-	 * extract the individual attribute fields into attribute_buf.  line_buf
-	 * is preserved unmodified so that we can display it in error messages if
-	 * appropriate.  (In binary mode, line_buf is not used.)
-	 */
-	StringInfoData line_buf;
-	bool		line_buf_valid; /* contains the row being processed? */
-
-	/*
-	 * input_buf holds input data, already converted to database encoding.
-	 *
-	 * In text mode, CopyReadLine parses this data sufficiently to locate line
-	 * boundaries, then transfers the data to line_buf. We guarantee that
-	 * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
-	 * mode, input_buf is not used.)
-	 *
-	 * If encoding conversion is not required, input_buf is not a separate
-	 * buffer but points directly to raw_buf.  In that case, input_buf_len
-	 * tracks the number of bytes that have been verified as valid in the
-	 * database encoding, and raw_buf_len is the total number of bytes stored
-	 * in the buffer.
-	 */
-#define INPUT_BUF_SIZE 65536	/* we palloc INPUT_BUF_SIZE+1 bytes */
-	char	   *input_buf;
-	int			input_buf_index;	/* next byte to process */
-	int			input_buf_len;	/* total # of bytes stored */
-	bool		input_reached_eof;	/* true if we reached EOF */
-	bool		input_reached_error;	/* true if a conversion error happened */
-	/* Shorthand for number of unconsumed bytes available in input_buf */
-#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
-
-	/*
-	 * raw_buf holds raw input data read from the data source (file or client
-	 * connection), not yet converted to the database encoding.  Like with
-	 * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
-	 */
-#define RAW_BUF_SIZE 65536		/* we palloc RAW_BUF_SIZE+1 bytes */
-	char	   *raw_buf;
-	int			raw_buf_index;	/* next byte to process */
-	int			raw_buf_len;	/* total # of bytes stored */
-	bool		raw_reached_eof;	/* true if we reached EOF */
-
-	/* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-
-	uint64		bytes_processed;	/* number of bytes processed so far */
-} CopyFromStateData;
-
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
-- 
2.41.0

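The index/length bookkeeping that RAW_BUF_BYTES and INPUT_BUF_BYTES rely on in the CopyFromStateData struct above can be sketched in isolation. This is only a standalone illustration, not backend code: DemoBuf, DEMO_BUF_SIZE, DEMO_BUF_BYTES, demo_buf_load and demo_buf_consume are hypothetical names, and the buffer is a fixed array rather than a palloc'd one. It keeps the same two invariants the comments describe: a \0 always sits at data[len], and the unconsumed byte count is len minus index.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BUF_SIZE 16		/* stand-in for RAW_BUF_SIZE (hypothetical) */

typedef struct DemoBuf
{
	char		data[DEMO_BUF_SIZE + 1];	/* +1 leaves room for the \0 */
	int			index;			/* next byte to process */
	int			len;			/* total # of bytes stored */
} DemoBuf;

/* Shorthand for number of unconsumed bytes, same shape as RAW_BUF_BYTES */
#define DEMO_BUF_BYTES(b) ((b)->len - (b)->index)

/* Append up to n bytes from src; returns how many were actually stored. */
static int
demo_buf_load(DemoBuf *b, const char *src, int n)
{
	int			room;

	assert(b != NULL && src != NULL && n >= 0);
	room = DEMO_BUF_SIZE - b->len;
	if (n > room)
		n = room;
	memcpy(b->data + b->len, src, (size_t) n);
	b->len += n;
	b->data[b->len] = '\0';		/* keep the terminator invariant */
	return n;
}

/* Mark up to n bytes as processed; returns how many were consumed. */
static int
demo_buf_consume(DemoBuf *b, int n)
{
	assert(b != NULL && n >= 0);
	if (n > DEMO_BUF_BYTES(b))
		n = DEMO_BUF_BYTES(b);
	b->index += n;
	return n;
}
```

A caller would zero a DemoBuf, load bytes into it, and consume them in pieces while checking DEMO_BUF_BYTES to see how much is left.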
v8-0006-Add-support-for-adding-custom-COPY-FROM-format.patch (application/octet-stream)

From f48e7b629a8d15fc70cd4cc4737dd2ad61910cc9 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 24 Jan 2024 11:07:14 +0900
Subject: [PATCH v8 06/10] Add support for adding custom COPY FROM format

We use the same approach as we used for custom COPY TO format. Now, a
custom COPY format handler can return COPY TO format routines or COPY
FROM format routines based on the "is_from" argument:

    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
---
 src/backend/commands/copy.c                   | 53 +++++++++++++------
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 12 +++++
 .../test_copy_format/sql/test_copy_format.sql |  6 +++
 .../test_copy_format/test_copy_format.c       | 50 +++++++++++++++--
 5 files changed, 105 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index ec6dfff8ab..479f36868c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -472,12 +472,9 @@ ProcessCopyOptionCustomFormat(ParseState *pstate,
 	}
 
 	/* custom format */
-	if (!is_from)
-	{
-		funcargtypes[0] = INTERNALOID;
-		handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
-									funcargtypes, true);
-	}
+	funcargtypes[0] = INTERNALOID;
+	handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+								funcargtypes, true);
 	if (!OidIsValid(handlerOid))
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -486,14 +483,36 @@ ProcessCopyOptionCustomFormat(ParseState *pstate,
 
 	datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
 	routine = DatumGetPointer(datum);
-	if (routine == NULL || !IsA(routine, CopyToRoutine))
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("COPY handler function %s(%u) did not return a CopyToRoutine struct",
-						format, handlerOid),
-				 parser_errposition(pstate, defel->location)));
-
-	opts_out->to_routine = routine;
+	if (is_from)
+	{
+		if (routine == NULL || !IsA(routine, CopyFromRoutine))
+			ereport(
+					ERROR,
+					(errcode(
+							 ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY handler function "
+							"%s(%u) did not return a "
+							"CopyFromRoutine struct",
+							format, handlerOid),
+					 parser_errposition(
+										pstate, defel->location)));
+		opts_out->from_routine = routine;
+	}
+	else
+	{
+		if (routine == NULL || !IsA(routine, CopyToRoutine))
+			ereport(
+					ERROR,
+					(errcode(
+							 ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("COPY handler function "
+							"%s(%u) did not return a "
+							"CopyToRoutine struct",
+							format, handlerOid),
+					 parser_errposition(
+										pstate, defel->location)));
+		opts_out->to_routine = routine;
+	}
 }
 
 /*
@@ -692,7 +711,11 @@ ProcessCopyOptions(ParseState *pstate,
 		{
 			bool		processed = false;
 
-			if (!is_from)
+			if (is_from)
+				processed =
+					opts_out->from_routine->CopyFromProcessOption(
+																  cstate, defel);
+			else
 				processed = opts_out->to_routine->CopyToProcessOption(cstate, defel);
 			if (!processed)
 				ereport(ERROR,
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 323e4705d2..ef1bb201c2 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -30,6 +30,8 @@ typedef void (*CopyFromEnd_function) (CopyFromState cstate);
 /* Routines for a COPY FROM format implementation. */
 typedef struct CopyFromRoutine
 {
+	NodeTag		type;
+
 	/*
 	 * Called for processing one COPY FROM option. This will return false when
 	 * the given option is invalid.
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 3a24ae7b97..6af69f0eb7 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -1,6 +1,18 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a INT, b INT, c INT);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (
+	option_before 'before',
+	format 'test_copy_format',
+	option_after 'after'
+);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromProcessOption: "option_before"="before"
+NOTICE:  CopyFromProcessOption: "option_after"="after"
+NOTICE:  CopyFromGetFormat
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
 COPY public.test TO stdout WITH (
 	option_before 'before',
 	format 'test_copy_format',
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 0eb7ed2e11..94d3c789a0 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -1,6 +1,12 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a INT, b INT, c INT);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (
+	option_before 'before',
+	format 'test_copy_format',
+	option_after 'after'
+);
+\.
 COPY public.test TO stdout WITH (
 	option_before 'before',
 	format 'test_copy_format',
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index a2219afcde..5e1b40e881 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,50 @@
 
 PG_MODULE_MAGIC;
 
+static bool
+CopyFromProcessOption(CopyFromState cstate, DefElem *defel)
+{
+	ereport(NOTICE,
+			(errmsg("CopyFromProcessOption: \"%s\"=\"%s\"",
+					defel->defname, defGetString(defel))));
+	return true;
+}
+
+static int16
+CopyFromGetFormat(CopyFromState cstate)
+{
+	ereport(NOTICE, (errmsg("CopyFromGetFormat")));
+	return 0;
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+	ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+	ereport(NOTICE, (errmsg("CopyFromOneRow")));
+	return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+	ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+	.type = T_CopyFromRoutine,
+	.CopyFromProcessOption = CopyFromProcessOption,
+	.CopyFromGetFormat = CopyFromGetFormat,
+	.CopyFromStart = CopyFromStart,
+	.CopyFromOneRow = CopyFromOneRow,
+	.CopyFromEnd = CopyFromEnd,
+};
+
 static bool
 CopyToProcessOption(CopyToState cstate, DefElem *defel)
 {
@@ -71,7 +115,7 @@ test_copy_format(PG_FUNCTION_ARGS)
 			(errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
 
 	if (is_from)
-		elog(ERROR, "COPY FROM isn't supported yet");
-
-	PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+		PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+	else
+		PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
 }
-- 
2.41.0

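The dispatch that patch 0006 adds (one handler, direction chosen by is_from, result checked by its node tag) can be sketched outside the backend. This is a simplified standalone illustration under stated assumptions, not PostgreSQL code: the NodeTag enum, the two routine structs, copy_handler, routine_is_a and check_handler_result below are hypothetical stand-ins that carry only the tag, and routine_is_a merely imitates what IsA() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the backend's NodeTag values (hypothetical). */
typedef enum
{
	T_Invalid = 0,
	T_CopyFromRoutine,
	T_CopyToRoutine
} NodeTag;

/* Each routine struct starts with its tag, as the patch adds for COPY FROM. */
typedef struct CopyFromRoutine
{
	NodeTag		type;
} CopyFromRoutine;

typedef struct CopyToRoutine
{
	NodeTag		type;
} CopyToRoutine;

static const CopyFromRoutine from_routine = {.type = T_CopyFromRoutine};
static const CopyToRoutine to_routine = {.type = T_CopyToRoutine};

/* One handler entry point; is_from selects which routine table comes back. */
static const void *
copy_handler(bool is_from)
{
	if (is_from)
		return &from_routine;
	return &to_routine;
}

/* Read the tag through the first member, in the spirit of IsA(). */
static bool
routine_is_a(const void *routine, NodeTag expected)
{
	return routine != NULL && *(const NodeTag *) routine == expected;
}

/* True when the handler returned the kind of routine the caller expects. */
static bool
check_handler_result(bool is_from)
{
	const void *routine = copy_handler(is_from);

	return routine_is_a(routine,
						is_from ? T_CopyFromRoutine : T_CopyToRoutine);
}
```

The tag has to be the first struct member for this check to work, which is why the patch adds a NodeTag field at the top of CopyFromRoutine.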
v8-0009-change-CopyToGetFormat-to-CopyToSendCopyBegin-and.patch (application/octet-stream)

From f0a8151feff44823881c3c4e1e7aca4f9bd690d5 Mon Sep 17 00:00:00 2001
From: Zhao Junwang <zhjwpku@gmail.com>
Date: Sat, 27 Jan 2024 09:53:31 +0800
Subject: [PATCH v8 09/10] change CopyToGetFormat to CopyToSendCopyBegin and
 export more api

Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
---
 src/backend/commands/copyto.c                 | 65 ++++++++++---------
 src/include/commands/copyapi.h                | 12 ++--
 .../test_copy_format/test_copy_format.c       |  7 +-
 3 files changed, 46 insertions(+), 38 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index b5d8678394..e2a4964015 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -66,11 +66,6 @@ static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
 static void SendCopyEnd(CopyToState cstate);
-static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
-static void CopySendString(CopyToState cstate, const char *str);
-static void CopySendChar(CopyToState cstate, char c);
-static void CopySendInt32(CopyToState cstate, int32 val);
-static void CopySendInt16(CopyToState cstate, int16 val);
 
 /*
  * CopyToRoutine implementations.
@@ -90,10 +85,20 @@ CopyToTextProcessOption(CopyToState cstate, DefElem *defel)
 	return false;
 }
 
-static int16
-CopyToTextGetFormat(CopyToState cstate)
+static void
+CopyToTextSendCopyBegin(CopyToState cstate)
 {
-	return 0;
+	StringInfoData buf;
+	int			natts = list_length(cstate->attnumlist);
+	int16		format = 0;
+	int			i;
+
+	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
+	pq_sendbyte(&buf, format);	/* overall format */
+	pq_sendint16(&buf, natts);
+	for (i = 0; i < natts; i++)
+		pq_sendint16(&buf, format); /* per-column formats */
+	pq_endmessage(&buf);
 }
 
 static void
@@ -230,10 +235,20 @@ CopyToBinaryProcessOption(CopyToState cstate, DefElem *defel)
 	return false;
 }
 
-static int16
-CopyToBinaryGetFormat(CopyToState cstate)
+static void
+CopyToBinarySendCopyBegin(CopyToState cstate)
 {
-	return 1;
+	StringInfoData buf;
+	int			natts = list_length(cstate->attnumlist);
+	int16		format = 1;
+	int			i;
+
+	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
+	pq_sendbyte(&buf, format);	/* overall format */
+	pq_sendint16(&buf, natts);
+	for (i = 0; i < natts; i++)
+		pq_sendint16(&buf, format); /* per-column formats */
+	pq_endmessage(&buf);
 }
 
 static void
@@ -315,7 +330,7 @@ CopyToBinaryEnd(CopyToState cstate)
 
 CopyToRoutine CopyToRoutineText = {
 	.CopyToProcessOption = CopyToTextProcessOption,
-	.CopyToGetFormat = CopyToTextGetFormat,
+	.CopyToSendCopyBegin = CopyToTextSendCopyBegin,
 	.CopyToStart = CopyToTextStart,
 	.CopyToOneRow = CopyToTextOneRow,
 	.CopyToEnd = CopyToTextEnd,
@@ -328,7 +343,7 @@ CopyToRoutine CopyToRoutineText = {
  */
 CopyToRoutine CopyToRoutineCSV = {
 	.CopyToProcessOption = CopyToTextProcessOption,
-	.CopyToGetFormat = CopyToTextGetFormat,
+	.CopyToSendCopyBegin = CopyToTextSendCopyBegin,
 	.CopyToStart = CopyToTextStart,
 	.CopyToOneRow = CopyToTextOneRow,
 	.CopyToEnd = CopyToTextEnd,
@@ -336,7 +351,7 @@ CopyToRoutine CopyToRoutineCSV = {
 
 CopyToRoutine CopyToRoutineBinary = {
 	.CopyToProcessOption = CopyToBinaryProcessOption,
-	.CopyToGetFormat = CopyToBinaryGetFormat,
+	.CopyToSendCopyBegin = CopyToBinarySendCopyBegin,
 	.CopyToStart = CopyToBinaryStart,
 	.CopyToOneRow = CopyToBinaryOneRow,
 	.CopyToEnd = CopyToBinaryEnd,
@@ -349,17 +364,7 @@ CopyToRoutine CopyToRoutineBinary = {
 static void
 SendCopyBegin(CopyToState cstate)
 {
-	StringInfoData buf;
-	int			natts = list_length(cstate->attnumlist);
-	int16		format = cstate->opts.to_routine->CopyToGetFormat(cstate);
-	int			i;
-
-	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
-	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
-	pq_endmessage(&buf);
+	cstate->opts.to_routine->CopyToSendCopyBegin(cstate);
 	cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
@@ -382,19 +387,19 @@ SendCopyEnd(CopyToState cstate)
  * NB: no data conversion is applied by these functions
  *----------
  */
-static void
+void
 CopySendData(CopyToState cstate, const void *databuf, int datasize)
 {
 	appendBinaryStringInfo(cstate->fe_msgbuf, databuf, datasize);
 }
 
-static void
+void
 CopySendString(CopyToState cstate, const char *str)
 {
 	appendBinaryStringInfo(cstate->fe_msgbuf, str, strlen(str));
 }
 
-static void
+void
 CopySendChar(CopyToState cstate, char c)
 {
 	appendStringInfoCharMacro(cstate->fe_msgbuf, c);
@@ -464,7 +469,7 @@ CopyToStateFlush(CopyToState cstate)
 /*
  * CopySendInt32 sends an int32 in network byte order
  */
-static inline void
+inline void
 CopySendInt32(CopyToState cstate, int32 val)
 {
 	uint32		buf;
@@ -476,7 +481,7 @@ CopySendInt32(CopyToState cstate, int32 val)
 /*
  * CopySendInt16 sends an int16 in network byte order
  */
-static inline void
+inline void
 CopySendInt16(CopyToState cstate, int16 val)
 {
 	uint16		buf;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 22accc83ab..0a05b24c54 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -67,7 +67,7 @@ extern CopyFromRoutine CopyFromRoutineBinary;
 typedef struct CopyToStateData *CopyToState;
 
 typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
-typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
+typedef void (*CopyToSendCopyBegin_function) (CopyToState cstate);
 typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
 typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
 typedef void (*CopyToEnd_function) (CopyToState cstate);
@@ -84,10 +84,9 @@ typedef struct CopyToRoutine
 	CopyToProcessOption_function CopyToProcessOption;
 
 	/*
-	 * Called when COPY TO is started. This will return a format as int16
-	 * value. It's used for the CopyOutResponse message.
+	 * Called when COPY TO is started.
 	 */
-	CopyToGetFormat_function CopyToGetFormat;
+	CopyToSendCopyBegin_function CopyToSendCopyBegin;
 
 	/* Called when COPY TO is started. This will send a header. */
 	CopyToStart_function CopyToStart;
@@ -384,6 +383,11 @@ typedef struct CopyToStateData
 	void	   *opaque;			/* private space */
 } CopyToStateData;
 
+extern void CopySendData(CopyToState cstate, const void *databuf, int datasize);
+extern void CopySendString(CopyToState cstate, const char *str);
+extern void CopySendChar(CopyToState cstate, char c);
+extern void CopySendInt32(CopyToState cstate, int32 val);
+extern void CopySendInt16(CopyToState cstate, int16 val);
 extern void CopyToStateFlush(CopyToState cstate);
 
 #endif							/* COPYAPI_H */
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index 5e1b40e881..d833f22bbf 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -71,11 +71,10 @@ CopyToProcessOption(CopyToState cstate, DefElem *defel)
 	return true;
 }
 
-static int16
-CopyToGetFormat(CopyToState cstate)
+static void
+CopyToSendCopyBegin(CopyToState cstate)
 {
 	ereport(NOTICE, (errmsg("CopyToGetFormat")));
-	return 0;
 }
 
 static void
@@ -99,7 +98,7 @@ CopyToEnd(CopyToState cstate)
 static const CopyToRoutine CopyToRoutineTestCopyFormat = {
 	.type = T_CopyToRoutine,
 	.CopyToProcessOption = CopyToProcessOption,
-	.CopyToGetFormat = CopyToGetFormat,
+	.CopyToSendCopyBegin = CopyToSendCopyBegin,
 	.CopyToStart = CopyToStart,
 	.CopyToOneRow = CopyToOneRow,
 	.CopyToEnd = CopyToEnd,
-- 
2.41.0

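Patch 0009 moves construction of the CopyOutResponse message into each format's CopyToSendCopyBegin callback. The body those callbacks emit (an Int8 overall format, an Int16 column count, then one Int16 format code per column, all in network byte order) can be sketched as a plain byte-buffer writer. This is an illustration only: build_copy_out_body is a hypothetical helper, it does not use the pq_* functions, and it omits the message type byte and length word that pq_beginmessage and pq_endmessage take care of.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the CopyOutResponse body into out.  Returns the number of bytes
 * written, or 0 if cap is too small to hold the whole body.
 */
static size_t
build_copy_out_body(uint8_t *out, size_t cap, int16_t format, int16_t natts)
{
	size_t		need;
	size_t		pos = 0;
	int			i;

	assert(out != NULL && natts >= 0);

	need = 1 + 2 + 2 * (size_t) natts;
	if (cap < need)
		return 0;

	out[pos++] = (uint8_t) format;	/* overall format: 0 text, 1 binary */

	/* number of columns, big-endian */
	out[pos++] = (uint8_t) ((uint16_t) natts >> 8);
	out[pos++] = (uint8_t) ((uint16_t) natts & 0xFF);

	/* per-column formats, big-endian */
	for (i = 0; i < natts; i++)
	{
		out[pos++] = (uint8_t) ((uint16_t) format >> 8);
		out[pos++] = (uint8_t) ((uint16_t) format & 0xFF);
	}
	return pos;
}
```

With the callback owning this step, a format is free to report a column count that differs from the relation's attribute list, which the old int16-returning CopyToGetFormat could not express.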
v8-0010-introduce-contrib-pg_copy_json.patch (application/octet-stream)

From 7dc6c1c798178f31728d048d4d528181626b3695 Mon Sep 17 00:00:00 2001
From: Zhao Junwang <zhjwpku@gmail.com>
Date: Sat, 27 Jan 2024 13:34:38 +0800
Subject: [PATCH v8 10/10] introduce contrib/pg_copy_json

Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
---
 contrib/Makefile                              |   1 +
 contrib/meson.build                           |   1 +
 contrib/pg_copy_json/.gitignore               |   4 +
 contrib/pg_copy_json/Makefile                 |  23 ++
 .../pg_copy_json/expected/pg_copy_json.out    |  80 +++++++
 contrib/pg_copy_json/meson.build              |  34 +++
 contrib/pg_copy_json/pg_copy_json--1.0.sql    |   9 +
 contrib/pg_copy_json/pg_copy_json.c           | 218 ++++++++++++++++++
 contrib/pg_copy_json/pg_copy_json.control     |   5 +
 contrib/pg_copy_json/sql/pg_copy_json.sql     |  59 +++++
 src/backend/utils/adt/json.c                  |   5 +-
 src/include/utils/json.h                      |   2 +
 12 files changed, 438 insertions(+), 3 deletions(-)
 create mode 100644 contrib/pg_copy_json/.gitignore
 create mode 100644 contrib/pg_copy_json/Makefile
 create mode 100644 contrib/pg_copy_json/expected/pg_copy_json.out
 create mode 100644 contrib/pg_copy_json/meson.build
 create mode 100644 contrib/pg_copy_json/pg_copy_json--1.0.sql
 create mode 100644 contrib/pg_copy_json/pg_copy_json.c
 create mode 100644 contrib/pg_copy_json/pg_copy_json.control
 create mode 100644 contrib/pg_copy_json/sql/pg_copy_json.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index da4e2316a3..82cc496aa2 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -32,6 +32,7 @@ SUBDIRS = \
 		pageinspect	\
 		passwordcheck	\
 		pg_buffercache	\
+		pg_copy_json	\
 		pg_freespacemap \
 		pg_prewarm	\
 		pg_stat_statements \
diff --git a/contrib/meson.build b/contrib/meson.build
index c12dc906ca..38933d15d1 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -45,6 +45,7 @@ subdir('oid2name')
 subdir('pageinspect')
 subdir('passwordcheck')
 subdir('pg_buffercache')
+subdir('pg_copy_json')
 subdir('pgcrypto')
 subdir('pg_freespacemap')
 subdir('pg_prewarm')
diff --git a/contrib/pg_copy_json/.gitignore b/contrib/pg_copy_json/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/contrib/pg_copy_json/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_copy_json/Makefile b/contrib/pg_copy_json/Makefile
new file mode 100644
index 0000000000..b0a348d618
--- /dev/null
+++ b/contrib/pg_copy_json/Makefile
@@ -0,0 +1,23 @@
+# contrib/pg_copy_json/Makefile
+
+MODULE_big = pg_copy_json
+OBJS = \
+	$(WIN32RES) \
+	pg_copy_json.o
+PGFILEDESC = "pg_copy_json - COPY TO JSON (JavaScript Object Notation) format"
+
+EXTENSION = pg_copy_json
+DATA = pg_copy_json--1.0.sql
+
+REGRESS = pg_copy_json
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_copy_json
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_copy_json/expected/pg_copy_json.out b/contrib/pg_copy_json/expected/pg_copy_json.out
new file mode 100644
index 0000000000..73633c2303
--- /dev/null
+++ b/contrib/pg_copy_json/expected/pg_copy_json.out
@@ -0,0 +1,80 @@
+--
+-- COPY TO JSON
+--
+CREATE EXTENSION pg_copy_json;
+-- test copying in JSON format with various styles
+-- of embedded line ending characters
+create temp table copytest (
+	style	text,
+	test 	text,
+	filler	int);
+insert into copytest values('DOS',E'abc\r\ndef',1);
+insert into copytest values('Unix',E'abc\ndef',2);
+insert into copytest values('Mac',E'abc\rdef',3);
+insert into copytest values(E'esc\\ape',E'a\\r\\\r\\\n\\nb',4);
+copy copytest to stdout with (format 'json');
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- pg_copy_json do not support COPY FROM
+copy copytest from stdout with (format 'json');
+ERROR:  cannot use JSON mode in COPY FROM
+-- test copying in JSON format with various styles
+-- of embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout with (format 'json');
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
+-- test force array
+copy copytest to stdout (format 'json', force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format 'json', force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format 'json', force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
diff --git a/contrib/pg_copy_json/meson.build b/contrib/pg_copy_json/meson.build
new file mode 100644
index 0000000000..71f9338267
--- /dev/null
+++ b/contrib/pg_copy_json/meson.build
@@ -0,0 +1,34 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+pg_copy_json_sources = files(
+  'pg_copy_json.c',
+)
+
+if host_system == 'windows'
+  pg_copy_json_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pg_copy_json',
+    '--FILEDESC', 'pg_copy_json - COPY TO JSON format',])
+endif
+
+pg_copy_json = shared_module('pg_copy_json',
+  pg_copy_json_sources,
+  kwargs: contrib_mod_args,
+)
+contrib_targets += pg_copy_json
+
+install_data(
+  'pg_copy_json--1.0.sql',
+  'pg_copy_json.control',
+  kwargs: contrib_data_args,
+)
+
+tests += {
+  'name': 'pg_copy_json',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'pg_copy_json',
+    ],
+  },
+}
diff --git a/contrib/pg_copy_json/pg_copy_json--1.0.sql b/contrib/pg_copy_json/pg_copy_json--1.0.sql
new file mode 100644
index 0000000000..d738a1e7e9
--- /dev/null
+++ b/contrib/pg_copy_json/pg_copy_json--1.0.sql
@@ -0,0 +1,9 @@
+/* contrib/pg_copy_json/pg_copy_json--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_copy_json" to load this file. \quit
+
+CREATE FUNCTION pg_catalog.json(internal)
+	RETURNS copy_handler
+	AS 'MODULE_PATHNAME', 'copy_json'
+	LANGUAGE C;
diff --git a/contrib/pg_copy_json/pg_copy_json.c b/contrib/pg_copy_json/pg_copy_json.c
new file mode 100644
index 0000000000..cbfdee8e8b
--- /dev/null
+++ b/contrib/pg_copy_json/pg_copy_json.c
@@ -0,0 +1,218 @@
+/*--------------------------------------------------------------------------
+ *
+ * pg_copy_json.c
+ *		COPY TO JSON (JavaScript Object Notation) format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pg_copy_json/pg_copy_json.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copy.h"
+#include "commands/defrem.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "utils/json.h"
+
+PG_MODULE_MAGIC;
+
+typedef struct
+{
+	/*
+	 * Force output of square brackets as array decorations at the beginning
+	 * and end of output, with commas between the rows.
+	 */
+	bool	force_array;
+	bool	force_array_specified;
+	
+	/* need delimiter to start next json array element */
+	bool	json_row_delim_needed;
+} CopyJsonData;
+
+static inline void
+InitCopyJsonData(CopyJsonData *p)
+{
+	Assert(p);
+	p->force_array = false;
+	p->force_array_specified = false;
+	p->json_row_delim_needed = false;
+}
+
+static void
+CopyToJsonSendEndOfRow(CopyToState cstate)
+{
+	switch (cstate->copy_dest)
+	{
+		case COPY_DEST_FILE:
+			/* Default line termination depends on platform */
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			break;
+		case COPY_DEST_FRONTEND:
+			/* The FE/BE protocol uses \n as newline for all platforms */
+			CopySendChar(cstate, '\n');
+			break;
+		default:
+			break;
+	}
+	CopyToStateFlush(cstate);
+}
+
+static bool
+CopyToJsonProcessOption(CopyToState cstate, DefElem *defel)
+{
+	CopyJsonData	   *p;
+
+	if (cstate->opaque == NULL)
+	{
+		MemoryContext oldcontext;
+		oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+		cstate->opaque = palloc0(sizeof(CopyJsonData));
+		MemoryContextSwitchTo(oldcontext);
+		InitCopyJsonData(cstate->opaque);
+	}
+
+	p = (CopyJsonData *)cstate->opaque;
+
+	if (strcmp(defel->defname, "force_array") == 0)
+	{
+		if (p->force_array_specified)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("CopyToJsonProcessOption: redundant options \"%s\"=\"%s\"",
+						   defel->defname, defGetString(defel)));
+		p->force_array_specified = true;
+		p->force_array = defGetBoolean(defel);
+
+		return true;
+	}
+
+	return false;
+}
+
+static void
+CopyToJsonSendCopyBegin(CopyToState cstate)
+{
+	StringInfoData buf;
+	int16		format = 0;
+
+	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
+	pq_sendbyte(&buf, format);	/* overall format */
+	/*
+	 * JSON mode is always one non-binary column
+	 */
+	pq_sendint16(&buf, 1);
+	pq_sendint16(&buf, 0);
+	pq_endmessage(&buf);
+}
+
+static void
+CopyToJsonStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	CopyJsonData	   *p;
+
+	if (cstate->opaque == NULL)
+	{
+		MemoryContext oldcontext;
+		oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+		cstate->opaque = palloc0(sizeof(CopyJsonData));
+		MemoryContextSwitchTo(oldcontext);
+		InitCopyJsonData(cstate->opaque);
+	}
+
+	/* No need to alloc cstate->out_functions */
+
+	p = (CopyJsonData *)cstate->opaque;
+
+	/* If FORCE_ARRAY has been specified send the open bracket. */
+	if (p->force_array)
+	{
+		CopySendChar(cstate, '[');
+		CopyToJsonSendEndOfRow(cstate);
+	}
+}
+
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	Datum				rowdata;
+	StringInfo			result;
+	CopyJsonData	   *p;
+
+	Assert(cstate->opaque);
+	p = (CopyJsonData *)cstate->opaque;
+
+	if(!cstate->rel)
+	{
+		for (int i = 0; i < slot->tts_tupleDescriptor->natts; i++)
+		{
+			/* Flat-copy the attribute array */
+			memcpy(TupleDescAttr(slot->tts_tupleDescriptor, i),
+			TupleDescAttr(cstate->queryDesc->tupDesc, i),
+							1 * sizeof(FormData_pg_attribute));
+		}
+		BlessTupleDesc(slot->tts_tupleDescriptor);
+	}
+	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+	result = makeStringInfo();
+	composite_to_json(rowdata, result, false);
+
+	if (p->json_row_delim_needed)
+		CopySendChar(cstate, ',');
+	else if (p->force_array)
+	{
+		/* first row needs no delimiter */
+		CopySendChar(cstate, ' ');
+		p->json_row_delim_needed = true;
+	}
+	CopySendData(cstate, result->data, result->len);
+	CopyToJsonSendEndOfRow(cstate);
+}
+
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+	CopyJsonData	   *p;
+
+	Assert(cstate->opaque);
+	p = (CopyJsonData *)cstate->opaque;
+
+	/* If FORCE_ARRAY has been specified send the close bracket. */
+	if (p->force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopyToJsonSendEndOfRow(cstate);
+	}
+}
+
+static const CopyToRoutine CopyToRoutineJson = {
+	.type = T_CopyToRoutine,
+	.CopyToProcessOption = CopyToJsonProcessOption,
+	.CopyToSendCopyBegin = CopyToJsonSendCopyBegin,
+	.CopyToStart = CopyToJsonStart,
+	.CopyToOneRow = CopyToJsonOneRow,
+	.CopyToEnd = CopyToJsonEnd,
+};
+
+PG_FUNCTION_INFO_V1(copy_json);
+Datum
+copy_json(PG_FUNCTION_ARGS)
+{
+	bool		is_from = PG_GETARG_BOOL(0);
+
+	if (is_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot use JSON mode in COPY FROM")));
+
+	PG_RETURN_POINTER(&CopyToRoutineJson);
+}
diff --git a/contrib/pg_copy_json/pg_copy_json.control b/contrib/pg_copy_json/pg_copy_json.control
new file mode 100644
index 0000000000..90b0a74603
--- /dev/null
+++ b/contrib/pg_copy_json/pg_copy_json.control
@@ -0,0 +1,5 @@
+# pg_copy_json extension
+comment = 'COPY TO JSON format'
+default_version = '1.0'
+module_pathname = '$libdir/pg_copy_json'
+relocatable = true
diff --git a/contrib/pg_copy_json/sql/pg_copy_json.sql b/contrib/pg_copy_json/sql/pg_copy_json.sql
new file mode 100644
index 0000000000..73e7e514ac
--- /dev/null
+++ b/contrib/pg_copy_json/sql/pg_copy_json.sql
@@ -0,0 +1,59 @@
+--
+-- COPY TO JSON
+--
+
+CREATE EXTENSION pg_copy_json;
+
+-- test copying in JSON format with various styles
+-- of embedded line ending characters
+
+create temp table copytest (
+	style	text,
+	test 	text,
+	filler	int);
+
+insert into copytest values('DOS',E'abc\r\ndef',1);
+insert into copytest values('Unix',E'abc\ndef',2);
+insert into copytest values('Mac',E'abc\rdef',3);
+insert into copytest values(E'esc\\ape',E'a\\r\\\r\\\n\\nb',4);
+
+copy copytest to stdout with (format 'json');
+
+-- pg_copy_json do not support COPY FROM
+copy copytest from stdout with (format 'json');
+
+-- test copying in JSON format with various styles
+-- of embedded escaped characters
+
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout with (format 'json');
+
+-- test force array
+
+copy copytest to stdout (format 'json', force_array);
+copy copytest to stdout (format 'json', force_array true);
+copy copytest to stdout (format 'json', force_array false);
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d719a61f16..fabd4e611e 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -83,8 +83,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -507,8 +505,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 6d7f1b387d..d5631171ad 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
 								const int *tzp);
-- 
2.41.0
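
The row-delimiter handling in CopyToJsonStart, CopyToJsonOneRow and CopyToJsonEnd above can be sketched outside the server roughly as follows. This is only an illustration under stated assumptions: `JsonOutState`, `append_str` and `render_json_rows` are hypothetical names, the rows are assumed to be already serialized to JSON text, and a plain character buffer stands in for the CopySendChar/CopySendData calls used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the patch's CopyJsonData. */
typedef struct
{
	bool		force_array;		/* wrap output in [ ... ] with commas */
	bool		row_delim_needed;	/* a row was already emitted in array mode */
} JsonOutState;

/* Append src to the NUL-terminated dst; fail if it does not fit in cap bytes. */
static bool
append_str(char *dst, size_t cap, const char *src)
{
	size_t		used = strlen(dst);
	size_t		need = strlen(src);

	if (used + need + 1 > cap)
		return false;
	memcpy(dst + used, src, need + 1);
	return true;
}

/*
 * Render already-serialized JSON rows into out, one row per line.
 * With force_array, the output starts with "[", the first row is
 * prefixed with a space, later rows with a comma, and the output ends
 * with "]". Returns false if out is too small.
 */
bool
render_json_rows(const char *const *rows, size_t nrows,
				 bool force_array, char *out, size_t cap)
{
	JsonOutState st = {force_array, false};

	assert(rows != NULL || nrows == 0);
	assert(out != NULL && cap > 0);
	out[0] = '\0';

	/* Start: open bracket on its own line. */
	if (st.force_array && !append_str(out, cap, "[\n"))
		return false;

	for (size_t i = 0; i < nrows; i++)
	{
		if (st.row_delim_needed)
		{
			if (!append_str(out, cap, ","))
				return false;
		}
		else if (st.force_array)
		{
			/* first row needs no delimiter, only padding */
			if (!append_str(out, cap, " "))
				return false;
			st.row_delim_needed = true;
		}
		if (!append_str(out, cap, rows[i]) || !append_str(out, cap, "\n"))
			return false;
	}

	/* End: close bracket on its own line. */
	if (st.force_array && !append_str(out, cap, "]\n"))
		return false;
	return true;
}
```

Note that the delimiter flag is only ever set in array mode, so without force_array the rows come out as plain newline-separated JSON objects.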

#116

vignesh C

vignesh21@gmail.com

almost 2 years ago

In reply to: Junwang Zhao (#115)

Re: Emitting JSON to file using COPY TO

On Sat, 27 Jan 2024 at 11:25, Junwang Zhao <zhjwpku@gmail.com> wrote:

Hi hackers,

Kou-san(CCed) has been working on *Make COPY format extendable[1]*, so
I think making *copy to json* based on that work might be the right direction.

I wrote an extension for that purpose, and here is the patch set together
with Kou-san's *extendable copy format* implementation:

0001-0009 is the implementation of extendable copy format
00010 is the pg_copy_json extension

I also created a PR[2] if anybody likes the github review style.

The *extendable copy format* feature is still being developed. I am posting
this email in case the patch set in this thread gets committed without
awareness of the *extendable copy format* feature.

I'd like to hear your opinions.

CFBot shows that one of the tests is failing, as in [1]:
[05:46:41.678] /bin/sh: 1: cannot open
/tmp/cirrus-ci-build/contrib/pg_copy_json/sql/test_copy_format.sql: No
such file
[05:46:41.678] diff:
/tmp/cirrus-ci-build/contrib/pg_copy_json/expected/test_copy_format.out:
No such file or directory
[05:46:41.678] diff:
/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out:
No such file or directory
[05:46:41.678] # diff command failed with status 512: diff
"/tmp/cirrus-ci-build/contrib/pg_copy_json/expected/test_copy_format.out"
"/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out"

"/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out.diff"

[05:46:41.678] Bail out!make[2]: *** [../../src/makefiles/pgxs.mk:454:
check] Error 2
[05:46:41.679] make[1]: *** [Makefile:96: check-pg_copy_json-recurse] Error 2
[05:46:41.679] make: *** [GNUmakefile:71: check-world-contrib-recurse] Error 2

Please post an updated version for the same.

[1]: https://cirrus-ci.com/task/5322439115145216

Regards,
Vignesh

#117

Junwang Zhao

zhjwpku@gmail.com

almost 2 years ago

In reply to: vignesh C (#116)

Re: Emitting JSON to file using COPY TO

Hi Vignesh,

On Wed, Jan 31, 2024 at 5:50 PM vignesh C <vignesh21@gmail.com> wrote:

On Sat, 27 Jan 2024 at 11:25, Junwang Zhao <zhjwpku@gmail.com> wrote:

Hi hackers,

Kou-san(CCed) has been working on *Make COPY format extendable[1]*, so
I think making *copy to json* based on that work might be the right direction.

I wrote an extension for that purpose, and here is the patch set together
with Kou-san's *extendable copy format* implementation:

0001-0009 is the implementation of extendable copy format
00010 is the pg_copy_json extension

I also created a PR[2] if anybody likes the github review style.

The *extendable copy format* feature is still being developed. I am posting
this email in case the patch set in this thread gets committed without
awareness of the *extendable copy format* feature.

I'd like to hear your opinions.

CFBot shows that one of the tests is failing, as in [1]:
[05:46:41.678] /bin/sh: 1: cannot open
/tmp/cirrus-ci-build/contrib/pg_copy_json/sql/test_copy_format.sql: No
such file
[05:46:41.678] diff:
/tmp/cirrus-ci-build/contrib/pg_copy_json/expected/test_copy_format.out:
No such file or directory
[05:46:41.678] diff:
/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out:
No such file or directory
[05:46:41.678] # diff command failed with status 512: diff
"/tmp/cirrus-ci-build/contrib/pg_copy_json/expected/test_copy_format.out"
"/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out"

"/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out.diff"

[05:46:41.678] Bail out!make[2]: *** [../../src/makefiles/pgxs.mk:454:
check] Error 2
[05:46:41.679] make[1]: *** [Makefile:96: check-pg_copy_json-recurse] Error 2
[05:46:41.679] make: *** [GNUmakefile:71: check-world-contrib-recurse] Error 2

Please post an updated version for the same.

Thanks for the reminder. The patch set I posted is not meant for commit,
only for further discussion.

I will post more information about the *extendable copy* feature
when it's about to be committed.

[1] - https://cirrus-ci.com/task/5322439115145216

Regards,
Vignesh

--
Regards
Junwang Zhao

#118

Alvaro Herrera

alvherre@alvh.no-ip.org

almost 2 years ago

In reply to: jian he (#114)

Re: Emitting JSON to file using COPY TO

On 2024-Jan-23, jian he wrote:

+           | FORMAT_LA copy_generic_opt_arg
+               {
+                   $$ = makeDefElem("format", $2, @1);
+               }
;
I think it's not necessary. "format" option is already handled in
copy_generic_opt_elem.
After testing, I found that this part is necessary,
because a query using WITH, such as `copy (select 1) to stdout with
(format json, force_array false);`, will fail otherwise.

Right, because "FORMAT JSON" is turned into FORMAT_LA JSON by parser.c
(see base_yylex there). I'm not really sure but I think it might be
better to make it "| FORMAT_LA JSON" instead of invoking the whole
copy_generic_opt_arg syntax. Not because of performance, but just
because it's much clearer what's going on.
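
The lookahead step described here can be sketched as a small token filter. This is a simplified illustration, not the actual parser.c code: the `Token` enum and `filter_tokens` are hypothetical, and the real filter works on the live lexer stream rather than on an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; the real ones are generated from gram.y. */
typedef enum
{
	TOK_EOF = 0,
	TOK_FORMAT,
	TOK_FORMAT_LA,
	TOK_JSON,
	TOK_IDENT,
	TOK_COMMA
} Token;

/*
 * Copy the TOK_EOF-terminated stream "in" to "out", rewriting FORMAT to
 * FORMAT_LA whenever the next token is JSON. "out" must have room for as
 * many tokens as "in". Returns the number of tokens written, including
 * the terminating TOK_EOF.
 */
size_t
filter_tokens(const Token *in, Token *out)
{
	size_t		i = 0;

	assert(in != NULL && out != NULL);

	for (;;)
	{
		Token		cur = in[i];

		/*
		 * One token of lookahead. Reading in[i + 1] is safe because cur
		 * is not TOK_EOF here, so the terminator still lies ahead.
		 */
		if (cur == TOK_FORMAT && in[i + 1] == TOK_JSON)
			cur = TOK_FORMAT_LA;

		out[i++] = cur;
		if (cur == TOK_EOF)
			break;
	}
	return i;
}
```

Because the grammar never sees a plain FORMAT token in front of JSON, a rule that expects FORMAT followed by a generic argument cannot match `format json`; a rule starting with FORMAT_LA is needed for that case.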

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#119

jian.universality@gmail.com

almost 2 years ago

In reply to: Alvaro Herrera (#118)

Re: Emitting JSON to file using COPY TO

On Wed, Jan 31, 2024 at 9:26 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Jan-23, jian he wrote:
+           | FORMAT_LA copy_generic_opt_arg
+               {
+                   $$ = makeDefElem("format", $2, @1);
+               }
;
I think it's not necessary. "format" option is already handled in
copy_generic_opt_elem.
After testing, I found that this part is necessary,
because a query using WITH, such as `copy (select 1) to stdout with
(format json, force_array false);`, will fail otherwise.
Right, because "FORMAT JSON" is turned into FORMAT_LA JSON by parser.c
(see base_yylex there). I'm not really sure but I think it might be
better to make it "| FORMAT_LA JSON" instead of invoking the whole
copy_generic_opt_arg syntax. Not because of performance, but just
because it's much clearer what's going on.

Sorry to bother you.
The following is on master, with no patch applied.
I don't know much about gram.y.

copy (select 1) to stdout with (format json1);
ERROR: COPY format "json1" not recognized
LINE 1: copy (select 1) to stdout with (format json1);
^
copy (select 1) to stdout with (format json);
ERROR: syntax error at or near "format"
LINE 1: copy (select 1) to stdout with (format json);
^

json is a keyword. Is it possible to escape it?
Can we make the error message for `copy (select 1) to stdout with (format json)`
the same as for `copy (select 1) to stdout with (format json1)`?

#120

Alvaro Herrera

alvherre@alvh.no-ip.org

almost 2 years ago

In reply to: jian he (#119)

Re: Emitting JSON to file using COPY TO

On 2024-Feb-02, jian he wrote:

copy (select 1) to stdout with (format json);
ERROR: syntax error at or near "format"
LINE 1: copy (select 1) to stdout with (format json);
^

json is a keyword. Is it possible to escape it?
Can we make the error message for `copy (select 1) to stdout with (format json)`
the same as for `copy (select 1) to stdout with (format json1)`?

Sure, you can use
copy (select 1) to stdout with (format "json");
and then you get
ERROR: COPY format "json" not recognized

is that what you meant?

If you want the server to send this message when the JSON word is not in
quotes, I'm afraid that's not possible, due to the funny nature of the
FORMAT keyword when the JSON keyword appears after it. But why do you
care? If you use the patch, then you no longer need the "not
recognized" error message, because the JSON format is indeed
a recognized one.

Maybe I didn't understand your question.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#121

jian.universality@gmail.com

almost 2 years ago

In reply to: Alvaro Herrera (#120)

Re: Emitting JSON to file using COPY TO

On Fri, Feb 2, 2024 at 5:48 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

If you want the server to send this message when the JSON word is not in
quotes, I'm afraid that's not possible, due to the funny nature of the
FORMAT keyword when the JSON keyword appears after it. But why do you
care? If you use the patch, then you no longer need to have the "not
recognized" error messages anymore, because the JSON format is indeed
a recognized one.

"JSON word is not in quotes" is my intention.

So it seems that when people implement a custom format for COPY,
the format name needs single quotes if it is a keyword.

Thanks for clarifying!

#122

jian.universality@gmail.com

almost 2 years ago

In reply to: Alvaro Herrera (#118)

2 attachment(s)

Re: Emitting JSON to file using COPY TO

On Fri, Jan 19, 2024 at 4:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

if (opts_out->json_mode && is_from)
ereport(ERROR, ...);

if (!opts_out->json_mode && opts_out->force_array)
ereport(ERROR, ...);

Also these checks can be moved close to other checks at the end of
ProcessCopyOptions().
---
@@ -3395,6 +3395,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+           | JSON
+               {
+                   $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+               }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3427,6 +3431,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+           | FORCE ARRAY
+               {
+                   $$ = makeDefElem("force_array", (Node *)
makeBoolean(true), @1);
+               }
;
I believe we don't need to support new options in old-style syntax.

You are right about the force_array case.
We don't need to add force_array related changes in gram.y.

On Wed, Jan 31, 2024 at 9:26 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Jan-23, jian he wrote:
+           | FORMAT_LA copy_generic_opt_arg
+               {
+                   $$ = makeDefElem("format", $2, @1);
+               }
;
I think it's not necessary. "format" option is already handled in
copy_generic_opt_elem.
After testing, I found that this part is necessary,
because a query using WITH, such as `copy (select 1) to stdout with
(format json, force_array false);`, will fail otherwise.
Right, because "FORMAT JSON" is turned into FORMAT_LA JSON by parser.c
(see base_yylex there). I'm not really sure but I think it might be
better to make it "| FORMAT_LA JSON" instead of invoking the whole
copy_generic_opt_arg syntax. Not because of performance, but just
because it's much clearer what's going on.

I am not sure what alternative you are referring to.
I've rebased the patch and made some cosmetic changes.
Now I think it's pretty neat.
You can make your change on top of it; then I may understand the
alternative you are referring to.
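
The option checks that Masahiko suggested grouping near the end of ProcessCopyOptions() can be sketched as one standalone function. This is only an illustration: `CopyOpts` and `check_json_copy_options` are hypothetical names, the function returns a message instead of calling ereport(), and the wording of the FORCE_ARRAY message is a placeholder, because the thread elides it as "...".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the COPY options that matter for JSON mode. */
typedef struct
{
	bool		json_mode;		/* FORMAT json was requested */
	bool		header_line;	/* HEADER was requested */
	bool		force_array;	/* FORCE_ARRAY was requested */
} CopyOpts;

/*
 * Return NULL if the option combination is acceptable, otherwise the
 * error message the server would raise.
 */
const char *
check_json_copy_options(const CopyOpts *opts, bool is_from)
{
	assert(opts != NULL);

	if (opts->json_mode && opts->header_line)
		return "cannot specify HEADER in JSON mode";

	if (opts->json_mode && is_from)
		return "cannot use JSON mode in COPY FROM";

	/* Placeholder wording; the actual message is not shown in the thread. */
	if (!opts->json_mode && opts->force_array)
		return "FORCE_ARRAY requires JSON mode (placeholder message)";

	return NULL;
}
```

Keeping the three checks side by side makes it easy to see that JSON mode is rejected for COPY FROM and for HEADER, while FORCE_ARRAY is rejected for everything except JSON mode.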

Attachments:

v9-0001-Add-another-COPY-fomrat-json.patch (application/x-patch)

From b3d3d6023f96aa7971a0663d8c0bd6de50e877a5 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 19 Feb 2024 10:37:18 +0800
Subject: [PATCH v9 1/2] Add another COPY fomrat: json

this format is only allowed in COPY TO operation.
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         |   5 ++
 src/backend/commands/copy.c        |  13 +++
 src/backend/commands/copyto.c      | 125 ++++++++++++++++++++---------
 src/backend/parser/gram.y          |   8 ++
 src/backend/utils/adt/json.c       |   5 +-
 src/include/commands/copy.h        |   1 +
 src/include/utils/json.h           |   2 +
 src/test/regress/expected/copy.out |  54 +++++++++++++
 src/test/regress/sql/copy.sql      |  38 +++++++++
 9 files changed, 208 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 55764fc1..ef9e4729 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -214,9 +214,14 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6..5d5b733d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -480,6 +480,8 @@ ProcessCopyOptions(ParseState *pstate,
 				 /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
 				opts_out->csv_mode = true;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->json_mode = true;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->binary = true;
 			else
@@ -716,6 +718,11 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot specify HEADER in BINARY mode")));
 
+	if (opts_out->json_mode && opts_out->header_line)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot specify HEADER in JSON mode")));
+
 	/* Check quote */
 	if (!opts_out->csv_mode && opts_out->quote != NULL)
 		ereport(ERROR,
@@ -793,6 +800,12 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY FREEZE cannot be used with COPY TO")));
 
+	/* Check json format  */
+	if (opts_out->json_mode && is_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot use JSON mode in COPY FROM")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 20ffc903..c948a431 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -28,6 +28,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -37,6 +38,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
@@ -86,6 +88,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	bool		json_row_delim_needed;	/* need delimiter to start next json array element */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -146,9 +149,20 @@ SendCopyBegin(CopyToState cstate)
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (!cstate->opts.json_mode)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON mode is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -907,11 +921,7 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-	bool		need_delim = false;
-	FmgrInfo   *out_functions = cstate->out_functions;
 	MemoryContext oldcontext;
-	ListCell   *cur;
-	char	   *string;
 
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
@@ -922,53 +932,88 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 		CopySendInt16(cstate, list_length(cstate->attnumlist));
 	}
 
-	/* Make sure the tuple is fully deconstructed */
-	slot_getallattrs(slot);
-
-	foreach(cur, cstate->attnumlist)
+	if (!cstate->opts.json_mode)
 	{
-		int			attnum = lfirst_int(cur);
-		Datum		value = slot->tts_values[attnum - 1];
-		bool		isnull = slot->tts_isnull[attnum - 1];
+		bool		need_delim = false;
+		FmgrInfo   *out_functions = cstate->out_functions;
+		ListCell   *cur;
+		char	   *string;
 
-		if (!cstate->opts.binary)
-		{
-			if (need_delim)
-				CopySendChar(cstate, cstate->opts.delim[0]);
-			need_delim = true;
-		}
+		/* Make sure the tuple is fully deconstructed */
+		slot_getallattrs(slot);
 
-		if (isnull)
-		{
-			if (!cstate->opts.binary)
-				CopySendString(cstate, cstate->opts.null_print_client);
-			else
-				CopySendInt32(cstate, -1);
-		}
-		else
+		foreach(cur, cstate->attnumlist)
 		{
+			int			attnum = lfirst_int(cur);
+			Datum		value = slot->tts_values[attnum - 1];
+			bool		isnull = slot->tts_isnull[attnum - 1];
+
 			if (!cstate->opts.binary)
 			{
-				string = OutputFunctionCall(&out_functions[attnum - 1],
-											value);
-				if (cstate->opts.csv_mode)
-					CopyAttributeOutCSV(cstate, string,
-										cstate->opts.force_quote_flags[attnum - 1]);
+				if (need_delim)
+					CopySendChar(cstate, cstate->opts.delim[0]);
+				need_delim = true;
+			}
+
+			if (isnull)
+			{
+				if (!cstate->opts.binary)
+					CopySendString(cstate, cstate->opts.null_print_client);
 				else
-					CopyAttributeOutText(cstate, string);
+					CopySendInt32(cstate, -1);
 			}
 			else
 			{
-				bytea	   *outputbytes;
+				if (!cstate->opts.binary)
+				{
+					string = OutputFunctionCall(&out_functions[attnum - 1],
+												value);
+					if (cstate->opts.csv_mode)
+						CopyAttributeOutCSV(cstate, string,
+											cstate->opts.force_quote_flags[attnum - 1]);
+					else
+						CopyAttributeOutText(cstate, string);
+				}
+				else
+				{
+					bytea	   *outputbytes;
 
-				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-											   value);
-				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-				CopySendData(cstate, VARDATA(outputbytes),
-							 VARSIZE(outputbytes) - VARHDRSZ);
+					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+												   value);
+					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+					CopySendData(cstate, VARDATA(outputbytes),
+								 VARSIZE(outputbytes) - VARHDRSZ);
+				}
 			}
 		}
 	}
+	else
+	{
+		Datum		rowdata;
+		StringInfo	result;
+
+		/*
+		 * iff COPY TO command's source data is from a query, not a relation,
+		 * then we need to copy the TupleDesc from the cstate->queryDesc.
+		 * because query execution returning slot's TupleDesc may change,
+		 * composite_to_json requires correct TupleDesc.
+		*/
+		if(!cstate->rel)
+		{
+			for (int i = 0; i < slot->tts_tupleDescriptor->natts; i++)
+			{
+				memcpy(TupleDescAttr(slot->tts_tupleDescriptor, i),
+				TupleDescAttr(cstate->queryDesc->tupDesc, i),
+								1 * sizeof(FormData_pg_attribute));
+			}
+			BlessTupleDesc(slot->tts_tupleDescriptor);
+		}
+		rowdata = ExecFetchSlotHeapTupleDatum(slot);
+		result = makeStringInfo();
+		composite_to_json(rowdata, result, false);
+
+		CopySendData(cstate, result->data, result->len);
+	}
 
 	CopySendEndOfRow(cstate);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 60b31d9f..ada49e4c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3424,6 +3424,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3506,6 +3510,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+			{
+				$$ = makeDefElem("format", $2, @1);
+			}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d719a61f..fabd4e61 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -83,8 +83,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -507,8 +505,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0..f591b613 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -53,6 +53,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
+	bool		json_mode;		/* JSON format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 6d7f1b38..d5631171 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
 								const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365ec..0c5ade47 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -42,6 +42,60 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- Error
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+-- Error
+copy copytest from stdout (format json);
+ERROR:  cannot use JSON mode in COPY FROM
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e906..da6b0b0a 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -54,6 +54,44 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- Error
+copy copytest to stdout (format json, header);
+-- Error
+copy copytest from stdout (format json);
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
-- 
2.34.1

v9-0002-Add-option-force_array-for-COPY-TO-JSON-fomrat.patch (application/x-patch)

From 10338f10221e095127e7671776a478565add4df4 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 19 Feb 2024 10:40:53 +0800
Subject: [PATCH v9 2/2] Add option force_array for COPY TO JSON fomrat.

make add opening brackets and close brackets for the whole json output.
also, separate each json record with comma.
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 ++++++++++++++
 src/backend/commands/copy.c        | 17 +++++++++++++++++
 src/backend/commands/copyto.c      | 28 ++++++++++++++++++++++++++++
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 24 ++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      |  9 +++++++++
 6 files changed, 93 insertions(+)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index ef9e4729..83f4a43f 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     ON_ERROR '<replaceable class="parameter">error_action</replaceable>'
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
 </synopsis>
@@ -386,6 +387,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>JSON</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ON_ERROR</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5d5b733d..89373119 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -456,6 +456,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		freeze_specified = false;
 	bool		header_specified = false;
 	bool		on_error_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -610,6 +611,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -806,6 +814,15 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot use JSON mode in COPY FROM")));
 
+	if (!opts_out->json_mode && opts_out->force_array)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("COPY FORCE_ARRAY requires JSON mode")));
+	if (!opts_out->json_mode && force_array_specified)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("COPY FORCE_ARRAY only available in JSON mode")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c948a431..419e7e63 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -860,6 +860,16 @@ DoCopyTo(CopyToState cstate)
 
 			CopySendEndOfRow(cstate);
 		}
+		/*
+		 * If JSON has been requested, and FORCE_ARRAY has been specified send
+		 * the opening bracket.
+		*/
+		if (cstate->opts.json_mode && cstate->opts.force_array)
+		{
+			CopySendChar(cstate, '[');
+			CopySendEndOfRow(cstate);
+		}
+
 	}
 
 	if (cstate->rel)
@@ -907,6 +917,15 @@ DoCopyTo(CopyToState cstate)
 		CopySendEndOfRow(cstate);
 	}
 
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified send the
+	 * closing bracket.
+	*/
+	if (cstate->opts.json_mode && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendEndOfRow(cstate);
+	}
 	MemoryContextDelete(cstate->rowcontext);
 
 	if (fe_copy)
@@ -1012,6 +1031,15 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 		result = makeStringInfo();
 		composite_to_json(rowdata, result, false);
 
+		if (cstate->json_row_delim_needed && cstate->opts.force_array)
+			CopySendChar(cstate, ',');
+		else if (cstate->opts.force_array)
+		{
+			/* first row needs no delimiter */
+			CopySendChar(cstate, ' ');
+			cstate->json_row_delim_needed = true;
+		}
+
 		CopySendData(cstate, result->data, result->len);
 	}
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f591b613..51656eec 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -72,6 +72,7 @@ typedef struct CopyFormatOptions
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
+	bool		force_array;	/* add JSON array decorations */
 	bool		convert_selectively;	/* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
 	List	   *convert_select; /* list of column names (can be NIL) */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 0c5ade47..7812768c 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -59,6 +59,30 @@ ERROR:  cannot specify HEADER in JSON mode
 -- Error
 copy copytest from stdout (format json);
 ERROR:  cannot use JSON mode in COPY FROM
+--Error
+copy copytest to stdout (format csv, force_array false);
+ERROR:  COPY FORCE_ARRAY only available in JSON mode
+copy copytest from stdin (format json, force_array true);
+ERROR:  cannot use JSON mode in COPY FROM
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index da6b0b0a..f685193b 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -64,6 +64,15 @@ copy copytest to stdout (format json, header);
 -- Error
 copy copytest from stdout (format json);
 
+--Error
+copy copytest to stdout (format csv, force_array false);
+copy copytest from stdin (format json, force_array true);
+
+copy copytest to stdout (format json, force_array);
+
+copy copytest to stdout (format json, force_array true);
+
+copy copytest to stdout (format json, force_array false);
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1

#123

Andrey M. Borodin

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Joe Conway (#110)

Re: Emitting JSON to file using COPY TO

Hello everyone!

Thanks for working on this, really nice feature!

On 9 Jan 2024, at 01:40, Joe Conway <mail@joeconway.com> wrote:

Thanks -- will have a look

Joe, recently folks have proposed a lot of patches in this thread that seem to have diverged from the original implementation approach.
As the author of CF entry [0], can you please comment on which patch version needs review?

Thanks!

Best regards, Andrey Borodin.

[0]: https://commitfest.postgresql.org/47/4716/

#124

mail@joeconway.com

almost 2 years ago

In reply to: Andrey M. Borodin (#123)

Re: Emitting JSON to file using COPY TO

On 3/8/24 12:28, Andrey M. Borodin wrote:

Hello everyone!

Thanks for working on this, really nice feature!

On 9 Jan 2024, at 01:40, Joe Conway <mail@joeconway.com> wrote:

Thanks -- will have a look

Joe, recently folks have proposed a lot of patches in this thread that seem to have diverged from the original implementation approach.
As the author of CF entry [0], can you please comment on which patch version needs review?

I don't know if I agree with the proposed changes, but I have also been
waiting to see how the parallel discussion regarding COPY extensibility
shakes out.

And there were a couple of issues found that need to be tracked down.

Additionally I have had time/availability challenges recently.

Overall, chances seem slim that this will make it into 17, but I have
not quite given up hope yet either.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#125

[0]: /messages/by-id/CACJufxHd6ZRmJJBsDOGpovaVAekMS-u6AOrcw0Ja-Wyi-0kGtA@mail.gmail.com
[1]: /messages/by-id/CAEZATCWh29787xf=4NgkoixeqRHrqi0Qd33Z6_-F8t2dZ0yLCQ@mail.gmail.com
[2]: /messages/by-id/CAD21AoCb02zhZM3vXb8HSw8fwOsL+iRdEFb--Kmunv8PjPAWjw@mail.gmail.com

jian.universality@gmail.com

almost 2 years ago

In reply to: Joe Conway (#124)

Re: Emitting JSON to file using COPY TO

On Sat, Mar 9, 2024 at 2:03 AM Joe Conway <mail@joeconway.com> wrote:

On 3/8/24 12:28, Andrey M. Borodin wrote:

Hello everyone!

Thanks for working on this, really nice feature!

On 9 Jan 2024, at 01:40, Joe Conway <mail@joeconway.com> wrote:

Thanks -- will have a look

Joe, recently folks have proposed a lot of patches in this thread that seem to have diverged from the original implementation approach.
As the author of CF entry [0], can you please comment on which patch version needs review?

I don't know if I agree with the proposed changes, but I have also been
waiting to see how the parallel discussion regarding COPY extensibility
shakes out.

And there were a couple of issues found that need to be tracked down.

Additionally I have had time/availability challenges recently.

Overall, chances seem slim that this will make it into 17, but I have
not quite given up hope yet either.

Hi.
Here is a summary of the changes I made in the v9 patches at [0].

Meta: rebased. The patches now need to be applied with `git apply` or
`git am`; the previous copyto_json.007.diff had to be applied with GNU patch.

At [1], Dean Rasheed found some corner cases in which the
tts_tupleDescriptor of the slot returned from
`
ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0, true);
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
`
cannot be used for composite_to_json.
Generally, DestReceiver->rStartup is used to send the TupleDesc to the DestReceiver.
The COPY TO DestReceiver's rStartup function is copy_dest_startup,
but copy_dest_startup is a no-op.
That means that, to make the final TupleDesc of a COPY TO (FORMAT JSON)
operation bulletproof,
we need to copy the tupDesc from CopyToState's queryDesc.
This only applies when the COPY TO source is a query (example:
copy (select 1) to stdout), not a table.
The above is my interpretation.

At [2], Masahiko Sawada made several points,
mainly to split the patch into two: one for format json, and a second for
the force_array option.
Splitting it into two makes it easier to review, I think.
My changes also address all the points Masahiko Sawada mentioned.
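
To make the force_array framing concrete, here is a minimal standalone C sketch of the output shape shown in the expected regression output: an opening bracket line, the first row prefixed with a space, each later row prefixed with a comma, and a closing bracket line. It does not use any PostgreSQL internals. The names `append_str` and `emit_rows` are hypothetical and exist only for this illustration; the `need_delim` local merely plays the role that `json_row_delim_needed` plays in the patch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append s to buf (NUL-terminated); returns false if it does not fit. */
static bool
append_str(char *buf, size_t buflen, size_t *used, const char *s)
{
    size_t len = strlen(s);

    if (*used + len + 1 > buflen)
        return false;
    memcpy(buf + *used, s, len + 1);
    *used += len;
    return true;
}

/*
 * Hypothetical helper, not part of the patch: write already-serialized JSON
 * rows into buf, one per line.  With force_array, wrap them in brackets and
 * put a comma in front of every row except the first.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int
emit_rows(char *buf, size_t buflen, const char *const *rows, int nrows,
          bool force_array)
{
    size_t used = 0;
    bool   need_delim = false;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    if (force_array && !append_str(buf, buflen, &used, "[\n"))
        return -1;

    for (int i = 0; i < nrows; i++)
    {
        if (force_array)
        {
            /* first row gets a space, later rows get the comma */
            if (!append_str(buf, buflen, &used, need_delim ? "," : " "))
                return -1;
            need_delim = true;
        }
        if (!append_str(buf, buflen, &used, rows[i]) ||
            !append_str(buf, buflen, &used, "\n"))
            return -1;
    }

    if (force_array && !append_str(buf, buflen, &used, "]\n"))
        return -1;

    return (int) used;
}
```

Putting the delimiter in front of each row, rather than after it, means the writer never needs to know whether the current row is the last one, which suits a streaming COPY.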

#126

jian.universality@gmail.com

almost 2 years ago

In reply to: jian he (#125)

2 attachment(s)

Re: Emitting JSON to file using COPY TO

On Sat, Mar 9, 2024 at 9:13 AM jian he <jian.universality@gmail.com> wrote:

On Sat, Mar 9, 2024 at 2:03 AM Joe Conway <mail@joeconway.com> wrote:

On 3/8/24 12:28, Andrey M. Borodin wrote:

Hello everyone!

Thanks for working on this, really nice feature!

On 9 Jan 2024, at 01:40, Joe Conway <mail@joeconway.com> wrote:

Thanks -- will have a look

Joe, recently folks have proposed a lot of patches in this thread that seem to have diverged from the original implementation approach.
As the author of CF entry [0], can you please comment on which patch version needs review?

I don't know if I agree with the proposed changes, but I have also been
waiting to see how the parallel discussion regarding COPY extensibility
shakes out.

And there were a couple of issues found that need to be tracked down.

Additionally I have had time/availability challenges recently.

Overall, chances seem slim that this will make it into 17, but I have
not quite given up hope yet either.

Hi.
Here is a summary of the changes I made in the v9 patches at [0].

Meta: rebased. The patches now need to be applied with `git apply` or
`git am`; the previous copyto_json.007.diff had to be applied with GNU patch.

At [1], Dean Rasheed found some corner cases in which the
tts_tupleDescriptor of the slot returned from
`
ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0, true);
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
`
cannot be used for composite_to_json.
Generally, DestReceiver->rStartup is used to send the TupleDesc to the DestReceiver.
The COPY TO DestReceiver's rStartup function is copy_dest_startup,
but copy_dest_startup is a no-op.
That means that, to make the final TupleDesc of a COPY TO (FORMAT JSON)
operation bulletproof,
we need to copy the tupDesc from CopyToState's queryDesc.
This only applies when the COPY TO source is a query (example:
copy (select 1) to stdout), not a table.
The above is my interpretation.

Let me try to simplify the explanation.
First, refer to struct DestReceiver.
In COPY TO (FORMAT JSON), we don't send the preliminary TupleDesc to the
DestReceiver
via the rStartup function pointer in struct _DestReceiver.

In `CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)`,
the slot is the final slot returned by query execution,
but we cannot use its tupdesc (slot->tts_tupleDescriptor) for
composite_to_json,
because the TupleDesc of the final returned slot may change during query execution.

So we need to copy the tupDesc from CopyToState's queryDesc.

Also rebased; the patches now apply cleanly.
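
As a rough illustration of that copy step, here is a toy C model that does not touch any PostgreSQL headers. `ToyAttr`, `ToyTupleDesc`, `toy_sync_desc` and `toy_row_to_json` are invented stand-ins, not real backend types or functions. The stale column names in the usage note are made up purely to show the effect, and the row renderer assumes integer values and column names that need no JSON escaping.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ATTS 8

/* Toy stand-ins for per-attribute metadata and a tuple descriptor. */
typedef struct ToyAttr
{
    char name[32];
    int  typid;
} ToyAttr;

typedef struct ToyTupleDesc
{
    int     natts;
    ToyAttr attrs[MAX_ATTS];
} ToyTupleDesc;

/*
 * Overwrite the slot descriptor's attributes with those of the query
 * descriptor, one attribute at a time, similar in spirit to the memcpy
 * loop in the patch.
 */
static void
toy_sync_desc(ToyTupleDesc *slot_desc, const ToyTupleDesc *query_desc)
{
    for (int i = 0; i < slot_desc->natts && i < query_desc->natts; i++)
        memcpy(&slot_desc->attrs[i], &query_desc->attrs[i], sizeof(ToyAttr));
}

/*
 * Render one row of integer values as a JSON object keyed by the column
 * names found in desc.  Returns the length written, or -1 on overflow.
 */
static int
toy_row_to_json(char *buf, size_t buflen, const ToyTupleDesc *desc,
                const int *values)
{
    size_t used;
    int    n;

    n = snprintf(buf, buflen, "{");
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;

    for (int i = 0; i < desc->natts; i++)
    {
        n = snprintf(buf + used, buflen - used, "%s\"%s\":%d",
                     i > 0 ? "," : "", desc->attrs[i].name, values[i]);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf + used, buflen - used, "}");
    if (n < 0 || (size_t) n >= buflen - used)
        return -1;
    return (int) (used + (size_t) n);
}
```

In this toy model, rendering a row through a slot descriptor whose names are `stale1` and `stale2` yields keys with those names; after `toy_sync_desc` copies in a query descriptor with `id` and `total`, the same values come out under `id` and `total`.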

Attachments:

v10-0001-introduce-json-format-for-COPY-TO-operation.patch (text/x-patch; charset=US-ASCII)

From 17d9f3765bb74d863e229afed59af7dd923f5379 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 1 Apr 2024 19:47:40 +0800
Subject: [PATCH v10 1/2] introduce json format for COPY TO operation.  json
 format is only allowed in COPY TO operation.  also cannot be used with header
 option.

 discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
 discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         |   5 ++
 src/backend/commands/copy.c        |  13 +++
 src/backend/commands/copyto.c      | 125 ++++++++++++++++++++---------
 src/backend/parser/gram.y          |   8 ++
 src/backend/utils/adt/json.c       |   5 +-
 src/include/commands/copy.h        |   1 +
 src/include/utils/json.h           |   2 +
 src/test/regress/expected/copy.out |  54 +++++++++++++
 src/test/regress/sql/copy.sql      |  38 +++++++++
 9 files changed, 208 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 33ce7c4e..add84dbb 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,9 +218,14 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f75e1d70..02f16d9e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -497,6 +497,8 @@ ProcessCopyOptions(ParseState *pstate,
 				 /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
 				opts_out->csv_mode = true;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->json_mode = true;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->binary = true;
 			else
@@ -740,6 +742,11 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot specify HEADER in BINARY mode")));
 
+	if (opts_out->json_mode && opts_out->header_line)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot specify HEADER in JSON mode")));
+
 	/* Check quote */
 	if (!opts_out->csv_mode && opts_out->quote != NULL)
 		ereport(ERROR,
@@ -817,6 +824,12 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY FREEZE cannot be used with COPY TO")));
 
+	/* Check json format  */
+	if (opts_out->json_mode && is_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot use JSON mode in COPY FROM")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ae8b2e36..fe2eb244 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -139,9 +141,20 @@ SendCopyBegin(CopyToState cstate)
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (!cstate->opts.json_mode)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON mode is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -901,11 +914,7 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-	bool		need_delim = false;
-	FmgrInfo   *out_functions = cstate->out_functions;
 	MemoryContext oldcontext;
-	ListCell   *cur;
-	char	   *string;
 
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
@@ -916,53 +925,89 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 		CopySendInt16(cstate, list_length(cstate->attnumlist));
 	}
 
-	/* Make sure the tuple is fully deconstructed */
-	slot_getallattrs(slot);
-
-	foreach(cur, cstate->attnumlist)
+	if (!cstate->opts.json_mode)
 	{
-		int			attnum = lfirst_int(cur);
-		Datum		value = slot->tts_values[attnum - 1];
-		bool		isnull = slot->tts_isnull[attnum - 1];
+		bool		need_delim = false;
+		FmgrInfo   *out_functions = cstate->out_functions;
+		ListCell   *cur;
+		char	   *string;
 
-		if (!cstate->opts.binary)
-		{
-			if (need_delim)
-				CopySendChar(cstate, cstate->opts.delim[0]);
-			need_delim = true;
-		}
+		/* Make sure the tuple is fully deconstructed */
+		slot_getallattrs(slot);
 
-		if (isnull)
-		{
-			if (!cstate->opts.binary)
-				CopySendString(cstate, cstate->opts.null_print_client);
-			else
-				CopySendInt32(cstate, -1);
-		}
-		else
+		foreach(cur, cstate->attnumlist)
 		{
+			int			attnum = lfirst_int(cur);
+			Datum		value = slot->tts_values[attnum - 1];
+			bool		isnull = slot->tts_isnull[attnum - 1];
+
 			if (!cstate->opts.binary)
 			{
-				string = OutputFunctionCall(&out_functions[attnum - 1],
-											value);
-				if (cstate->opts.csv_mode)
-					CopyAttributeOutCSV(cstate, string,
-										cstate->opts.force_quote_flags[attnum - 1]);
+				if (need_delim)
+					CopySendChar(cstate, cstate->opts.delim[0]);
+				need_delim = true;
+			}
+
+			if (isnull)
+			{
+				if (!cstate->opts.binary)
+					CopySendString(cstate, cstate->opts.null_print_client);
 				else
-					CopyAttributeOutText(cstate, string);
+					CopySendInt32(cstate, -1);
 			}
 			else
 			{
-				bytea	   *outputbytes;
+				if (!cstate->opts.binary)
+				{
+					string = OutputFunctionCall(&out_functions[attnum - 1],
+												value);
+					if (cstate->opts.csv_mode)
+						CopyAttributeOutCSV(cstate, string,
+											cstate->opts.force_quote_flags[attnum - 1]);
+					else
+						CopyAttributeOutText(cstate, string);
+				}
+				else
+				{
+					bytea	   *outputbytes;
 
-				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-											   value);
-				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-				CopySendData(cstate, VARDATA(outputbytes),
-							 VARSIZE(outputbytes) - VARHDRSZ);
+					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+												   value);
+					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+					CopySendData(cstate, VARDATA(outputbytes),
+								 VARSIZE(outputbytes) - VARHDRSZ);
+				}
 			}
 		}
 	}
+	else
+	{
+		Datum		rowdata;
+		StringInfo	result;
+
+		/*
+		 * if the COPY TO command's source data is from a query, not a table,
+		 * then we need to copy the TupleDesc from the cstate->queryDesc.
+		 * because returning slot's TupleDesc may change during query execution,
+		 * but composite_to_json requires correct TupleDesc.
+		 * so we need copy from cstate->queryDesc to make it bullet proof.
+		*/
+		if(!cstate->rel)
+		{
+			for (int i = 0; i < slot->tts_tupleDescriptor->natts; i++)
+			{
+				memcpy(TupleDescAttr(slot->tts_tupleDescriptor, i),
+					   TupleDescAttr(cstate->queryDesc->tupDesc, i),
+					   1 * sizeof(FormData_pg_attribute));
+			}
+			BlessTupleDesc(slot->tts_tupleDescriptor);
+		}
+		rowdata = ExecFetchSlotHeapTupleDatum(slot);
+		result = makeStringInfo();
+		composite_to_json(rowdata, result, false);
+
+		CopySendData(cstate, result->data, result->len);
+	}
 
 	CopySendEndOfRow(cstate);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index f1af6147..d5ccdf0b 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3442,6 +3442,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3524,6 +3528,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+			{
+				$$ = makeDefElem("format", $2, @1);
+			}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d719a61f..fabd4e61 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -83,8 +83,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -507,8 +505,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48d..ff6ecc7a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -62,6 +62,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
+	bool		json_mode;		/* JSON format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 6d7f1b38..d5631171 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
 								const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365ec..0c5ade47 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -42,6 +42,60 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- Error
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+-- Error
+copy copytest from stdout (format json);
+ERROR:  cannot use JSON mode in COPY FROM
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e906..da6b0b0a 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -54,6 +54,44 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- Error
+copy copytest to stdout (format json, header);
+-- Error
+copy copytest from stdout (format json);
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);

base-commit: 7aa00f1360e0c6938fdf32d3fbb8b847b6098b88
-- 
2.34.1

v10-0002-Add-option-force_array-for-COPY-TO-JSON-fomrat.patch (text/x-patch; charset=US-ASCII)

From c939262f6266cea38761aa89cc6bb495b8d38ccb Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 1 Apr 2024 19:53:10 +0800
Subject: [PATCH v10 2/2] Add option force_array for COPY TO JSON fomrat

force_array option only apply to COPY TO operation with JSON format.
make add opening brackets and close brackets for the whole json output.
also, separate each json record with a comma.
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 ++++++++++++++
 src/backend/commands/copy.c        | 17 +++++++++++++++++
 src/backend/commands/copyto.c      | 29 +++++++++++++++++++++++++++++
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 24 ++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      |  9 +++++++++
 6 files changed, 94 insertions(+)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index add84dbb..f4100cba 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     ON_ERROR '<replaceable class="parameter">error_action</replaceable>'
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
     LOG_VERBOSITY <replaceable class="parameter">mode</replaceable>
@@ -390,6 +391,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>JSON</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ON_ERROR</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 02f16d9e..2ffd6978 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -473,6 +473,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		header_specified = false;
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -627,6 +628,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -830,6 +838,15 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot use JSON mode in COPY FROM")));
 
+	if (!opts_out->json_mode && opts_out->force_array)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("COPY FORCE_ARRAY requires JSON mode")));
+	if (!opts_out->json_mode && force_array_specified)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("COPY FORCE_ARRAY only available in JSON mode")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fe2eb244..8e4edb6e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -81,6 +81,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	bool		json_row_delim_needed;	/* need delimiter to start next json array element */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -853,6 +854,16 @@ DoCopyTo(CopyToState cstate)
 
 			CopySendEndOfRow(cstate);
 		}
+		/*
+		 * If JSON has been requested, and FORCE_ARRAY has been specified send
+		 * the opening bracket.
+		*/
+		if (cstate->opts.json_mode && cstate->opts.force_array)
+		{
+			CopySendChar(cstate, '[');
+			CopySendEndOfRow(cstate);
+		}
+
 	}
 
 	if (cstate->rel)
@@ -900,6 +911,15 @@ DoCopyTo(CopyToState cstate)
 		CopySendEndOfRow(cstate);
 	}
 
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified send the
+	 * closing bracket.
+	*/
+	if (cstate->opts.json_mode && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendEndOfRow(cstate);
+	}
 	MemoryContextDelete(cstate->rowcontext);
 
 	if (fe_copy)
@@ -1006,6 +1026,15 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 		result = makeStringInfo();
 		composite_to_json(rowdata, result, false);
 
+		if (cstate->json_row_delim_needed && cstate->opts.force_array)
+			CopySendChar(cstate, ',');
+		else if (cstate->opts.force_array)
+		{
+			/* first row needs no delimiter */
+			CopySendChar(cstate, ' ');
+			cstate->json_row_delim_needed = true;
+		}
+
 		CopySendData(cstate, result->data, result->len);
 	}
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index ff6ecc7a..f76fe8fa 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -81,6 +81,7 @@ typedef struct CopyFormatOptions
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
+	bool		force_array;	/* add JSON array decorations */
 	bool		convert_selectively;	/* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
 	CopyLogVerbosityChoice log_verbosity;	/* verbosity of logged messages */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 0c5ade47..7812768c 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -59,6 +59,30 @@ ERROR:  cannot specify HEADER in JSON mode
 -- Error
 copy copytest from stdout (format json);
 ERROR:  cannot use JSON mode in COPY FROM
+--Error
+copy copytest to stdout (format csv, force_array false);
+ERROR:  COPY FORCE_ARRAY only available in JSON mode
+copy copytest from stdin (format json, force_array true);
+ERROR:  cannot use JSON mode in COPY FROM
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index da6b0b0a..f685193b 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -64,6 +64,15 @@ copy copytest to stdout (format json, header);
 -- Error
 copy copytest from stdout (format json);
 
+--Error
+copy copytest to stdout (format csv, force_array false);
+copy copytest from stdin (format json, force_array true);
+
+copy copytest to stdout (format json, force_array);
+
+copy copytest to stdout (format json, force_array true);
+
+copy copytest to stdout (format json, force_array false);
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1
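
The array decoration done by the patch above (opening bracket on its own line, a space before the first row, a comma before every later row, closing bracket on its own line) can be sketched outside the backend. This is only an illustration under assumptions: `append_str` and `frame_json_rows` are hypothetical names that do not exist in PostgreSQL, the rows are assumed to be already serialized JSON objects, and the output goes to a caller-supplied buffer instead of the COPY stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append s to the NUL-terminated buffer; returns false if it does not fit. */
bool
append_str(char *out, size_t outsz, size_t *used, const char *s)
{
    size_t      len = strlen(s);

    if (*used + len + 1 > outsz)
        return false;
    memcpy(out + *used, s, len + 1);    /* the copy includes the NUL */
    *used += len;
    return true;
}

/*
 * Hypothetical helper, not part of the patch: write one JSON object per
 * line into "out".  With force_array the lines are wrapped in "[" and "]",
 * the first row is prefixed with a space and later rows with a comma,
 * which is the layout shown in the expected regression output.
 */
bool
frame_json_rows(const char *const *rows, size_t nrows, bool force_array,
                char *out, size_t outsz)
{
    size_t      used = 0;
    bool        row_delim_needed = false;   /* cf. json_row_delim_needed */

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    if (force_array && !append_str(out, outsz, &used, "[\n"))
        return false;

    for (size_t i = 0; i < nrows; i++)
    {
        if (force_array)
        {
            /* the first row needs no delimiter, only alignment */
            if (!append_str(out, outsz, &used, row_delim_needed ? "," : " "))
                return false;
            row_delim_needed = true;
        }
        if (!append_str(out, outsz, &used, rows[i]) ||
            !append_str(out, outsz, &used, "\n"))
            return false;
    }

    if (force_array && !append_str(out, outsz, &used, "]\n"))
        return false;
    return true;
}
```

Putting the comma at the start of each later row, rather than at the end of the previous one, means a row can be emitted without knowing whether another row follows. An empty result still yields the two bracket lines, which is a valid empty JSON array.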

#127

jian.universality@gmail.com

over 1 year ago

In reply to: jian he (#126)

2 attachment(s)

Re: Emitting JSON to file using COPY TO

On Mon, Apr 1, 2024 at 8:00 PM jian he <jian.universality@gmail.com> wrote:

rebased.
minor cosmetic error message change.

I think all the issues in this thread have been addressed.

Attachments:

v11-0001-introduce-json-format-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From b96dfe41f0935b08b1190f399e29ee2450169529 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 17 Aug 2024 11:08:25 +0800
Subject: [PATCH v11 1/2] introduce json format for COPY TO

 The json format is allowed only in COPY TO.
 It cannot be used with the HEADER option.

 discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
 discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         |  5 +++
 src/backend/commands/copy.c        | 13 +++++++
 src/backend/commands/copyto.c      | 51 +++++++++++++++++++++++++---
 src/backend/parser/gram.y          |  8 +++++
 src/backend/utils/adt/json.c       |  5 ++-
 src/include/commands/copy.h        |  1 +
 src/include/utils/json.h           |  2 ++
 src/test/regress/expected/copy.out | 54 ++++++++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      | 38 +++++++++++++++++++++
 9 files changed, 169 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..616abf508e 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,9 +218,14 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..a3ee65d000 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -494,6 +494,8 @@ ProcessCopyOptions(ParseState *pstate,
 				 /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
 				opts_out->csv_mode = true;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->json_mode = true;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->binary = true;
 			else
@@ -739,6 +741,11 @@ ProcessCopyOptions(ParseState *pstate,
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
+	if (opts_out->json_mode && opts_out->header_line)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot specify %s in JSON mode", "HEADER")));
+
 	/* Check quote */
 	if (!opts_out->csv_mode && opts_out->quote != NULL)
 		ereport(ERROR,
@@ -837,6 +844,12 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY %s cannot be used with %s", "FREEZE",
 						"COPY TO")));
 
+	/* Check json format  */
+	if (opts_out->json_mode && is_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("COPY json mode cannot be used with %s", "COPY FROM")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91de442f43..14ba9fde50 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -139,9 +141,20 @@ SendCopyBegin(CopyToState cstate)
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (!cstate->opts.json_mode)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON mode is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -917,7 +930,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (!cstate->opts.binary)
+	if (!cstate->opts.binary && !cstate->opts.json_mode)
 	{
 		bool		need_delim = false;
 
@@ -945,7 +958,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
-	else
+	else if (!cstate->opts.json_mode)
 	{
 		foreach_int(attnum, cstate->attnumlist)
 		{
@@ -965,6 +978,34 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
+	else
+	{
+		Datum		rowdata;
+		StringInfo	result;
+
+		/*
+		 * If the COPY TO source is a query rather than a table, we need to
+		 * copy cstate->queryDesc->tupDesc->attrs into
+		 * slot->tts_tupleDescriptor->attrs, because the slot's TupleDesc->attrs
+		 * may change during query execution, but composite_to_json requires
+		 * correct TupleDesc->attrs for constructing the JSON keys.
+		 * composite_to_json only iterates over TupleDesc->attrs, so there is
+		 * no need to copy the other fields of cstate->queryDesc->tupDesc.
+		*/
+		if(!cstate->rel)
+		{
+			memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+				TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+				cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+			BlessTupleDesc(slot->tts_tupleDescriptor);
+		}
+		rowdata = ExecFetchSlotHeapTupleDatum(slot);
+		result = makeStringInfo();
+		composite_to_json(rowdata, result, false);
+
+		CopySendData(cstate, result->data, result->len);
+	}
 
 	CopySendEndOfRow(cstate);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c3f25582c3..76bd6f8949 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3500,6 +3500,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3582,6 +3586,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+			{
+				$$ = makeDefElem("format", $2, @1);
+			}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 4eeeeaf0a6..b47ae14f20 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..ff6ecc7ae7 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -62,6 +62,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
+	bool		json_mode;		/* JSON format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 79c1062e1b..c904ef6c6e 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern void escape_json_with_len(StringInfo buf, const char *str, int len);
 extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 44114089a6..77cb36cd61 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -42,6 +42,60 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- Error
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+-- Error
+copy copytest from stdout (format json);
+ERROR:  COPY json mode cannot be used with COPY FROM
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index e2dd24cb35..67f77f6512 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -54,6 +54,44 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- Error
+copy copytest to stdout (format json, header);
+-- Error
+copy copytest from stdout (format json);
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);

base-commit: 03e9b958eef44046c0092f1f34da4b2de0f9071d
-- 
2.34.1
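
The SendCopyBegin() change in the patch above makes JSON mode announce a single text column regardless of how many attributes the source has. As a rough illustration, the resulting CopyOutResponse message can be built by hand. The sketch assumes the standard frontend/backend protocol layout (type byte 'H', a four-byte length that counts itself, a one-byte overall format, a two-byte column count, then one two-byte format code per column, all in network byte order); `put_u16`, `put_u32` and `build_json_copy_out_response` are hypothetical names, not backend functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value in network byte order (big-endian). */
void
put_u16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char) (v >> 8);
    p[1] = (unsigned char) (v & 0xFF);
}

/* Store a 32-bit value in network byte order (big-endian). */
void
put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char) (v >> 24);
    p[1] = (unsigned char) ((v >> 16) & 0xFF);
    p[2] = (unsigned char) ((v >> 8) & 0xFF);
    p[3] = (unsigned char) (v & 0xFF);
}

/*
 * Hypothetical helper: fill "buf" with the CopyOutResponse that the
 * patched SendCopyBegin() would send in JSON mode.  Returns the number
 * of bytes written, or 0 if the buffer is too small.
 */
size_t
build_json_copy_out_response(unsigned char *buf, size_t bufsz)
{
    const size_t total = 10;    /* 1 type byte + 9 bytes covered by length */

    assert(buf != NULL);
    if (bufsz < total)
        return 0;

    buf[0] = 'H';               /* CopyOutResponse */
    put_u32(buf + 1, 9);        /* length: itself (4) + 1 + 2 + 2 */
    buf[5] = 0;                 /* overall format: 0 = textual */
    put_u16(buf + 6, 1);        /* JSON mode is always one column */
    put_u16(buf + 8, 0);        /* that column is non-binary (text) */
    return total;
}
```

A client that sizes its per-column buffers from this message therefore sees one column, which matches the one JSON object per row that follows.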

v11-0002-Add-option-force_array-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From cb006c2ad2ca652efa884e058a0cec491fde4b8d Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 17 Aug 2024 12:45:35 +0800
Subject: [PATCH v11 2/2] Add option force_array for COPY TO

 The force_array option can be used only in COPY TO with JSON format.
 It makes the output a single JSON array.

 discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
 discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 ++++++++++++++
 src/backend/commands/copy.c        | 13 +++++++++++++
 src/backend/commands/copyto.c      | 29 +++++++++++++++++++++++++++++
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 23 +++++++++++++++++++++++
 src/test/regress/sql/copy.sql      |  9 +++++++++
 6 files changed, 89 insertions(+)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 616abf508e..4659e49ec3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     ON_ERROR <replaceable class="parameter">error_action</replaceable>
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
     LOG_VERBOSITY <replaceable class="parameter">verbosity</replaceable>
@@ -390,6 +391,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>JSON</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ON_ERROR</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a3ee65d000..247ff1fcdc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -470,6 +470,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		header_specified = false;
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -624,6 +625,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -850,6 +858,11 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY json mode cannot be used with %s", "COPY FROM")));
 
+	if (!opts_out->json_mode && opts_out->force_array)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 14ba9fde50..649c425dfc 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -81,6 +81,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	bool		json_row_delim_needed;	/* need delimiter to start next json array element */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -854,6 +855,16 @@ DoCopyTo(CopyToState cstate)
 
 			CopySendEndOfRow(cstate);
 		}
+		/*
+		 * If JSON has been requested, and FORCE_ARRAY has been specified send
+		 * the opening bracket.
+		*/
+		if (cstate->opts.json_mode && cstate->opts.force_array)
+		{
+			CopySendChar(cstate, '[');
+			CopySendEndOfRow(cstate);
+		}
+
 	}
 
 	if (cstate->rel)
@@ -901,6 +912,15 @@ DoCopyTo(CopyToState cstate)
 		CopySendEndOfRow(cstate);
 	}
 
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified send the
+	 * closing bracket.
+	*/
+	if (cstate->opts.json_mode && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendEndOfRow(cstate);
+	}
 	MemoryContextDelete(cstate->rowcontext);
 
 	if (fe_copy)
@@ -1004,6 +1024,15 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 		result = makeStringInfo();
 		composite_to_json(rowdata, result, false);
 
+		if (cstate->json_row_delim_needed && cstate->opts.force_array)
+			CopySendChar(cstate, ',');
+		else if (cstate->opts.force_array)
+		{
+			/* first row needs no delimiter */
+			CopySendChar(cstate, ' ');
+			cstate->json_row_delim_needed = true;
+		}
+
 		CopySendData(cstate, result->data, result->len);
 	}
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index ff6ecc7ae7..f76fe8fafe 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -81,6 +81,7 @@ typedef struct CopyFormatOptions
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
+	bool		force_array;	/* add JSON array decorations */
 	bool		convert_selectively;	/* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
 	CopyLogVerbosityChoice log_verbosity;	/* verbosity of logged messages */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 77cb36cd61..ed1cb6b28e 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -59,6 +59,29 @@ ERROR:  cannot specify HEADER in JSON mode
 -- Error
 copy copytest from stdout (format json);
 ERROR:  COPY json mode cannot be used with COPY FROM
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR:  COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 67f77f6512..28f698ecc6 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -64,6 +64,15 @@ copy copytest to stdout (format json, header);
 -- Error
 copy copytest from stdout (format json);
 
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+
+copy copytest to stdout (format json, force_array true);
+
+copy copytest to stdout (format json, force_array false);
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1

#128

jian.universality@gmail.com

over 1 year ago

In reply to: jian he (#127)

2 attachment(s)

Re: Emitting JSON to file using COPY TO

On Mon, Aug 19, 2024 at 8:00 AM jian he <jian.universality@gmail.com> wrote:

On Mon, Apr 1, 2024 at 8:00 PM jian he <jian.universality@gmail.com> wrote:

rebased.
minor cosmetic error message change.

I think all the issues in this thread have been addressed.

Hi.
I made some minor changes on top of v11,

mainly changing some error codes from
ERRCODE_FEATURE_NOT_SUPPORTED
to
ERRCODE_INVALID_PARAMETER_VALUE.
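
For readers who only see the SQLSTATE on the client side, the change amounts to reporting class 22 code 22023 (invalid_parameter_value) instead of 0A000 (feature_not_supported) for the option conflicts. Below is a minimal standalone sketch of the two checks visible in the attached 0002 patch. `CopyOptsSketch`, `check_json_copy_options` and `is_rejected_with` are hypothetical names used for illustration, and the sketch returns the SQLSTATE string rather than calling ereport().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SQLSTATE_FEATURE_NOT_SUPPORTED   "0A000"    /* used up to v11 */
#define SQLSTATE_INVALID_PARAMETER_VALUE "22023"    /* used from v12 */

/* Hypothetical, reduced stand-in for the relevant COPY option fields. */
typedef struct CopyOptsSketch
{
    bool        is_from;        /* COPY FROM rather than COPY TO? */
    bool        json_mode;      /* FORMAT json? */
    bool        force_array;    /* value of FORCE_ARRAY, default false */
} CopyOptsSketch;

/*
 * Returns NULL when the combination is accepted, otherwise the SQLSTATE
 * that the v12 checks would report.
 */
const char *
check_json_copy_options(const CopyOptsSketch *opts)
{
    assert(opts != NULL);

    /* "COPY json mode cannot be used with COPY FROM" */
    if (opts->json_mode && opts->is_from)
        return SQLSTATE_INVALID_PARAMETER_VALUE;

    /* FORCE_ARRAY is only meaningful together with JSON mode */
    if (!opts->json_mode && opts->force_array)
        return SQLSTATE_INVALID_PARAMETER_VALUE;

    return NULL;
}

/* True if the options are rejected with exactly the given SQLSTATE. */
bool
is_rejected_with(const CopyOptsSketch *opts, const char *sqlstate)
{
    const char *got = check_json_copy_options(opts);

    return got != NULL && strcmp(got, sqlstate) == 0;
}
```

Because the second check tests the option's value and not whether it was specified, an explicit `force_array false` with a non-JSON format passes this check in the sketch, as it does in the patch's condition.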

Attachments:

v12-0002-Add-option-force_array-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From 24c0ac71f0a3fe1f4eb3d558aef34620234b2b35 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 22 Aug 2024 12:15:52 +0800
Subject: [PATCH v12 2/2] Add option force_array for COPY TO

The force_array option can be used only in COPY TO with JSON format.
It makes the output a single JSON array.

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 ++++++++++++++
 src/backend/commands/copy.c        | 13 +++++++++++++
 src/backend/commands/copyto.c      | 28 ++++++++++++++++++++++++++++
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 23 +++++++++++++++++++++++
 src/test/regress/sql/copy.sql      |  9 +++++++++
 6 files changed, 88 insertions(+)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 616abf508e..4659e49ec3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     ON_ERROR <replaceable class="parameter">error_action</replaceable>
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
     LOG_VERBOSITY <replaceable class="parameter">verbosity</replaceable>
@@ -390,6 +391,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>JSON</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ON_ERROR</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2f65c96dd3..54793fecab 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -470,6 +470,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		header_specified = false;
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -624,6 +625,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -850,6 +858,11 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY json mode cannot be used with %s", "COPY FROM")));
 
+	if (!opts_out->json_mode && opts_out->force_array)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 14ba9fde50..b56b78a331 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -81,6 +81,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	bool		json_row_delim_needed;	/* need delimiter to start next json array element */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -854,6 +855,15 @@ DoCopyTo(CopyToState cstate)
 
 			CopySendEndOfRow(cstate);
 		}
+		/*
+		 * If JSON has been requested, and FORCE_ARRAY has been specified send
+		 * the opening bracket.
+		*/
+		if (cstate->opts.json_mode && cstate->opts.force_array)
+		{
+			CopySendChar(cstate, '[');
+			CopySendEndOfRow(cstate);
+		}
 	}
 
 	if (cstate->rel)
@@ -901,6 +911,15 @@ DoCopyTo(CopyToState cstate)
 		CopySendEndOfRow(cstate);
 	}
 
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified send the
+	 * closing bracket.
+	*/
+	if (cstate->opts.json_mode && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendEndOfRow(cstate);
+	}
 	MemoryContextDelete(cstate->rowcontext);
 
 	if (fe_copy)
@@ -1004,6 +1023,15 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 		result = makeStringInfo();
 		composite_to_json(rowdata, result, false);
 
+		if (cstate->json_row_delim_needed && cstate->opts.force_array)
+			CopySendChar(cstate, ',');
+		else if (cstate->opts.force_array)
+		{
+			/* first row needs no delimiter */
+			CopySendChar(cstate, ' ');
+			cstate->json_row_delim_needed = true;
+		}
+
 		CopySendData(cstate, result->data, result->len);
 	}
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index ff6ecc7ae7..f76fe8fafe 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -81,6 +81,7 @@ typedef struct CopyFormatOptions
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
+	bool		force_array;	/* add JSON array decorations */
 	bool		convert_selectively;	/* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
 	CopyLogVerbosityChoice log_verbosity;	/* verbosity of logged messages */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 77cb36cd61..ed1cb6b28e 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -59,6 +59,29 @@ ERROR:  cannot specify HEADER in JSON mode
 -- Error
 copy copytest from stdout (format json);
 ERROR:  COPY json mode cannot be used with COPY FROM
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR:  COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 67f77f6512..28f698ecc6 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -64,6 +64,15 @@ copy copytest to stdout (format json, header);
 -- Error
 copy copytest from stdout (format json);
 
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+
+copy copytest to stdout (format json, force_array true);
+
+copy copytest to stdout (format json, force_array false);
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1
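
The array decoration that the patch above adds to DoCopyTo and CopyOneRowTo can be tried outside the backend. The following is only a minimal sketch, not backend code: JsonArrayWriter and the writer_* functions are hypothetical names, and output goes to an in-memory buffer instead of CopySendChar / CopySendEndOfRow.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the relevant CopyToState fields. */
typedef struct JsonArrayWriter
{
	char		buf[1024];			/* output collected here, not a COPY stream */
	size_t		len;
	bool		force_array;		/* mirrors opts.force_array */
	bool		row_delim_needed;	/* mirrors json_row_delim_needed */
} JsonArrayWriter;

static void
writer_append(JsonArrayWriter *w, const char *s)
{
	size_t		n = strlen(s);

	assert(w->len + n < sizeof(w->buf));
	memcpy(w->buf + w->len, s, n + 1);	/* copy the terminator too */
	w->len += n;
}

/* Start of output: emit the opening bracket only when force_array is set. */
static void
writer_begin(JsonArrayWriter *w, bool force_array)
{
	w->len = 0;
	w->buf[0] = '\0';
	w->force_array = force_array;
	w->row_delim_needed = false;
	if (force_array)
		writer_append(w, "[\n");
}

/* One row: a leading space for the first row, a leading comma afterwards. */
static void
writer_row(JsonArrayWriter *w, const char *json_object)
{
	if (w->force_array)
	{
		writer_append(w, w->row_delim_needed ? "," : " ");
		w->row_delim_needed = true;
	}
	writer_append(w, json_object);
	writer_append(w, "\n");
}

/* End of output: emit the closing bracket only when force_array is set. */
static void
writer_end(JsonArrayWriter *w)
{
	if (w->force_array)
		writer_append(w, "]\n");
}
```

Calling writer_begin, then writer_row once per row, then writer_end with force_array enabled gives the comma-first layout seen in the expected regression output; with force_array disabled the rows come out one JSON object per line.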

v12-0001-introduce-json-format-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From 51f0322ad7ce7159cc46f80380f98d604560a001 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 22 Aug 2024 11:58:12 +0800
Subject: [PATCH v12 1/2] introduce json format for COPY TO

 json format is only allowed in COPY TO operation.
 also cannot be used with header option.

 discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
 discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         |  5 +++
 src/backend/commands/copy.c        | 13 +++++++
 src/backend/commands/copyto.c      | 51 +++++++++++++++++++++++++---
 src/backend/parser/gram.y          |  8 +++++
 src/backend/utils/adt/json.c       |  5 ++-
 src/include/commands/copy.h        |  1 +
 src/include/utils/json.h           |  2 ++
 src/test/regress/expected/copy.out | 54 ++++++++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      | 38 +++++++++++++++++++++
 9 files changed, 169 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..616abf508e 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,9 +218,14 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..2f65c96dd3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -494,6 +494,8 @@ ProcessCopyOptions(ParseState *pstate,
 				 /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
 				opts_out->csv_mode = true;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->json_mode = true;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->binary = true;
 			else
@@ -739,6 +741,11 @@ ProcessCopyOptions(ParseState *pstate,
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
+	if (opts_out->json_mode && opts_out->header_line)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cannot specify %s in JSON mode", "HEADER")));
+
 	/* Check quote */
 	if (!opts_out->csv_mode && opts_out->quote != NULL)
 		ereport(ERROR,
@@ -837,6 +844,12 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY %s cannot be used with %s", "FREEZE",
 						"COPY TO")));
 
+	/* Check json format  */
+	if (opts_out->json_mode && is_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY json mode cannot be used with %s", "COPY FROM")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91de442f43..14ba9fde50 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -139,9 +141,20 @@ SendCopyBegin(CopyToState cstate)
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (!cstate->opts.json_mode)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON mode is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -917,7 +930,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (!cstate->opts.binary)
+	if (!cstate->opts.binary && !cstate->opts.json_mode)
 	{
 		bool		need_delim = false;
 
@@ -945,7 +958,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
-	else
+	else if (!cstate->opts.json_mode)
 	{
 		foreach_int(attnum, cstate->attnumlist)
 		{
@@ -965,6 +978,34 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
+	else
+	{
+		Datum		rowdata;
+		StringInfo	result;
+
+		/*
+		 * if COPY TO source data is from a query, not a table, then we need
+		 * copy CopyToState->TupleDesc->attrs to
+		 * slot->tts_tupleDescriptor->attrs because the slot's TupleDesc->attrs
+		 * may change during query execution, but composite_to_json requires
+		 * correct TupleDesc->attrs for constructing the json keys.
+		 * composite_to_json will iterate each TupleDesc->attrs so no need to
+		 * copy other fields in cstate->queryDesc->tupDesc.
+		*/
+		if(!cstate->rel)
+		{
+			memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+				TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+				cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+			BlessTupleDesc(slot->tts_tupleDescriptor);
+		}
+		rowdata = ExecFetchSlotHeapTupleDatum(slot);
+		result = makeStringInfo();
+		composite_to_json(rowdata, result, false);
+
+		CopySendData(cstate, result->data, result->len);
+	}
 
 	CopySendEndOfRow(cstate);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c3f25582c3..76bd6f8949 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3500,6 +3500,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3582,6 +3586,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+			{
+				$$ = makeDefElem("format", $2, @1);
+			}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 4eeeeaf0a6..b47ae14f20 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..ff6ecc7ae7 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -62,6 +62,7 @@ typedef struct CopyFormatOptions
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
+	bool		json_mode;		/* JSON format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 79c1062e1b..c904ef6c6e 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern void escape_json_with_len(StringInfo buf, const char *str, int len);
 extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 44114089a6..77cb36cd61 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -42,6 +42,60 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- Error
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+-- Error
+copy copytest from stdout (format json);
+ERROR:  COPY json mode cannot be used with COPY FROM
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index e2dd24cb35..67f77f6512 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -54,6 +54,44 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- Error
+copy copytest to stdout (format json, header);
+-- Error
+copy copytest from stdout (format json);
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);

base-commit: 9bb842f95ef3384f0822c386a4c569780e613e4e
-- 
2.34.1

#129

jian.universality@gmail.com

over 1 year ago

In reply to: jian he (#128)

1 attachment(s)

Re: Emitting JSON to file using COPY TO

Hi.

In ExecutePlan we have:

    for (;;)
    {
        ResetPerTupleExprContext(estate);
        slot = ExecProcNode(planstate);
        if (!TupIsNull(slot))
        {
            if ((slot != NULL) && (slot->tts_tupleDescriptor != NULL)
                && (slot->tts_tupleDescriptor->natts > 0)
                && (slot->tts_tupleDescriptor->attrs->attname.data[0] == '\0'))
                elog(INFO, "%s:%d %s this slot first attribute attname is null",
                     __FILE_NAME__, __LINE__, __func__);
        }
        if (TupIsNull(slot))
            break;
        if (sendTuples)
        {
            if (!dest->receiveSlot(slot, dest))
                break;
        }

dest->receiveSlot(slot, dest) is responsible for sending values to the
destination; for COPY TO it calls copy_dest_receive and, through it,
CopyOneRowTo.

For COPY TO with FORMAT JSON, we need to make sure that
slot->tts_tupleDescriptor carries proper information when
dest->receiveSlot(slot, dest) is called,
because we *use* slot->tts_tupleDescriptor->attrs->attname as the JSON key.

For example, if slot->tts_tupleDescriptor->attrs->attname.data[0] == '\0',
then the output JSON may look like: {"":12}
which is not what we want.
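
To make the failure mode concrete, here is a minimal sketch in plain C, not backend code. Attr and row_to_json_sketch are hypothetical stand-ins for the descriptor attributes and for composite_to_json; the sketch handles only integer values and does no escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one entry of TupleDesc->attrs. */
typedef struct Attr
{
	char		attname[64];
} Attr;

/*
 * Build {"name":value,...} from attribute names and integer values.
 * Names are used verbatim as keys, so an empty attname gives an empty key.
 * Output is silently truncated if it does not fit in 'cap' bytes.
 */
static void
row_to_json_sketch(const Attr *attrs, const int *values, int natts,
				   char *out, size_t cap)
{
	size_t		used;

	assert(cap >= 3);			/* room for "{}" and the terminator */
	out[0] = '{';
	out[1] = '\0';
	for (int i = 0; i < natts; i++)
	{
		used = strlen(out);
		snprintf(out + used, cap - used, "%s\"%s\":%d",
				 i > 0 ? "," : "", attrs[i].attname, values[i]);
	}
	used = strlen(out);
	snprintf(out + used, cap - used, "}");
}
```

With an attribute named "a" and the value 12 this produces {"a":12}; with an empty attname the same call produces {"":12}, the output described above.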

In ExecutePlan I use
elog(INFO, "%s:%d %s this slot first attribute attname is null",
__FILE_NAME__, __LINE__, __func__);
to find SQL queries whose attribute names are not set properly.

Based on that, I found that many COPY TO (FORMAT JSON) queries will either
error out or produce an empty string as the JSON key
if CopyOneRowTo does not copy cstate->queryDesc->tupDesc
to slot->tts_tupleDescriptor.

You can test it yourself.
First, run `git am v12-0001-introduce-json-format-for-COPY-TO.patch`.
After that, comment out the memcpy call in CopyOneRowTo, like this:

    if (!cstate->rel)
    {
        // memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
        //        TupleDescAttr(cstate->queryDesc->tupDesc, 0),
        //        cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));

Then build and test with the attached script.
You will see many COPY TO (FORMAT JSON) cases where the JSON key
becomes an empty string.
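
The effect of that memcpy can be illustrated with a small standalone sketch. SketchAttr, SketchTupleDesc and copy_attrs are hypothetical simplifications; the real TupleDesc and FormData_pg_attribute carry many more fields, and the real code also calls BlessTupleDesc afterwards.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_ATTS 8

/* Hypothetical, trimmed-down attribute entry. */
typedef struct SketchAttr
{
	char		attname[64];
	int			atttypid;
} SketchAttr;

/* Hypothetical, trimmed-down tuple descriptor with a fixed-size array. */
typedef struct SketchTupleDesc
{
	int			natts;
	SketchAttr	attrs[SKETCH_MAX_ATTS];
} SketchTupleDesc;

/*
 * Overwrite the slot descriptor's attribute entries with those of the
 * query descriptor, in the spirit of the memcpy in CopyOneRowTo.
 */
static void
copy_attrs(SketchTupleDesc *slot_desc, const SketchTupleDesc *query_desc)
{
	assert(query_desc->natts <= SKETCH_MAX_ATTS);
	assert(slot_desc->natts == query_desc->natts);
	memcpy(slot_desc->attrs, query_desc->attrs,
		   (size_t) query_desc->natts * sizeof(SketchAttr));
}
```

If the slot descriptor starts with an empty attname and the query descriptor holds the real column name, the slot descriptor has the real name after copy_attrs; skipping the copy leaves the empty name in place, which is what commenting out the memcpy does.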

I think the issues raised in this thread have been resolved.

#130

jian.universality@gmail.com

about 1 year ago

In reply to: jian he (#129)

3 attachment(s)

Re: Emitting JSON to file using COPY TO

Hi there.

New patches attached.
v13-0001 is from
/messages/by-id/fbc63db8-94de-45f6-b327-50456630264d@app.fastmail.com
and only refactors the copy format handling.
The author is Joel Jacobson!

v13-0002 and v13-0003 are almost the same as the previous v12.
Minor changes compared to v12:
* Refactored the code based on v13-0001. Instead of the bool
CopyFormatOptions->json_mode, we now use

    typedef enum CopyFormat
    {
        COPY_FORMAT_TEXT = 0,
        COPY_FORMAT_BINARY,
        COPY_FORMAT_CSV,
        COPY_FORMAT_JSON,
    } CopyFormat;

to represent the format of the COPY operation.

* JSON format can no longer be used with the {default, null, delimiter}
options. Added related tests and documentation.

* Added psql tab completion for the json format.
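
The first two changes can be sketched together in standalone C. This is only an illustration under assumptions: SketchCopyOptions and json_conflicting_option are hypothetical, and the function returns the name of the offending option instead of raising an error with ereport as ProcessCopyOptions does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Enum as quoted in the message above. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
	COPY_FORMAT_JSON,
} CopyFormat;

/* Hypothetical, trimmed-down option struct; not the real CopyFormatOptions. */
typedef struct SketchCopyOptions
{
	CopyFormat	format;
	bool		header;
	const char *delim;			/* NULL when not specified */
	const char *null_print;		/* NULL when not specified */
	const char *default_print;	/* NULL when not specified */
} SketchCopyOptions;

/*
 * Return the name of the first option that conflicts with JSON format,
 * or NULL if the combination is acceptable.
 */
static const char *
json_conflicting_option(const SketchCopyOptions *opts, bool is_from)
{
	assert(opts != NULL);

	if (opts->format != COPY_FORMAT_JSON)
		return NULL;
	if (opts->delim != NULL)
		return "DELIMITER";
	if (opts->null_print != NULL)
		return "NULL";
	if (opts->default_print != NULL)
		return "DEFAULT";
	if (opts->header)
		return "HEADER";
	if (is_from)
		return "COPY FROM";
	return NULL;
}
```

A single enum field makes the formats mutually exclusive by construction, which separate booleans such as csv_mode, binary and json_mode do not.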

Attachments:

v13-0002-introduce-json-format-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From e866e079eae53b998945f98b637ffe24c571762e Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 19 Nov 2024 12:25:53 +0800
Subject: [PATCH v13 2/3] introduce json format for COPY TO

json format is only allowed in COPY TO operation.
also cannot be used with {header, default, null, delimiter} options.

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 13 +++++--
 src/backend/commands/copy.c        | 29 +++++++++++++++
 src/backend/commands/copyto.c      | 51 ++++++++++++++++++++++---
 src/backend/parser/gram.y          |  8 ++++
 src/backend/utils/adt/json.c       |  5 +--
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/include/utils/json.h           |  2 +
 src/test/regress/expected/copy.out | 60 ++++++++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      | 41 ++++++++++++++++++++
 10 files changed, 199 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..5bf0f38d90 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -219,10 +219,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
@@ -257,7 +262,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -271,7 +276,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
 
      <note>
@@ -294,7 +299,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      not using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -310,7 +315,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       If this option is set to <literal>MATCH</literal>, the number and names
       of the columns in the header line must match the actual column names of
       the table, in order;  otherwise an error is raised.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
       The <literal>MATCH</literal> option is only valid for <command>COPY
       FROM</command> commands.
      </para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b7e819de40..4b8bc87666 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -516,6 +516,8 @@ ProcessCopyOptions(ParseState *pstate,
 				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->format = COPY_FORMAT_JSON;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -681,16 +683,32 @@ ProcessCopyOptions(ParseState *pstate,
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->delim)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+				 errmsg("cannot specify %s in JSON mode", "DELIMITER")));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->null_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify %s in JSON mode", "NULL")));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->default_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify %s in JSON mode", "DEFAULT")));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
 		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
@@ -761,6 +779,11 @@ ProcessCopyOptions(ParseState *pstate,
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->header_line)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cannot specify %s in JSON mode", "HEADER")));
+
 	/* Check quote */
 	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
@@ -864,6 +887,12 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY %s cannot be used with %s", "FREEZE",
 						"COPY TO")));
 
+	/* Check json format  */
+	if (opts_out->format == COPY_FORMAT_JSON && is_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY json mode cannot be used with %s", "COPY FROM")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 03c9d71d34..87709d76be 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -139,9 +141,20 @@ SendCopyBegin(CopyToState cstate)
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (cstate->opts.format != COPY_FORMAT_JSON)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON format is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -921,7 +934,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (cstate->opts.format != COPY_FORMAT_BINARY)
+	if (cstate->opts.format == COPY_FORMAT_TEXT || cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		bool		need_delim = false;
 
@@ -949,7 +962,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
-	else
+	else if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		foreach_int(attnum, cstate->attnumlist)
 		{
@@ -969,6 +982,34 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
+	else
+	{
+		Datum		rowdata;
+		StringInfo	result;
+
+		/*
+		 * if COPY TO source data is from a query, not a table (copy the_table
+		 * to stdout), then we need copy CopyToState->TupleDesc->attrs to
+		 * slot->tts_tupleDescriptor->attrs because the slot's TupleDesc->attrs
+		 * may change during query execution, but composite_to_json requires
+		 * correct TupleDesc->attrs for constructing the json keys.
+		 * composite_to_json will iterate each TupleDesc->attrs so no need to
+		 * copy other fields in cstate->queryDesc->tupDesc.
+		*/
+		if(!cstate->rel)
+		{
+			memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+				TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+				cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+			BlessTupleDesc(slot->tts_tupleDescriptor);
+		}
+		rowdata = ExecFetchSlotHeapTupleDatum(slot);
+		result = makeStringInfo();
+		composite_to_json(rowdata, result, false);
+
+		CopySendData(cstate, result->data, result->len);
+	}
 
 	CopySendEndOfRow(cstate);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 67eb96396a..853532cf7d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3457,6 +3457,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3539,6 +3543,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+			{
+				$$ = makeDefElem("format", $2, @1);
+			}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 058aade2af..5de9b86a96 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index fad2277991..48cf854a1d 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3239,7 +3239,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
-		COMPLETE_WITH("binary", "csv", "text");
+		COMPLETE_WITH("binary", "csv", "text", "json");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c3d1df267f..076ae59f96 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
 	COPY_FORMAT_TEXT = 0,
 	COPY_FORMAT_BINARY,
 	COPY_FORMAT_CSV,
+	COPY_FORMAT_JSON,
 } CopyFormat;
 
 /*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 79c1062e1b..c904ef6c6e 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern void escape_json_with_len(StringInfo buf, const char *str, int len);
 extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..430f11f3f1 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,66 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR:  cannot specify NULL in JSON mode
+copy copytest to stdout (format json, delimiter '|');
+ERROR:  cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR:  cannot specify DEFAULT in JSON mode
+copy copytest from stdin(format json);
+ERROR:  COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..3d21f20c98 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,47 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, default '|');
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
-- 
2.34.1
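
The regression output in the patch above shows which characters come out escaped in JSON mode: `"`, `\`, and the control characters `\b`, `\f`, `\n`, `\r`, `\t` are escaped, while `/` is left alone. As a rough illustration only, a standalone sketch of that escaping rule might look like the following. The function name `json_escape` and its buffer-based interface are hypothetical and are not what the patch uses (the patch goes through `composite_to_json()` and the server's `StringInfo` machinery); the `\u00XX` fallback for other control characters is an assumption based on the JSON specification rather than on anything shown in the test output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: write src into dst as a
 * double-quoted JSON string literal.  Returns the number of bytes written,
 * not counting the terminating NUL.  The caller must supply a buffer that
 * is large enough; this sketch simply asserts on overflow.
 */
size_t
json_escape(char *dst, size_t dstsize, const char *src)
{
	size_t		off = 0;

	assert(dstsize >= 3);		/* room for two quotes and the NUL */
	dst[off++] = '"';

	for (const unsigned char *p = (const unsigned char *) src; *p; p++)
	{
		const char *esc = NULL;
		char		ubuf[16];

		switch (*p)
		{
			case '\b': esc = "\\b"; break;
			case '\f': esc = "\\f"; break;
			case '\n': esc = "\\n"; break;
			case '\r': esc = "\\r"; break;
			case '\t': esc = "\\t"; break;
			case '"':  esc = "\\\""; break;
			case '\\': esc = "\\\\"; break;
			default:
				/* assumption: remaining control characters become \u00XX */
				if (*p < 0x20)
				{
					snprintf(ubuf, sizeof ubuf, "\\u%04x", (unsigned) *p);
					esc = ubuf;
				}
				break;
		}

		if (esc != NULL)
		{
			size_t		n = strlen(esc);

			assert(off + n + 2 <= dstsize);
			memcpy(dst + off, esc, n);
			off += n;
		}
		else
		{
			/* ordinary byte (including '/'), copied through unchanged */
			assert(off + 1 + 2 <= dstsize);
			dst[off++] = (char) *p;
		}
	}

	dst[off++] = '"';
	dst[off] = '\0';
	return off;
}
```

Called on the test value `aaa/bbb`, this sketch leaves the slash untouched, which matches the `{"id":3,"f1":"aaa/bbb","f2":null}` row in the expected output.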

v13-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (text/x-patch; charset=US-ASCII)

From ad5a855ceac2a2bf762009210d1f8193dda58fb3 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 19 Nov 2024 12:22:18 +0800
Subject: [PATCH v13 3/3] Add option force_array for COPY JSON FORMAT

The force_array option can only be used in COPY TO with JSON format.
It makes the JSON output behave like a JSON array.

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 ++++++++++++++
 src/backend/commands/copy.c        | 13 +++++++++++++
 src/backend/commands/copyto.c      | 28 ++++++++++++++++++++++++++++
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 23 +++++++++++++++++++++++
 src/test/regress/sql/copy.sql      |  9 +++++++++
 7 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 5bf0f38d90..50cebec0ce 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     ON_ERROR <replaceable class="parameter">error_action</replaceable>
     REJECT_LIMIT <replaceable class="parameter">maxerror</replaceable>
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
@@ -392,6 +393,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>JSON</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ON_ERROR</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4b8bc87666..71091e1bf3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -490,6 +490,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
 	bool		reject_limit_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -644,6 +645,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -893,6 +901,11 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY json mode cannot be used with %s", "COPY FROM")));
 
+	if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY")));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 87709d76be..7d22ea7e8a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -81,6 +81,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	bool		json_row_delim_needed;	/* need delimiter to start next json array element */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -858,6 +859,15 @@ DoCopyTo(CopyToState cstate)
 
 			CopySendEndOfRow(cstate);
 		}
+		/*
+		 * If JSON has been requested and FORCE_ARRAY has been specified, send
+		 * the opening bracket.
+		 */
+		if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+		{
+			CopySendChar(cstate, '[');
+			CopySendEndOfRow(cstate);
+		}
 	}
 
 	if (cstate->rel)
@@ -905,6 +915,15 @@ DoCopyTo(CopyToState cstate)
 		CopySendEndOfRow(cstate);
 	}
 
+	/*
+	 * If JSON has been requested and FORCE_ARRAY has been specified, send the
+	 * closing bracket.
+	 */
+	if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendEndOfRow(cstate);
+	}
 	MemoryContextDelete(cstate->rowcontext);
 
 	if (fe_copy)
@@ -1008,6 +1027,15 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 		result = makeStringInfo();
 		composite_to_json(rowdata, result, false);
 
+		if (cstate->json_row_delim_needed && cstate->opts.force_array)
+			CopySendChar(cstate, ',');
+		else if (cstate->opts.force_array)
+		{
+			/* first row needs no delimiter */
+			CopySendChar(cstate, ' ');
+			cstate->json_row_delim_needed = true;
+		}
+
 		CopySendData(cstate, result->data, result->len);
 	}
 
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 48cf854a1d..f291e7caba 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3234,7 +3234,7 @@ match_previous_words(int pattern_id,
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
 		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
 					  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
-					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+					  "FORCE_NOT_NULL", "FORCE_NULL", "FORCE_ARRAY", "ENCODING", "DEFAULT",
 					  "ON_ERROR", "LOG_VERBOSITY");
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 076ae59f96..25e534b901 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -92,6 +92,7 @@ typedef struct CopyFormatOptions
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
+	bool		force_array;	/* add JSON array decorations */
 	bool		convert_selectively;	/* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
 	CopyLogVerbosityChoice log_verbosity;	/* verbosity of logged messages */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 430f11f3f1..a35ffbe683 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -96,6 +96,29 @@ ERROR:  cannot specify DEFAULT in JSON mode
 copy copytest from stdin(format json);
 ERROR:  COPY json mode cannot be used with COPY FROM
 -- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR:  COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 3d21f20c98..91daf8482c 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -95,6 +95,15 @@ copy copytest to stdout (format json, default '|');
 copy copytest from stdin(format json);
 -- all of the above should yield error
 
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+
+copy copytest to stdout (format json, force_array true);
+
+copy copytest to stdout (format json, force_array false);
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1
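
The framing logic in the patch above (opening bracket, a space before the first row, a comma before each later row, closing bracket) can be hard to follow across three separate hunks. The sketch below pulls it into one standalone function, purely as an illustration. `frame_json_rows` and `append` are hypothetical names, the fixed-size output buffer stands in for the server's `CopySendChar()`/`CopySendData()` calls, and a plain `\n` stands in for `CopySendEndOfRow()`; none of this is code from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append s to buf at offset off, return the new offset. */
static size_t
append(char *buf, size_t bufsize, size_t off, const char *s)
{
	size_t		len = strlen(s);

	assert(off + len < bufsize);	/* caller must supply enough space */
	memcpy(buf + off, s, len + 1);
	return off + len;
}

/*
 * Hypothetical sketch of the FORCE_ARRAY framing: rows are already-serialized
 * JSON objects, one per output line.  Returns the number of bytes written.
 */
size_t
frame_json_rows(char *buf, size_t bufsize,
				const char **rows, size_t nrows, bool force_array)
{
	bool		row_delim_needed = false;	/* mirrors json_row_delim_needed */
	size_t		off = 0;

	assert(bufsize > 0);
	buf[0] = '\0';

	if (force_array)
		off = append(buf, bufsize, off, "[\n");

	for (size_t i = 0; i < nrows; i++)
	{
		if (force_array)
		{
			/* first row gets a leading space, later rows a leading comma */
			off = append(buf, bufsize, off, row_delim_needed ? "," : " ");
			row_delim_needed = true;
		}
		off = append(buf, bufsize, off, rows[i]);
		off = append(buf, bufsize, off, "\n");
	}

	if (force_array)
		off = append(buf, bufsize, off, "]\n");

	return off;
}
```

With `force_array` set to false the rows come out one per line with no decoration, matching the `force_array false` case in the expected output. With zero rows and `force_array` true, this sketch emits just the two bracket lines, which is what the patch's unconditional bracket writes imply.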

v13-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch (text/x-patch; charset=US-ASCII)

From 69dd037fbfc211684010b5f24977234cf970b312 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 08:24:13 +0300
Subject: [PATCH v13 1/3] Introduce CopyFormat and replace csv_mode and binary
 fields with it.

---
 src/backend/commands/copy.c          | 50 +++++++++++++++-------------
 src/backend/commands/copyfrom.c      | 10 +++---
 src/backend/commands/copyfromparse.c | 34 +++++++++----------
 src/backend/commands/copyto.c        | 20 +++++------
 src/include/commands/copy.h          | 13 ++++++--
 src/tools/pgindent/typedefs.list     |  1 +
 6 files changed, 70 insertions(+), 58 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..b7e819de40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -511,11 +511,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -675,31 +675,31 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -747,7 +747,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -755,43 +755,44 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+												opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -805,8 +806,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV &&
+		(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -821,8 +822,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -846,7 +847,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -882,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -899,7 +900,8 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY &&
+		opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 754cb49616..428b62cb9a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 	cstate->raw_reached_eof = false;
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		/*
 		 * If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
 			continue;
 
 		/* Fetch the input function and typioparam info */
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryInputInfo(att->atttypid,
 								   &in_func_oid, &typioparams[attnum - 1]);
 		else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
 
 	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
 
 	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d8..51eb14d743 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -162,7 +162,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -748,7 +748,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	bool		done;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(cstate->opts.format != COPY_FORMAT_BINARY);
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -765,7 +765,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		{
 			int			fldnum;
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				fldct = CopyReadAttributesCSV(cstate);
 			else
 				fldct = CopyReadAttributesText(cstate);
@@ -820,7 +820,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		return false;
 
 	/* Parse the line into de-escaped field values */
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 		fldct = CopyReadAttributesCSV(cstate);
 	else
 		fldct = CopyReadAttributesText(cstate);
@@ -864,7 +864,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
 	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		char	  **field_strings;
 		ListCell   *cur;
@@ -905,7 +905,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 				continue;
 			}
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 			{
 				if (string == NULL &&
 					cstate->opts.force_notnull_flags[m])
@@ -1178,7 +1178,7 @@ CopyReadLineText(CopyFromState cstate)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		quotec = cstate->opts.quote[0];
 		escapec = cstate->opts.escape[0];
@@ -1255,7 +1255,7 @@ CopyReadLineText(CopyFromState cstate)
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
 
-		if (cstate->opts.csv_mode)
+		if (cstate->opts.format == COPY_FORMAT_CSV)
 		{
 			/*
 			 * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1294,7 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \r */
-		if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1322,10 @@ CopyReadLineText(CopyFromState cstate)
 					if (cstate->eol_type == EOL_CRNL)
 						ereport(ERROR,
 								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errmsg("literal carriage return found in data") :
 								 errmsg("unquoted carriage return found in data"),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errhint("Use \"\\r\" to represent carriage return.") :
 								 errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1339,10 @@ CopyReadLineText(CopyFromState cstate)
 			else if (cstate->eol_type == EOL_NL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal carriage return found in data") :
 						 errmsg("unquoted carriage return found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\r\" to represent carriage return.") :
 						 errhint("Use quoted CSV field to represent carriage return.")));
 			/* If reach here, we have found the line terminator */
@@ -1350,15 +1350,15 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \n */
-		if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal newline found in data") :
 						 errmsg("unquoted newline found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\n\" to represent newline.") :
 						 errhint("Use quoted CSV field to represent newline.")));
 			cstate->eol_type = EOL_NL;	/* in case not set yet */
@@ -1370,7 +1370,7 @@ CopyReadLineText(CopyFromState cstate)
 		 * Process backslash, except in CSV mode where backslash is a normal
 		 * character.
 		 */
-		if (c == '\\' && !cstate->opts.csv_mode)
+		if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
 		{
 			char		c2;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d9675..03c9d71d34 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
 	switch (cstate->copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 			{
 				/* Default line termination depends on platform */
 #ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
 			break;
 		case COPY_FRONTEND:
 			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 				CopySendChar(cstate, '\n');
 
 			/* Dump the accumulated row as one CopyData message */
@@ -775,7 +775,7 @@ DoCopyTo(CopyToState cstate)
 		bool		isvarlena;
 		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryOutputInfo(attr->atttypid,
 									&out_func_oid,
 									&isvarlena);
@@ -796,7 +796,7 @@ DoCopyTo(CopyToState cstate)
 											   "COPY TO",
 											   ALLOCSET_DEFAULT_SIZES);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate header for a binary copy */
 		int32		tmp;
@@ -837,7 +837,7 @@ DoCopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, colname, false);
 				else
 					CopyAttributeOutText(cstate, colname);
@@ -884,7 +884,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate trailer for a binary copy */
 		CopySendInt16(cstate, -1);
@@ -912,7 +912,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Binary per-tuple header */
 		CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -921,7 +921,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		bool		need_delim = false;
 
@@ -941,7 +941,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			{
 				string = OutputFunctionCall(&out_functions[attnum - 1],
 											value);
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, string,
 										cstate->opts.force_quote_flags[attnum - 1]);
 				else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..c3d1df267f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 08521d51a9..b81da581cf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -491,6 +491,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromState
 CopyFromStateData
-- 
2.34.1

#131

[1]: /messages/by-id/20250301.115009.424844407736647598.kou@clear-code.com

jian.universality@gmail.com

12 months ago

In reply to: jian he (#130)

3 attachment(s)

Re: Emitting JSON to file using COPY TO

Hi,

There are two ways to represent the new COPY format, json:
1.
typedef struct CopyFormatOptions
{
    bool binary;     /* binary format? */
    bool freeze;     /* freeze rows on loading? */
    bool csv_mode;   /* Comma Separated Value format? */
    bool json_mode;  /* JSON format? */
    ...
}

2.
typedef struct CopyFormatOptions
{
    CopyFormat format; /* format of the COPY operation */
    .....
}

typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
    COPY_FORMAT_JSON,
} CopyFormat;

With either approach, sizeof(cstate->opts) (the CopyFormatOptions member of
CopyToStateData) is 184, so the struct size will not influence performance.
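
As a rough illustration of why the two layouts can end up the same size, here is a minimal sketch using hypothetical stand-in structs. DemoOptsBools, DemoOptsEnum and the other Demo* names are not PostgreSQL code, and they mirror only the first few fields of CopyFormatOptions. It assumes a typical 64-bit ABI with 4-byte enums, where alignment padding absorbs the difference; on other ABIs the two sizes may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CopyHeaderChoice (an enum field that follows the bools). */
typedef enum DemoHeaderChoice
{
	DEMO_HEADER_FALSE = 0,
	DEMO_HEADER_TRUE,
} DemoHeaderChoice;

/* Hypothetical stand-in for the CopyFormat enum. */
typedef enum DemoCopyFormat
{
	DEMO_FORMAT_TEXT = 0,
	DEMO_FORMAT_BINARY,
	DEMO_FORMAT_CSV,
	DEMO_FORMAT_JSON,
} DemoCopyFormat;

/* Approach 1: one bool per format. */
typedef struct DemoOptsBools
{
	int			file_encoding;
	bool		binary;
	bool		freeze;
	bool		csv_mode;
	bool		json_mode;
	DemoHeaderChoice header_line;
	char	   *null_print;
} DemoOptsBools;

/* Approach 2: a single enum replaces the per-format bools. */
typedef struct DemoOptsEnum
{
	int			file_encoding;
	DemoCopyFormat format;
	bool		freeze;
	DemoHeaderChoice header_line;
	char	   *null_print;
} DemoOptsEnum;

/* Returns true when both layouts occupy the same number of bytes. */
static bool
demo_layouts_same_size(void)
{
	return sizeof(DemoOptsBools) == sizeof(DemoOptsEnum);
}
```

Under the stated assumptions, the pointer member forces 8-byte alignment in both structs, so the fields before it fill the same 16 bytes either way.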

I also ran some benchmarks with the CopyFormat approach.
The results are below:
-------------------------------------------------------------------------------------------------------
create unlogged table t as select g from generate_series(1, 1_000_000) g;

build_type=release patch:
copy t to '/dev/null' json \watch i=0.1 c=10
last execution Time: 108.741 ms

copy t to '/dev/null' (format text) \watch i=0.1 c=10
last execution Time: 42.600 ms

build_type=release master:
copy t to '/dev/null' (format text) \watch i=0.1 c=10
last execution Time: 42.948 ms

---------------------------------------------------------
So a new version is attached, using the CopyFormat enum in struct CopyFormatOptions.

The changes are mainly in CopyOneRowTo.
It now contains:
""""
if (!cstate->rel)
{
    memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
           TupleDescAttr(cstate->queryDesc->tupDesc, 0),
           cstate->queryDesc->tupDesc->natts *
           sizeof(FormData_pg_attribute));
    for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
        populate_compact_attribute(slot->tts_tupleDescriptor, i);
    BlessTupleDesc(slot->tts_tupleDescriptor);
}
""""
Reasoning for the change:
To construct the JSON keys, composite_to_json only needs
FormData_pg_attribute.attname, but the code path
composite_to_json->fastgetattr->TupleDescCompactAttr->verify_compact_attribute
means we also need to call populate_compact_attribute to populate
the other attributes.

v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch was
authored by Joel Jacobson.
As I mentioned above, replacing the 3 bool fields with an enum doesn't change
the size of struct CopyFormatOptions, but consolidating them into one enum
makes the code leaner.
I think the refactoring (v14-0001) is worth it.
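
To illustrate the consolidation, a minimal standalone sketch follows. The CopyFormat enum matches the one in the patch set; demo_parse_copy_format and demo_is_delimited are hypothetical helpers that only mirror the strcmp chain in ProcessCopyOptions and the text/CSV test in CopyOneRowTo. They are not functions from the patches.

```c
#include <stdbool.h>
#include <string.h>

typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
	COPY_FORMAT_JSON,
} CopyFormat;

/*
 * Hypothetical helper: map a format name to its CopyFormat value.
 * Returns -1 for an unrecognized name (the real code raises an error).
 */
static int
demo_parse_copy_format(const char *fmt)
{
	if (strcmp(fmt, "text") == 0)
		return COPY_FORMAT_TEXT;
	else if (strcmp(fmt, "csv") == 0)
		return COPY_FORMAT_CSV;
	else if (strcmp(fmt, "binary") == 0)
		return COPY_FORMAT_BINARY;
	else if (strcmp(fmt, "json") == 0)
		return COPY_FORMAT_JSON;
	return -1;
}

/*
 * Hypothetical helper: text and CSV share the delimiter-separated output
 * path. With a fourth format, "not binary" no longer implies this.
 */
static bool
demo_is_delimited(CopyFormat format)
{
	return format == COPY_FORMAT_TEXT || format == COPY_FORMAT_CSV;
}
```

With one enum field, exactly one format is selected at any time, whereas separate bools would allow contradictory combinations such as binary and csv_mode both being true.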

Attachments:

v14-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (text/x-patch; charset=US-ASCII)

From 38fe90d00746b7441faef4bdc392090aea1b9ee3 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 27 Jan 2025 16:05:37 +0800
Subject: [PATCH v14 3/3] Add option force_array for COPY JSON FORMAT

The force_array option can only be used in COPY TO with JSON format.
It makes the JSON output behave like a JSON array.

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 ++++++++++++++
 src/backend/commands/copy.c        | 13 +++++++++++++
 src/backend/commands/copyto.c      | 28 ++++++++++++++++++++++++++++
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 23 +++++++++++++++++++++++
 src/test/regress/sql/copy.sql      |  9 +++++++++
 7 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 5bf0f38d90..50cebec0ce 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     ON_ERROR <replaceable class="parameter">error_action</replaceable>
     REJECT_LIMIT <replaceable class="parameter">maxerror</replaceable>
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
@@ -392,6 +393,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>JSON</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ON_ERROR</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c0234d22e5..2dbbd4007e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -504,6 +504,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
 	bool		reject_limit_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -658,6 +659,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -907,6 +915,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				errmsg("COPY json mode cannot be used with %s", "COPY FROM"));
 
+	if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d5d1bc4ec9..764ed4766d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -81,6 +81,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	bool		json_row_delim_needed;	/* need delimiter to start next json array element */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -858,6 +859,15 @@ DoCopyTo(CopyToState cstate)
 
 			CopySendEndOfRow(cstate);
 		}
+		/*
+		 * If JSON has been requested, and FORCE_ARRAY has been specified then
+		 * send the opening bracket.
+		*/
+		if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+		{
+			CopySendChar(cstate, '[');
+			CopySendEndOfRow(cstate);
+		}
 	}
 
 	if (cstate->rel)
@@ -905,6 +915,15 @@ DoCopyTo(CopyToState cstate)
 		CopySendEndOfRow(cstate);
 	}
 
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified then
+	 * send the closing bracket.
+	*/
+	if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendEndOfRow(cstate);
+	}
 	MemoryContextDelete(cstate->rowcontext);
 
 	if (fe_copy)
@@ -1008,6 +1027,15 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 		result = makeStringInfo();
 		composite_to_json(rowdata, result, false);
 
+		if (cstate->json_row_delim_needed && cstate->opts.force_array)
+			CopySendChar(cstate, ',');
+		else if (cstate->opts.force_array)
+		{
+			/* first row needs no delimiter */
+			CopySendChar(cstate, ' ');
+			cstate->json_row_delim_needed = true;
+		}
+
 		CopySendData(cstate, result->data, result->len);
 	}
 
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 5fb5cb2daa..02d0d4e3cf 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3249,7 +3249,7 @@ match_previous_words(int pattern_id,
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
 		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
 					  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
-					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+					  "FORCE_NOT_NULL", "FORCE_NULL", "FORCE_ARRAY", "ENCODING", "DEFAULT",
 					  "ON_ERROR", "LOG_VERBOSITY");
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 07fcc2bc9a..466ab76dac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -92,6 +92,7 @@ typedef struct CopyFormatOptions
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
+	bool		force_array;	/* add JSON array decorations */
 	bool		convert_selectively;	/* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
 	CopyLogVerbosityChoice log_verbosity;	/* verbosity of logged messages */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b4479bc72a..ffee7bff00 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -110,6 +110,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 ERROR:  COPY json mode cannot be used with COPY FROM
 -- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR:  COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 054fce95cb..7a175ca76a 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,15 @@ copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 -- all of the above should yield error
 
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+
+copy copytest to stdout (format json, force_array true);
+
+copy copytest to stdout (format json, force_array false);
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1
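
The FORCE_ARRAY framing implemented by the patch above (an opening bracket line, a space before the first row, a comma before each later row, and a closing bracket line) can be sketched outside the server as follows. This is only an illustration: demo_append, demo_frame_rows and demo_example are hypothetical names, and the rows are assumed to be already-serialized JSON objects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append s to the NUL-terminated string in buf, truncating on overflow. */
static void
demo_append(char *buf, size_t buflen, const char *s)
{
	size_t		used = strlen(buf);

	if (used + 1 < buflen)
		strncat(buf, s, buflen - used - 1);
}

/*
 * Write nrows pre-serialized JSON rows into buf, one per line.  With
 * force_array, wrap them in brackets and separate them with commas.
 */
static const char *
demo_frame_rows(char *buf, size_t buflen,
				const char *const *rows, int nrows, bool force_array)
{
	bool		row_delim_needed = false;

	if (buflen == 0)
		return buf;
	buf[0] = '\0';

	if (force_array)
		demo_append(buf, buflen, "[\n");

	for (int i = 0; i < nrows; i++)
	{
		if (force_array)
		{
			/* the first row gets a space, every later row a comma */
			demo_append(buf, buflen, row_delim_needed ? "," : " ");
			row_delim_needed = true;
		}
		demo_append(buf, buflen, rows[i]);
		demo_append(buf, buflen, "\n");
	}

	if (force_array)
		demo_append(buf, buflen, "]\n");

	return buf;
}

/* Frame two sample rows; the result lives in a static buffer. */
static const char *
demo_example(bool force_array)
{
	static char buf[128];
	static const char *const rows[] = {"{\"id\":1}", "{\"id\":2}"};

	return demo_frame_rows(buf, sizeof(buf), rows, 2, force_array);
}
```

Putting the comma at the start of each later row, rather than at the end of the previous one, means no row has to know whether another follows, which suits output that is streamed one tuple at a time.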

v14-0002-introduce-json-format-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From e42e0696f6362fc8aba01ef1c546c4224aaf65c1 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 27 Jan 2025 15:05:48 +0800
Subject: [PATCH v14 2/3] introduce json format for COPY TO

The json format is only allowed in COPY TO operations.
It also cannot be used with the {header, default, null, delimiter} options, among others.
Fully tested in src/test/regress/sql/copy.sql.

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 13 ++++--
 src/backend/commands/copy.c        | 29 ++++++++++++
 src/backend/commands/copyto.c      | 51 ++++++++++++++++++--
 src/backend/parser/gram.y          |  8 ++++
 src/backend/utils/adt/json.c       |  5 +-
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/include/utils/json.h           |  2 +
 src/test/regress/expected/copy.out | 74 ++++++++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      | 47 +++++++++++++++++++
 10 files changed, 219 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..5bf0f38d90 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -219,10 +219,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
@@ -257,7 +262,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -271,7 +276,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
 
      <note>
@@ -294,7 +299,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      not using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -310,7 +315,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       If this option is set to <literal>MATCH</literal>, the number and names
       of the columns in the header line must match the actual column names of
       the table, in order;  otherwise an error is raised.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
       The <literal>MATCH</literal> option is only valid for <command>COPY
       FROM</command> commands.
      </para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e7e7815853..c0234d22e5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -530,6 +530,8 @@ ProcessCopyOptions(ParseState *pstate,
 				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->format = COPY_FORMAT_JSON;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -695,16 +697,32 @@ ProcessCopyOptions(ParseState *pstate,
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->delim)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+				errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->null_print)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+				errmsg("cannot specify %s in JSON mode", "NULL"));
+
 	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->default_print)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+				errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
 		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
@@ -775,6 +793,11 @@ ProcessCopyOptions(ParseState *pstate,
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->header_line)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("cannot specify %s in JSON mode", "HEADER"));
+
 	/* Check quote */
 	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
@@ -878,6 +901,12 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY %s cannot be used with %s", "FREEZE",
 						"COPY TO")));
 
+	/* Check json format  */
+	if (opts_out->format == COPY_FORMAT_JSON && is_from)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY json mode cannot be used with %s", "COPY FROM"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea90af28a9..d5d1bc4ec9 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -139,9 +141,20 @@ SendCopyBegin(CopyToState cstate)
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (cstate->opts.format != COPY_FORMAT_JSON)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON format is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -921,7 +934,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (cstate->opts.format != COPY_FORMAT_BINARY)
+	if (cstate->opts.format == COPY_FORMAT_TEXT || cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		bool		need_delim = false;
 
@@ -949,7 +962,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
-	else
+	else if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		foreach_int(attnum, cstate->attnumlist)
 		{
@@ -969,6 +982,34 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			}
 		}
 	}
+	else
+	{
+		Datum		rowdata;
+		StringInfo	result;
+
+		/*
+		 * if COPY TO source data is from a query, not a plain table, then we
+		 * need copy CopyToState->TupleDesc to slot->tts_tupleDescriptor.
+		 * because the slot's TupleDesc->attrs may change during query
+		 * execution.
+		*/
+		if(!cstate->rel)
+		{
+			memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+				   TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+				   cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+			for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+				populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+			BlessTupleDesc(slot->tts_tupleDescriptor);
+		}
+		rowdata = ExecFetchSlotHeapTupleDatum(slot);
+		result = makeStringInfo();
+		composite_to_json(rowdata, result, false);
+
+		CopySendData(cstate, result->data, result->len);
+	}
 
 	CopySendEndOfRow(cstate);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d7f9c00c40..cfcb200b6e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3465,6 +3465,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3547,6 +3551,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+			{
+				$$ = makeDefElem("format", $2, @1);
+			}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 51452755f5..bf69347fa9 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 81cbf10aa2..5fb5cb2daa 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3254,7 +3254,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
-		COMPLETE_WITH("binary", "csv", "text");
+		COMPLETE_WITH("binary", "csv", "text", "json");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 7a1ee65601..07fcc2bc9a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,7 @@ typedef enum CopyFormat
 	COPY_FORMAT_TEXT = 0,
 	COPY_FORMAT_BINARY,
 	COPY_FORMAT_CSV,
+	COPY_FORMAT_JSON,
 } CopyFormat;
 
 /*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac0..1fa8e2ce8e 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern void escape_json_with_len(StringInfo buf, const char *str, int len);
 extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index f554d42c84..b4479bc72a 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,80 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR:  cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR:  cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR:  cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR:  COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR:  COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+                                              ^
+copy copytest from stdin(format json);
+ERROR:  COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index f1699b66b0..054fce95cb 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
-- 
2.34.1
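
The option checks that the patch above adds for JSON format can be summarized in a small standalone sketch. demo_check_json_options is a hypothetical function with a simplified parameter list (for instance, HEADER is reduced to a bool here). It returns the error text instead of calling ereport, and it covers only the checks this patch adds, in the order they appear in ProcessCopyOptions.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
	COPY_FORMAT_JSON,
} CopyFormat;

/*
 * Hypothetical helper: return NULL if the options are acceptable for the
 * given format, otherwise the message the patch would report.  A NULL
 * string argument means the option was not specified.
 */
static const char *
demo_check_json_options(CopyFormat format,
						const char *delim,
						const char *null_print,
						const char *default_print,
						bool header,
						bool is_from)
{
	/* These checks apply to JSON format only. */
	if (format != COPY_FORMAT_JSON)
		return NULL;

	if (delim != NULL)
		return "cannot specify DELIMITER in JSON mode";
	if (null_print != NULL)
		return "cannot specify NULL in JSON mode";
	if (default_print != NULL)
		return "cannot specify DEFAULT in JSON mode";
	if (header)
		return "cannot specify HEADER in JSON mode";
	if (is_from)
		return "COPY json mode cannot be used with COPY FROM";

	return NULL;
}
```

The QUOTE, ESCAPE and FORCE_* options are not listed because, as the regression output shows, the existing "requires CSV mode" checks already reject them for any non-CSV format.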

v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch (text/x-patch; charset=US-ASCII)

From 7f475554a544b1e031522e21477731286deb9917 Mon Sep 17 00:00:00 2001
From: Joel Jacobson <joel@compiler.org>
Date: Thu, 24 Oct 2024 08:24:13 +0300
Subject: [PATCH v14 1/3] Introduce CopyFormat and replace csv_mode and binary
 fields with it.

---
 src/backend/commands/copy.c          | 50 +++++++++++++++-------------
 src/backend/commands/copyfrom.c      | 10 +++---
 src/backend/commands/copyfromparse.c | 34 +++++++++----------
 src/backend/commands/copyto.c        | 20 +++++------
 src/include/commands/copy.h          | 13 ++++++--
 src/tools/pgindent/typedefs.list     |  1 +
 6 files changed, 70 insertions(+), 58 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc2..e7e7815853 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -525,11 +525,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -689,31 +689,31 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -761,7 +761,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -769,43 +769,44 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
+												opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -819,8 +820,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV &&
+		(opts_out->force_notnull != NIL || opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -835,8 +836,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -860,7 +861,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -896,7 +897,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -913,7 +914,8 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY &&
+		opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f560..aea807d13c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -122,7 +122,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
@@ -1583,7 +1583,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 	cstate->raw_reached_eof = false;
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		/*
 		 * If encoding conversion is needed, we need another buffer to hold
@@ -1634,7 +1634,7 @@ BeginCopyFrom(ParseState *pstate,
 			continue;
 
 		/* Fetch the input function and typioparam info */
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryInputInfo(att->atttypid,
 								   &in_func_oid, &typioparams[attnum - 1]);
 		else
@@ -1775,14 +1775,14 @@ BeginCopyFrom(ParseState *pstate,
 
 	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
 
 	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index caccdc8563..5b56aa89e3 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -162,7 +162,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -748,7 +748,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	bool		done;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(cstate->opts.format != COPY_FORMAT_BINARY);
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
@@ -765,7 +765,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		{
 			int			fldnum;
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				fldct = CopyReadAttributesCSV(cstate);
 			else
 				fldct = CopyReadAttributesText(cstate);
@@ -820,7 +820,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		return false;
 
 	/* Parse the line into de-escaped field values */
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 		fldct = CopyReadAttributesCSV(cstate);
 	else
 		fldct = CopyReadAttributesText(cstate);
@@ -864,7 +864,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
 	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		char	  **field_strings;
 		ListCell   *cur;
@@ -905,7 +905,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 				continue;
 			}
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 			{
 				if (string == NULL &&
 					cstate->opts.force_notnull_flags[m])
@@ -1178,7 +1178,7 @@ CopyReadLineText(CopyFromState cstate)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
-	if (cstate->opts.csv_mode)
+	if (cstate->opts.format == COPY_FORMAT_CSV)
 	{
 		quotec = cstate->opts.quote[0];
 		escapec = cstate->opts.escape[0];
@@ -1255,7 +1255,7 @@ CopyReadLineText(CopyFromState cstate)
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
 
-		if (cstate->opts.csv_mode)
+		if (cstate->opts.format == COPY_FORMAT_CSV)
 		{
 			/*
 			 * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1294,7 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \r */
-		if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\r' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1322,10 @@ CopyReadLineText(CopyFromState cstate)
 					if (cstate->eol_type == EOL_CRNL)
 						ereport(ERROR,
 								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errmsg("literal carriage return found in data") :
 								 errmsg("unquoted carriage return found in data"),
-								 !cstate->opts.csv_mode ?
+								 cstate->opts.format != COPY_FORMAT_CSV ?
 								 errhint("Use \"\\r\" to represent carriage return.") :
 								 errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1339,10 @@ CopyReadLineText(CopyFromState cstate)
 			else if (cstate->eol_type == EOL_NL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal carriage return found in data") :
 						 errmsg("unquoted carriage return found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\r\" to represent carriage return.") :
 						 errhint("Use quoted CSV field to represent carriage return.")));
 			/* If reach here, we have found the line terminator */
@@ -1350,15 +1350,15 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \n */
-		if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\n' && (cstate->opts.format != COPY_FORMAT_CSV || !in_quote))
 		{
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errmsg("literal newline found in data") :
 						 errmsg("unquoted newline found in data"),
-						 !cstate->opts.csv_mode ?
+						 cstate->opts.format != COPY_FORMAT_CSV ?
 						 errhint("Use \"\\n\" to represent newline.") :
 						 errhint("Use quoted CSV field to represent newline.")));
 			cstate->eol_type = EOL_NL;	/* in case not set yet */
@@ -1370,7 +1370,7 @@ CopyReadLineText(CopyFromState cstate)
 		 * Process backslash, except in CSV mode where backslash is a normal
 		 * character.
 		 */
-		if (c == '\\' && !cstate->opts.csv_mode)
+		if (c == '\\' && cstate->opts.format != COPY_FORMAT_CSV)
 		{
 			char		c2;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99cb23cb34..ea90af28a9 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -134,7 +134,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -191,7 +191,7 @@ CopySendEndOfRow(CopyToState cstate)
 	switch (cstate->copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 			{
 				/* Default line termination depends on platform */
 #ifndef WIN32
@@ -236,7 +236,7 @@ CopySendEndOfRow(CopyToState cstate)
 			break;
 		case COPY_FRONTEND:
 			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate->opts.binary)
+			if (cstate->opts.format != COPY_FORMAT_BINARY)
 				CopySendChar(cstate, '\n');
 
 			/* Dump the accumulated row as one CopyData message */
@@ -775,7 +775,7 @@ DoCopyTo(CopyToState cstate)
 		bool		isvarlena;
 		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-		if (cstate->opts.binary)
+		if (cstate->opts.format == COPY_FORMAT_BINARY)
 			getTypeBinaryOutputInfo(attr->atttypid,
 									&out_func_oid,
 									&isvarlena);
@@ -796,7 +796,7 @@ DoCopyTo(CopyToState cstate)
 											   "COPY TO",
 											   ALLOCSET_DEFAULT_SIZES);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate header for a binary copy */
 		int32		tmp;
@@ -837,7 +837,7 @@ DoCopyTo(CopyToState cstate)
 
 				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, colname, false);
 				else
 					CopyAttributeOutText(cstate, colname);
@@ -884,7 +884,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Generate trailer for a binary copy */
 		CopySendInt16(cstate, -1);
@@ -912,7 +912,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* Binary per-tuple header */
 		CopySendInt16(cstate, list_length(cstate->attnumlist));
@@ -921,7 +921,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (!cstate->opts.binary)
+	if (cstate->opts.format != COPY_FORMAT_BINARY)
 	{
 		bool		need_delim = false;
 
@@ -941,7 +941,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			{
 				string = OutputFunctionCall(&out_functions[attnum - 1],
 											value);
-				if (cstate->opts.csv_mode)
+				if (cstate->opts.format == COPY_FORMAT_CSV)
 					CopyAttributeOutCSV(cstate, string,
 										cstate->opts.force_quote_flags[attnum - 1]);
 				else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef72..7a1ee65601 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,16 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +71,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a2644a2e65..d88382cdcc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -496,6 +496,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromState
 CopyFromStateData
-- 
2.34.1
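
To make the shape of the refactoring above easier to see outside the diff, here is a small standalone sketch of the same idea: one enum field drives both format parsing and the per-format defaults. `CopyOptsSketch`, `parse_format_sketch` and `apply_defaults_sketch` are hypothetical names for illustration only; the default values mirror the ones visible in the ProcessCopyOptions hunk, but the real function does far more checking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical, heavily trimmed stand-in for CopyFormatOptions. */
typedef struct CopyOptsSketch
{
    CopyFormat  format;
    const char *delim;
    const char *null_print;
    const char *quote;
} CopyOptsSketch;

/* Map a format name to the enum; returns 0 on success, -1 if unknown. */
static int
parse_format_sketch(const char *fmt, CopyFormat *out)
{
    assert(fmt != NULL && out != NULL);
    if (strcmp(fmt, "text") == 0)
        *out = COPY_FORMAT_TEXT;
    else if (strcmp(fmt, "csv") == 0)
        *out = COPY_FORMAT_CSV;
    else if (strcmp(fmt, "binary") == 0)
        *out = COPY_FORMAT_BINARY;
    else
        return -1;
    return 0;
}

/* Fill in omitted options, keyed on the single format field. */
static void
apply_defaults_sketch(CopyOptsSketch *opts)
{
    assert(opts != NULL);
    if (!opts->delim)
        opts->delim = (opts->format == COPY_FORMAT_CSV) ? "," : "\t";
    if (!opts->null_print)
        opts->null_print = (opts->format == COPY_FORMAT_CSV) ? "" : "\\N";
    if (opts->format == COPY_FORMAT_CSV && !opts->quote)
        opts->quote = "\"";
}
```

One consequence of the enum, at least in this sketch, is that a state such as "binary and CSV at the same time" can no longer be represented, which two independent bool fields would allow.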

#132

Junwang Zhao

zhjwpku@gmail.com

11 months ago

In reply to: jian he (#131)

2 attachment(s)

Re: Emitting JSON to file using COPY TO

On Mon, Jan 27, 2025 at 4:17 PM jian he <jian.universality@gmail.com> wrote:

hi.

There are two ways to represent the new copy format, json:
1.
typedef struct CopyFormatOptions
{
    bool binary;    /* binary format? */
    bool freeze;    /* freeze rows on loading? */
    bool csv_mode;  /* Comma Separated Value format? */
    bool json_mode; /* JSON format? */
    ...
}

2.
typedef struct CopyFormatOptions
{
    CopyFormat format; /* format of the COPY operation */
    .....
}

typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
    COPY_FORMAT_JSON,
} CopyFormat;

With either approach, sizeof(cstate->opts) (the CopyFormatOptions embedded in CopyToStateData) is 184,
so the struct size will not influence performance.
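
One plausible reason the two layouts can end up the same size is alignment padding: a few one-byte bool fields followed by wider members often leave padding that a four-byte enum can occupy instead. The sketch below uses two trimmed, hypothetical structs (`OptsWithBools`, `OptsWithEnum`), not the real CopyFormatOptions, so it says nothing about the 184-byte figure itself; the exact sizes depend on the compiler and ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
    COPY_FORMAT_JSON,
} CopyFormat;

/* Hypothetical trimmed layout using one bool per format (approach 1). */
typedef struct OptsWithBools
{
    int         file_encoding;
    bool        binary;
    bool        freeze;
    bool        csv_mode;
    bool        json_mode;
    int         header_line;
    char       *null_print;
} OptsWithBools;

/* Hypothetical trimmed layout using a single enum field (approach 2). */
typedef struct OptsWithEnum
{
    int         file_encoding;
    CopyFormat  format;
    bool        freeze;
    int         header_line;
    char       *null_print;
} OptsWithEnum;

/* Size of the enum layout minus size of the bool layout, in bytes. */
static long
size_difference_sketch(void)
{
    /* sanity check: the struct must at least hold its enum member */
    assert(sizeof(OptsWithEnum) >= sizeof(CopyFormat));
    return (long) sizeof(OptsWithEnum) - (long) sizeof(OptsWithBools);
}
```

On a typical 64-bit build with a four-byte enum and a one-byte bool, the difference for these two trimmed structs would be expected to be zero; other ABIs may give a different answer.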

I also ran some benchmarks using CopyFormat.
The benchmark details follow:
-------------------------------------------------------------------------------------------------------
create unlogged table t as select g from generate_series(1, 1_000_000) g;

build_type=release patch:
copy t to '/dev/null' json \watch i=0.1 c=10
last execution Time: 108.741 ms

copy t to '/dev/null' (format text) \watch i=0.1 c=10
last execution Time: 42.600 ms

build_type=release master:
copy t to '/dev/null' (format text) \watch i=0.1 c=10
last execution Time: 42.948 ms

---------------------------------------------------------
So a new version is attached, using the CopyFormat field in struct CopyFormatOptions.

The changes are mainly in CopyOneRowTo,
which now reads:
""""
if (!cstate->rel)
{
    memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
           TupleDescAttr(cstate->queryDesc->tupDesc, 0),
           cstate->queryDesc->tupDesc->natts *
           sizeof(FormData_pg_attribute));
    for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
        populate_compact_attribute(slot->tts_tupleDescriptor, i);
    BlessTupleDesc(slot->tts_tupleDescriptor);
}
""""
Reasoning for the change:
to construct the JSON keys, composite_to_json only needs
FormData_pg_attribute.attname,
but the code path
composite_to_json->fastgetattr->TupleDescCompactAttr->verify_compact_attribute
means we also need to call populate_compact_attribute to populate
the other attributes.

v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch
was authored by Joel Jacobson.
As I mentioned above,
replacing the 3 bool fields with an enum did not change the size of struct CopyFormatOptions,
but consolidating the 3 bool fields into one enum makes the code leaner.
I think the refactoring (v14-0001) is worth it.

I've refactored the patch to adapt to the newly introduced CopyToRoutine struct;
see 2e4127b6d2.

v15-0001 merges v14-0001 and v14-0002.

There are some other ongoing *copy to/from* refactors [1] that we can benefit
from to make the code cleaner, especially the checks done in ProcessCopyOptions.

[1] /messages/by-id/20250301.115009.424844407736647598.kou@clear-code.com

--
Regards
Junwang Zhao

Attachments:

v15-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch (application/octet-stream)

From aee1dea1a0ab5d4d058a1cc8766667c78f77d175 Mon Sep 17 00:00:00 2001
From: Junwang Zhao <zhjwpku@gmail.com>
Date: Sun, 2 Mar 2025 04:41:48 +0000
Subject: [PATCH v15 2/2] Add option force_array for COPY JSON FORMAT

The force_array option can only be used in COPY TO with JSON format.
It makes the JSON output behave like a JSON array.

Refactored by Junwang Zhao to adapt to the newly introduced CopyToRoutine struct (2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 +++++++++++
 src/backend/commands/copy.c        | 13 ++++++++++
 src/backend/commands/copyto.c      | 38 +++++++++++++++++++++++++++---
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 23 ++++++++++++++++++
 src/test/regress/sql/copy.sql      |  8 +++++++
 7 files changed, 95 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 9c519d8a9e2..7b3c913d4ee 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     ON_ERROR <replaceable class="parameter">error_action</replaceable>
     REJECT_LIMIT <replaceable class="parameter">maxerror</replaceable>
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
@@ -392,6 +393,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>JSON</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ON_ERROR</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b6f74c798d0..7b4c64ea97e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -504,6 +504,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
 	bool		reject_limit_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -658,6 +659,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -906,6 +914,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				errmsg("COPY json mode cannot be used with %s", "COPY FROM"));
 
+	if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 69b7fd11e03..86842b764e8 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,6 +84,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	bool		json_row_delim_needed;	/* need delimiter to start next json array element */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -128,6 +129,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
 static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -171,7 +173,7 @@ static const CopyToRoutine CopyToRoutineJson = {
 	.CopyToStart = CopyToTextLikeStart,
 	.CopyToOutFunc = CopyToTextLikeOutFunc,
 	.CopyToOneRow = CopyToJsonOneRow,
-	.CopyToEnd = CopyToTextLikeEnd,
+	.CopyToEnd = CopyToJsonEnd,
 };
 
 /* binary format */
@@ -235,6 +237,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 		CopySendTextLikeEndOfRow(cstate);
 	}
+
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified send
+	 * the opening bracket.
+	 */
+	if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, '[');
+		CopySendTextLikeEndOfRow(cstate);
+	}
 }
 
 /*
@@ -345,11 +357,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
 	result = makeStringInfo();
 	composite_to_json(rowdata, result, false);
 
+	if (cstate->json_row_delim_needed && cstate->opts.force_array)
+		CopySendChar(cstate, ',');
+	else if (cstate->opts.force_array)
+	{
+		/* first row needs no delimiter */
+		CopySendChar(cstate, ' ');
+		cstate->json_row_delim_needed = true;
+	}
+
 	CopySendData(cstate, result->data, result->len);
 
 	CopySendTextLikeEndOfRow(cstate);
 }
 
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+	if (cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendTextLikeEndOfRow(cstate);
+	}
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
@@ -554,8 +586,8 @@ CopySendEndOfRow(CopyToState cstate)
 }
 
 /*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
- * line termination and do common appropriate things for the end of row.
+ * Wrapper function of CopySendEndOfRow for text, CSV and json formats. Sends
+ * the line termination and do common appropriate things for the end of row.
  */
 static inline void
 CopySendTextLikeEndOfRow(CopyToState cstate)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 6069b118834..e0c7412ec0a 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3278,7 +3278,7 @@ match_previous_words(int pattern_id,
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
 		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
 					  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
-					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+					  "FORCE_NOT_NULL", "FORCE_NULL", "FORCE_ARRAY", "ENCODING", "DEFAULT",
 					  "ON_ERROR", "LOG_VERBOSITY");
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 07fcc2bc9ac..fa8a8ab7e31 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -89,6 +89,7 @@ typedef struct CopyFormatOptions
 	List	   *force_notnull;	/* list of column names */
 	bool		force_notnull_all;	/* FORCE_NOT_NULL *? */
 	bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
+	bool		force_array;	/* add JSON array decorations */
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index d9e30937feb..7aa0005fe7a 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -110,6 +110,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 ERROR:  COPY json mode cannot be used with COPY FROM
 -- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR:  COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 5e4d1f0781a..3e062275d85 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,14 @@ copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 -- all of the above should yield error
 
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.39.5
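
The expected output in the regression test above shows how force_array frames the rows: an opening "[" on its own line, a leading space before the first row, a leading comma before each later row, and a closing "]". Without force_array, each row is simply written on its own line. The following is a minimal C sketch of that framing only, assuming the rows are already serialized JSON objects. frame_json_rows and append_str are hypothetical names for illustration, not functions from the patch, and the real code streams rows through the COPY machinery rather than into a buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append s to out if it fits, tracking bytes used. */
static bool
append_str(char *out, size_t outsz, size_t *used, const char *s)
{
	size_t		len = strlen(s);

	if (*used + len + 1 > outsz)
		return false;
	memcpy(out + *used, s, len + 1);	/* copies the terminating NUL too */
	*used += len;
	return true;
}

/*
 * Hypothetical sketch, not part of the patch: write nrows pre-serialized
 * JSON objects into out, framed like the expected regression output.
 * Returns the number of bytes written (excluding the NUL), or -1 if out
 * is too small.
 */
static int
frame_json_rows(char *out, size_t outsz,
				const char *const *rows, size_t nrows, bool force_array)
{
	size_t		used = 0;

	if (outsz == 0)
		return -1;
	out[0] = '\0';

	/* Opening bracket on its own line, only in array mode. */
	if (force_array && !append_str(out, outsz, &used, "[\n"))
		return -1;

	for (size_t i = 0; i < nrows; i++)
	{
		/* First row gets a space, later rows get a leading comma. */
		if (force_array &&
			!append_str(out, outsz, &used, i == 0 ? " " : ","))
			return -1;
		if (!append_str(out, outsz, &used, rows[i]) ||
			!append_str(out, outsz, &used, "\n"))
			return -1;
	}

	/* Closing bracket, only in array mode. */
	if (force_array && !append_str(out, outsz, &used, "]\n"))
		return -1;

	return (int) used;
}
```

Putting the comma at the start of each later row, rather than at the end of the previous one, means a writer never has to know whether another row follows, which suits a streaming producer.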

v15-0001-Introduce-json-format-for-COPY-TO.patch (application/octet-stream)

From 462dd01a2865a3c3b9a417ca15dbcb7d67061838 Mon Sep 17 00:00:00 2001
From: Junwang Zhao <zhjwpku@gmail.com>
Date: Sun, 2 Mar 2025 04:07:29 +0000
Subject: [PATCH v15 1/2] Introduce json format for COPY TO

The json format is allowed only in COPY TO.
It cannot be combined with the header, default, null, or delimiter options, nor with the CSV-only options (quote, escape, force_quote, force_not_null, force_null).
Tested in src/test/regress/sql/copy.sql.

Refactored by Jian He to introduce a CopyFormat enum, which replaces csv_mode and binary.
Refactored by Junwang Zhao to adapt to the newly introduced CopyToRoutine struct (2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml           | 13 +++--
 src/backend/commands/copy.c          | 77 +++++++++++++++++++---------
 src/backend/commands/copyfrom.c      |  6 +--
 src/backend/commands/copyfromparse.c |  6 +--
 src/backend/commands/copyto.c        | 73 ++++++++++++++++++++++----
 src/backend/parser/gram.y            |  8 +++
 src/backend/utils/adt/json.c         |  5 +-
 src/bin/psql/tab-complete.in.c       |  2 +-
 src/include/commands/copy.h          | 14 ++++-
 src/include/utils/json.h             |  2 +
 src/test/regress/expected/copy.out   | 74 ++++++++++++++++++++++++++
 src/test/regress/sql/copy.sql        | 47 +++++++++++++++++
 src/tools/pgindent/typedefs.list     |  1 +
 13 files changed, 279 insertions(+), 49 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..9c519d8a9e2 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -219,10 +219,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
@@ -257,7 +262,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -271,7 +276,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
 
      <note>
@@ -294,7 +299,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      not using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -310,7 +315,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       If this option is set to <literal>MATCH</literal>, the number and names
       of the columns in the header line must match the actual column names of
       the table, in order;  otherwise an error is raised.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
       The <literal>MATCH</literal> option is only valid for <command>COPY
       FROM</command> commands.
      </para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..b6f74c798d0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -525,11 +525,13 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->format = COPY_FORMAT_JSON;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -689,31 +691,47 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->delim)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+				errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->null_print)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+				errmsg("cannot specify %s in JSON mode", "NULL"));
+
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->default_print)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+				errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -761,7 +779,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -769,43 +787,48 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->header_line)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("cannot specify %s in JSON mode", "HEADER"));
+
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -819,8 +842,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+												opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -835,8 +858,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -860,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -877,6 +900,12 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY %s cannot be used with %s", "FREEZE",
 						"COPY TO")));
 
+	/* Check json format  */
+	if (opts_out->format == COPY_FORMAT_JSON && is_from)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY json mode cannot be used with %s", "COPY FROM"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
@@ -896,7 +925,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -913,7 +942,7 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 198cee2bc48..1fa6d6130d9 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(CopyFormatOptions opts)
 {
-	if (opts.csv_mode)
+	if (opts.format == COPY_FORMAT_CSV)
 		return &CopyFromRoutineCSV;
-	else if (opts.binary)
+	else if (opts.format == COPY_FORMAT_BINARY)
 		return &CopyFromRoutineBinary;
 
 	/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..02263f1b1f5 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
 NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 {
 	return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
-										 cstate->opts.csv_mode);
+										 cstate->opts.format == COPY_FORMAT_CSV);
 }
 
 /*
@@ -774,7 +774,7 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
 	bool		done;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(!(cstate->opts.format == COPY_FORMAT_BINARY));
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 721d29f8e53..69b7fd11e03 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -125,6 +127,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -164,6 +167,13 @@ static const CopyToRoutine CopyToRoutineCSV = {
 	.CopyToEnd = CopyToTextLikeEnd,
 };
 
+static const CopyToRoutine CopyToRoutineJson = {
+	.CopyToStart = CopyToTextLikeStart,
+	.CopyToOutFunc = CopyToTextLikeOutFunc,
+	.CopyToOneRow = CopyToJsonOneRow,
+	.CopyToEnd = CopyToTextLikeEnd,
+};
+
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
 	.CopyToStart = CopyToBinaryStart,
@@ -176,16 +186,18 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(CopyFormatOptions opts)
 {
-	if (opts.csv_mode)
+	if (opts.format == COPY_FORMAT_CSV)
 		return &CopyToRoutineCSV;
-	else if (opts.binary)
+	else if (opts.format == COPY_FORMAT_BINARY)
 		return &CopyToRoutineBinary;
+	else if (opts.format == COPY_FORMAT_JSON)
+		return &CopyToRoutineJson;
 
 	/* default is text */
 	return &CopyToRoutineText;
 }
 
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV and json formats */
 static void
 CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 {
@@ -215,7 +227,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				CopyAttributeOutCSV(cstate, colname, false);
 			else
 				CopyAttributeOutText(cstate, colname);
@@ -226,7 +238,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 }
 
 /*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, json formats. Assign
  * the output function data to the given *finfo.
  */
 static void
@@ -306,6 +318,38 @@ CopyToTextLikeEnd(CopyToState cstate)
 	/* Nothing to do here */
 }
 
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	Datum		rowdata;
+	StringInfo	result;
+
+	/*
+	 * If the COPY TO source is a query rather than a plain table, copy the
+	 * query's TupleDesc attributes into slot->tts_tupleDescriptor, because
+	 * the slot's TupleDesc->attrs may change during query execution.
+	 */
+	if (!cstate->rel)
+	{
+		memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+			   TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+			   cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+		for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+			populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+		BlessTupleDesc(slot->tts_tupleDescriptor);
+	}
+	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+	result = makeStringInfo();
+	composite_to_json(rowdata, result, false);
+
+	CopySendData(cstate, result->data, result->len);
+
+	CopySendTextLikeEndOfRow(cstate);
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
@@ -392,14 +436,25 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (cstate->opts.format != COPY_FORMAT_JSON)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON format is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d99c9355c6..9fe151bdd7c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3465,6 +3465,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3547,6 +3551,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+				{
+					$$ = makeDefElem("format", $2, @1);
+				}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 51452755f58..bf69347fa94 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8432be641ac..6069b118834 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3283,7 +3283,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
-		COMPLETE_WITH("binary", "csv", "text");
+		COMPLETE_WITH("binary", "csv", "text", "json");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..07fcc2bc9ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,17 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+	COPY_FORMAT_JSON,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +72,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern void escape_json_with_len(StringInfo buf, const char *str, int len);
 extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index e69e34c69b8..d9e30937feb 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,80 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR:  cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR:  cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR:  cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR:  COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR:  COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+                                              ^
+copy copytest from stdin(format json);
+ERROR:  COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 895479d2d0f..5e4d1f0781a 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 56989aa0b84..71f64d26f09 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -500,6 +500,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromRoutine
 CopyFromState
-- 
2.39.5
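
The patch above replaces the two booleans csv_mode and binary with a single CopyFormat enum, then rejects options that make no sense in JSON mode. The following is a minimal, standalone C sketch of that idea. The enum mirrors the one added to copy.h, but DemoCopyOptions and demo_check_json_options are hypothetical names for illustration, and the sketch returns a message string where the real code calls ereport(ERROR, ...). It covers only the JSON-specific checks, in the order they appear in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the CopyFormat enum that the patch adds to copy.h. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
	COPY_FORMAT_JSON,
} CopyFormat;

/* Hypothetical, trimmed-down stand-in for CopyFormatOptions. */
typedef struct DemoCopyOptions
{
	CopyFormat	format;
	const char *delim;			/* NULL if DELIMITER was not given */
	const char *null_print;		/* NULL if NULL was not given */
	const char *default_print;	/* NULL if DEFAULT was not given */
	bool		header_line;	/* true if HEADER was requested */
} DemoCopyOptions;

/*
 * Hypothetical sketch: return NULL when the options are acceptable,
 * otherwise the text of the first complaint.
 */
static const char *
demo_check_json_options(const DemoCopyOptions *opts, bool is_from)
{
	if (opts->format != COPY_FORMAT_JSON)
		return NULL;			/* this sketch only covers JSON mode */

	if (opts->delim != NULL)
		return "cannot specify DELIMITER in JSON mode";
	if (opts->null_print != NULL)
		return "cannot specify NULL in JSON mode";
	if (opts->default_print != NULL)
		return "cannot specify DEFAULT in JSON mode";
	if (opts->header_line)
		return "cannot specify HEADER in JSON mode";
	if (is_from)
		return "COPY json mode cannot be used with COPY FROM";

	return NULL;
}
```

With one enum field, every check compares against a named format instead of testing a combination of flags, so adding a fourth format does not require a third boolean.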

#133

[1]: https://commitfest.postgresql.org/patch/4716/

jian.universality@gmail.com

10 months ago

In reply to: Junwang Zhao (#132)

2 attachment(s)

Re: Emitting JSON to file using COPY TO

On Sun, Mar 2, 2025 at 1:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

I've refactored the patch to adapt to the newly introduced CopyToRoutine struct,
see 2e4127b6d2.

v15-0001 is the merge of v14-0001 and v14-0002.

There are some other ongoing *copy to/from* refactors[1] which we can benefit
from to make the code cleaner, especially the checks done in ProcessCopyOptions.

[1]: /messages/by-id/20250301.115009.424844407736647598.kou@clear-code.com

hi.

git apply --check $PATCHES/v15-0001-Introduce-json-format-for-COPY-TO.patch
error: patch failed: src/backend/commands/copyfrom.c:155
error: src/backend/commands/copyfrom.c: patch does not apply
error: patch failed: src/backend/commands/copyto.c:176
error: src/backend/commands/copyto.c: patch does not apply

It seems to need a rebase.
The attachment contains the rebase, plus minor comment and commit message tweaks.

Another issue: this patch's commitfest entry, https://commitfest.postgresql.org/patch/4716/, has status "Not processed",
which means cfbot runs no CI tests on it. That does not seem great.
I'm not sure how to resolve this issue.

Attachments:

v16-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch (text/x-patch; charset=US-ASCII)

From 24e1858722dbb25c4842d0ec2dee5b1047edcc23 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 11 Mar 2025 16:19:55 +0800
Subject: [PATCH v16 2/2] Add option force_array for COPY JSON FORMAT

The force_array option can only be used in COPY TO with JSON format.
It makes the output a single JSON array: the rows are wrapped in square brackets and separated by commas.

Refactored by Junwang Zhao to adapt to the newly introduced CopyToRoutine struct (2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 ++++++++++++
 src/backend/commands/copy.c        | 13 ++++++++++++
 src/backend/commands/copyto.c      | 34 +++++++++++++++++++++++++++++-
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 23 ++++++++++++++++++++
 src/test/regress/sql/copy.sql      |  8 +++++++
 7 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 9c519d8a9e2..7b3c913d4ee 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     ON_ERROR <replaceable class="parameter">error_action</replaceable>
     REJECT_LIMIT <replaceable class="parameter">maxerror</replaceable>
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
@@ -392,6 +393,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>JSON</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ON_ERROR</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b6f74c798d0..7b4c64ea97e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -504,6 +504,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
 	bool		reject_limit_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -658,6 +659,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -906,6 +914,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				errmsg("COPY json mode cannot be used with %s", "COPY FROM"));
 
+	if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c1d4cbeedea..393c0440ad7 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,6 +84,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	bool		json_row_delim_needed;	/* need delimiter to start next json array element */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -128,6 +129,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
 static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -172,7 +174,7 @@ static const CopyToRoutine CopyToRoutineJson = {
 	.CopyToStart = CopyToTextLikeStart,
 	.CopyToOutFunc = CopyToTextLikeOutFunc,
 	.CopyToOneRow = CopyToJsonOneRow,
-	.CopyToEnd = CopyToTextLikeEnd,
+	.CopyToEnd = CopyToJsonEnd,
 };
 
 /* binary format */
@@ -238,6 +240,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 		CopySendTextLikeEndOfRow(cstate);
 	}
+
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified send
+	 * the opening bracket.
+	 */
+	if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, '[');
+		CopySendTextLikeEndOfRow(cstate);
+	}
 }
 
 /*
@@ -349,11 +361,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
 	result = makeStringInfo();
 	composite_to_json(rowdata, result, false);
 
+	if (cstate->json_row_delim_needed && cstate->opts.force_array)
+		CopySendChar(cstate, ',');
+	else if (cstate->opts.force_array)
+	{
+		/* first row needs no delimiter */
+		CopySendChar(cstate, ' ');
+		cstate->json_row_delim_needed = true;
+	}
+
 	CopySendData(cstate, result->data, result->len);
 
 	CopySendTextLikeEndOfRow(cstate);
 }
 
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+	if (cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendTextLikeEndOfRow(cstate);
+	}
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 6069b118834..e0c7412ec0a 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3278,7 +3278,7 @@ match_previous_words(int pattern_id,
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
 		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
 					  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
-					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+					  "FORCE_NOT_NULL", "FORCE_NULL", "FORCE_ARRAY", "ENCODING", "DEFAULT",
 					  "ON_ERROR", "LOG_VERBOSITY");
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 07fcc2bc9ac..fa8a8ab7e31 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -89,6 +89,7 @@ typedef struct CopyFormatOptions
 	List	   *force_notnull;	/* list of column names */
 	bool		force_notnull_all;	/* FORCE_NOT_NULL *? */
 	bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
+	bool		force_array;	/* add JSON array decorations */
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 0d4cfc0b60a..3d0781700b3 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -110,6 +110,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 ERROR:  COPY json mode cannot be used with COPY FROM
 -- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR:  COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 6ee96f5aa51..2781c24bd84 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,14 @@ copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 -- all of the above should yield error
 
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1
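
As a reading aid for the patch above, here is a minimal standalone sketch of the array framing that FORCE_ARRAY adds. It uses nothing from the PostgreSQL tree: `JsonArrayWriter`, `writer_start`, `writer_row`, and `writer_end` are hypothetical names that only mirror the roles of `CopyToTextLikeStart`, `CopyToJsonOneRow`, and `CopyToJsonEnd`, and the fixed-size buffer is an assumption standing in for the COPY output stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two pieces of state the patch relies on. */
typedef struct JsonArrayWriter
{
	bool		force_array;		/* mirrors opts.force_array */
	bool		row_delim_needed;	/* mirrors json_row_delim_needed */
	char		buf[1024];			/* stands in for the COPY output stream */
	size_t		len;
} JsonArrayWriter;

/* Append a NUL-terminated string to the buffer; the sketch asserts on overflow. */
static void
writer_append(JsonArrayWriter *w, const char *s)
{
	size_t		n = strlen(s);

	assert(w->len + n < sizeof(w->buf));
	memcpy(w->buf + w->len, s, n);
	w->len += n;
	w->buf[w->len] = '\0';
}

/* Start callback: emit the opening bracket on its own line if requested. */
static void
writer_start(JsonArrayWriter *w, bool force_array)
{
	w->force_array = force_array;
	w->row_delim_needed = false;
	w->len = 0;
	w->buf[0] = '\0';

	if (w->force_array)
		writer_append(w, "[\n");
}

/* Per-row callback: a space before the first row, a comma before later rows. */
static void
writer_row(JsonArrayWriter *w, const char *json_object)
{
	if (w->force_array)
	{
		writer_append(w, w->row_delim_needed ? "," : " ");
		w->row_delim_needed = true;
	}

	writer_append(w, json_object);
	writer_append(w, "\n");
}

/* End callback: emit the closing bracket on its own line if requested. */
static void
writer_end(JsonArrayWriter *w)
{
	if (w->force_array)
		writer_append(w, "]\n");
}
```

Calling `writer_start(&w, true)`, then `writer_row` once per row, then `writer_end` produces the same shape as the expected regression output: an opening `[` line, the first row prefixed with a space, each later row prefixed with a comma, and a closing `]` line. With `force_array` set to false the rows come out one per line with no decoration.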

v16-0001-Introduce-json-format-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From 71cf17c9d1aefecc89cc388b979bbfc9952898c8 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 11 Mar 2025 15:53:37 +0800
Subject: [PATCH v16 1/2] Introduce json format for COPY TO

The json format is only allowed in COPY TO operations.
It also cannot be used with the {header, default, null, delimiter} options and many other options.
Fully tested in src/test/regress/sql/copy.sql.

The CopyFormat enum part was copied from Joel Jacobson <joel@compiler.org>.
Refactored by Jian He to fix some miscellaneous issues.
Refactored by Junwang Zhao to adapt to the newly introduced CopyToRoutine struct (2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
Reviewed-by: "Andrey M. Borodin" <x4mmm@yandex-team.ru>,
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>,
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>,
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>,
Reviewed-by: Davin Shearer <davin@apache.org>,
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>,
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml           | 13 +++--
 src/backend/commands/copy.c          | 77 ++++++++++++++++++--------
 src/backend/commands/copyfrom.c      |  6 +-
 src/backend/commands/copyfromparse.c |  6 +-
 src/backend/commands/copyto.c        | 83 ++++++++++++++++++++++++----
 src/backend/parser/gram.y            |  8 +++
 src/backend/utils/adt/json.c         |  5 +-
 src/bin/psql/tab-complete.in.c       |  2 +-
 src/include/commands/copy.h          | 14 ++++-
 src/include/utils/json.h             |  2 +
 src/test/regress/expected/copy.out   | 74 +++++++++++++++++++++++++
 src/test/regress/sql/copy.sql        | 47 ++++++++++++++++
 src/tools/pgindent/typedefs.list     |  1 +
 13 files changed, 286 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..9c519d8a9e2 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -219,10 +219,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
@@ -257,7 +262,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -271,7 +276,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
 
      <note>
@@ -294,7 +299,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      not using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -310,7 +315,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       If this option is set to <literal>MATCH</literal>, the number and names
       of the columns in the header line must match the actual column names of
       the table, in order;  otherwise an error is raised.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
       The <literal>MATCH</literal> option is only valid for <command>COPY
       FROM</command> commands.
      </para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..b6f74c798d0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -525,11 +525,13 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->format = COPY_FORMAT_JSON;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -689,31 +691,47 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->delim)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+				errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->null_print)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+				errmsg("cannot specify %s in JSON mode", "NULL"));
+
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->default_print)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+				errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -761,7 +779,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -769,43 +787,48 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->header_line)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("cannot specify %s in JSON mode", "HEADER"));
+
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -819,8 +842,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+												opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -835,8 +858,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -860,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -877,6 +900,12 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY %s cannot be used with %s", "FREEZE",
 						"COPY TO")));
 
+	/* Check json format  */
+	if (opts_out->format == COPY_FORMAT_JSON && is_from)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY json mode cannot be used with %s", "COPY FROM"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
@@ -896,7 +925,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -913,7 +942,7 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..bfe1937539b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-	if (opts->csv_mode)
+	if (opts->format == COPY_FORMAT_CSV)
 		return &CopyFromRoutineCSV;
-	else if (opts->binary)
+	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyFromRoutineBinary;
 
 	/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..02263f1b1f5 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
 NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 {
 	return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
-										 cstate->opts.csv_mode);
+										 cstate->opts.format == COPY_FORMAT_CSV);
 }
 
 /*
@@ -774,7 +774,7 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
 	bool		done;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(!(cstate->opts.format == COPY_FORMAT_BINARY));
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..c1d4cbeedea 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -125,6 +127,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -144,7 +147,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 /*
  * COPY TO routines for built-in formats.
  *
- * CSV and text formats share the same TextLike routines except for the
+ * CSV, text, and json formats share the same TextLike routines except for the
  * one-row callback.
  */
 
@@ -164,6 +167,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
 	.CopyToEnd = CopyToTextLikeEnd,
 };
 
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+	.CopyToStart = CopyToTextLikeStart,
+	.CopyToOutFunc = CopyToTextLikeOutFunc,
+	.CopyToOneRow = CopyToJsonOneRow,
+	.CopyToEnd = CopyToTextLikeEnd,
+};
+
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
 	.CopyToStart = CopyToBinaryStart,
@@ -176,16 +187,18 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-	if (opts->csv_mode)
+	if (opts->format == COPY_FORMAT_CSV)
 		return &CopyToRoutineCSV;
-	else if (opts->binary)
+	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyToRoutineBinary;
+	else if (opts->format == COPY_FORMAT_JSON)
+		return &CopyToRoutineJson;
 
 	/* default is text */
 	return &CopyToRoutineText;
 }
 
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
 static void
 CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 {
@@ -204,6 +217,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 		ListCell   *cur;
 		bool		hdr_delim = false;
 
+		Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
 		foreach(cur, cstate->attnumlist)
 		{
 			int			attnum = lfirst_int(cur);
@@ -215,7 +230,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				CopyAttributeOutCSV(cstate, colname, false);
 			else
 				CopyAttributeOutText(cstate, colname);
@@ -226,7 +241,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 }
 
 /*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
  * the output function data to the given *finfo.
  */
 static void
@@ -299,13 +314,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
 	CopySendTextLikeEndOfRow(cstate);
 }
 
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
 static void
 CopyToTextLikeEnd(CopyToState cstate)
 {
 	/* Nothing to do here */
 }
 
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	Datum		rowdata;
+	StringInfo	result;
+
+	/*
+	 * If the COPY TO source data comes from a query rather than a plain table, we need
+	 * to copy CopyToState->queryDesc->tupDesc to slot->tts_tupleDescriptor.
+	 * This is necessary because the slot's TupleDesc may change during query execution,
+	 * and we depend on it when calling composite_to_json.
+	 */
+	if (!cstate->rel)
+	{
+		memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+			   TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+			   cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+		for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+			populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+		BlessTupleDesc(slot->tts_tupleDescriptor);
+	}
+	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+	result = makeStringInfo();
+	composite_to_json(rowdata, result, false);
+
+	CopySendData(cstate, result->data, result->len);
+
+	CopySendTextLikeEndOfRow(cstate);
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
@@ -392,14 +440,25 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (cstate->opts.format != COPY_FORMAT_JSON)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON format is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -499,7 +558,7 @@ CopySendEndOfRow(CopyToState cstate)
 }
 
 /*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
  * line termination and do common appropriate things for the end of row.
  */
 static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 271ae26cbaf..e26881bb13f 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3493,6 +3493,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3575,6 +3579,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+				{
+					$$ = makeDefElem("format", $2, @1);
+				}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 51452755f58..bf69347fa94 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8432be641ac..6069b118834 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3283,7 +3283,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
-		COMPLETE_WITH("binary", "csv", "text");
+		COMPLETE_WITH("binary", "csv", "text", "json");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..07fcc2bc9ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,17 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+	COPY_FORMAT_JSON,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +72,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	CopyHeaderChoice header_line;	/* header line? */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
 	int			null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern void escape_json_with_len(StringInfo buf, const char *str, int len);
 extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 06bae8c61ae..0d4cfc0b60a 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,80 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR:  cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR:  cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR:  cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR:  COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR:  COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+                                              ^
+copy copytest from stdin(format json);
+ERROR:  COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 3009bdfdf89..6ee96f5aa51 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9840060997f..a2e4bfee0e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -500,6 +500,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromRoutine
 CopyFromState
-- 
2.34.1
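
For readers who want to consume the output shown in the regression test above, here is a minimal client-side sketch. It assumes the data arrives exactly as in the expected output of `copy copytest to stdout json;`: one JSON object per line, terminated by "\n". The helper name `parse_copy_json_lines` is hypothetical and not part of the patch.

```python
import json

# Sample output copied from the expected results of
# "copy copytest to stdout json;" in the regression test.
# Raw strings keep the JSON escape sequences exactly as COPY printed them.
COPY_JSON_OUTPUT = "\n".join([
    r'{"style":"DOS","test":"abc\r\ndef","filler":1}',
    r'{"style":"Unix","test":"abc\ndef","filler":2}',
    r'{"style":"Mac","test":"abc\rdef","filler":3}',
    r'{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}',
]) + "\n"


def parse_copy_json_lines(text):
    """Parse one JSON object per non-empty line (hypothetical helper).

    Split on "\n" only: str.splitlines() would also break on characters
    such as U+2028, which may appear unescaped inside JSON strings.
    """
    return [json.loads(line) for line in text.split("\n") if line.strip()]


rows = parse_copy_json_lines(COPY_JSON_OUTPUT)
for row in rows:
    print(row["filler"], repr(row["test"]))
```

Because embedded carriage returns and line feeds are escaped inside the JSON strings, each row stays on a single physical line, which is what makes the line-by-line parse safe for this sample.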

#134

jian.universality@gmail.com

6 months ago

In reply to: jian he (#133)

2 attachment(s)

Re: Emitting JSON to file using COPY TO

On Tue, Mar 11, 2025 at 4:23 PM jian he <jian.universality@gmail.com> wrote:

hi.
rebase and minor tweaks.

Attachments:

v17-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch (text/x-patch; charset=US-ASCII)

From 44c494fd8d7fdb9d8fd5d2d2a48f49b779d1bcb9 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 4 Jul 2025 13:25:00 +0800
Subject: [PATCH v17 2/2] Add option force_array for COPY JSON FORMAT

The force_array option can only be used in COPY TO with JSON format.  It wraps
the output in square brackets and separates rows with commas, so the result is
a single JSON array.  Refactored by Junwang Zhao to adapt to the newly
introduced CopyToRoutine struct (2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 +++++++++++
 src/backend/commands/copy.c        | 13 +++++++++++
 src/backend/commands/copyto.c      | 37 +++++++++++++++++++++++++++++-
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 23 +++++++++++++++++++
 src/test/regress/sql/copy.sql      |  8 +++++++
 7 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 219604ad306..c01927864bd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>json</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>FORCE_QUOTE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 213e59cc435..d23ef99e395 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -514,6 +514,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
 	bool		reject_limit_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -670,6 +671,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -918,6 +926,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
 
+	if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 34b72936bca..18061661fb1 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,6 +84,10 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+
+	/* need delimiter to start next json array element */
+	bool		json_row_delim_needed;
+
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -128,6 +132,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
 static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -172,7 +177,7 @@ static const CopyToRoutine CopyToRoutineJson = {
 	.CopyToStart = CopyToTextLikeStart,
 	.CopyToOutFunc = CopyToTextLikeOutFunc,
 	.CopyToOneRow = CopyToJsonOneRow,
-	.CopyToEnd = CopyToTextLikeEnd,
+	.CopyToEnd = CopyToJsonEnd,
 };
 
 /* binary format */
@@ -238,6 +243,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 		CopySendTextLikeEndOfRow(cstate);
 	}
+
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified send the
+	 * opening bracket.
+	 */
+	if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, '[');
+		CopySendTextLikeEndOfRow(cstate);
+	}
 }
 
 /*
@@ -349,11 +364,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
 	result = makeStringInfo();
 	composite_to_json(rowdata, result, false);
 
+	if (cstate->json_row_delim_needed && cstate->opts.force_array)
+		CopySendChar(cstate, ',');
+	else if (cstate->opts.force_array)
+	{
+		/* first row needs no delimiter */
+		CopySendChar(cstate, ' ');
+		cstate->json_row_delim_needed = true;
+	}
+
 	CopySendData(cstate, result->data, result->len);
 
 	CopySendTextLikeEndOfRow(cstate);
 }
 
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+	if (cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendTextLikeEndOfRow(cstate);
+	}
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index bd4c03be050..55313612398 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3303,7 +3303,7 @@ match_previous_words(int pattern_id,
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
 		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
 					  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
-					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+					  "FORCE_NOT_NULL", "FORCE_NULL", "FORCE_ARRAY", "ENCODING", "DEFAULT",
 					  "ON_ERROR", "LOG_VERBOSITY", "REJECT_LIMIT");
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 85aedc267d6..7274b0d3ca5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
 	List	   *force_notnull;	/* list of column names */
 	bool		force_notnull_all;	/* FORCE_NOT_NULL *? */
 	bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
+	bool		force_array;	/* add JSON array decorations */
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index fcb8823e101..f3196fd5609 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -110,6 +110,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 ERROR:  COPY json mode cannot be used with COPY FROM
 -- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR:  COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 80bd4a59239..c55aa08e99d 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -100,6 +100,14 @@ copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 -- all of the above should yield error
 
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1
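
The framing that the force_array patch above produces can be sketched outside the server as follows. This is only an illustration of the logic in the start, per-row, and end callbacks (opening bracket, a space before the first row, a comma before each later row, closing bracket), assuming "\n" as the line terminator. The function name `frame_as_json_array` is hypothetical and does not exist in PostgreSQL.

```python
import json


def frame_as_json_array(json_rows):
    """Wrap already-serialized JSON rows the way the patch's test output shows.

    Hypothetical illustration only; assumes "\n" line termination.
    """
    out = ["[\n"]              # start callback: opening bracket on its own line
    delim_needed = False       # plays the role of json_row_delim_needed
    for row in json_rows:
        # the first row gets a space, every later row gets a leading comma
        out.append("," if delim_needed else " ")
        delim_needed = True
        out.append(row + "\n")
    out.append("]\n")          # end callback: closing bracket on its own line
    return "".join(out)


framed = frame_as_json_array(['{"filler":1}', '{"filler":2}', '{"filler":3}'])
print(framed, end="")
# The framed text is a single valid JSON document.
print(json.loads(framed))
```

Putting the comma at the start of each later row means the writer never has to know whether another row follows, and an empty result still yields a valid empty array.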

v17-0001-json-format-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From 984846c42dee4da6bd718cc24031bc0272d62f12 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 4 Jul 2025 13:23:30 +0800
Subject: [PATCH v17 1/2] json format for COPY TO

JSON format is only supported with the COPY TO operation. It is incompatible
with options such as HEADER, DEFAULT, NULL, DELIMITER, and several others. These
restrictions are exercised by the regression tests in
src/test/regress/sql/copy.sql.

The CopyFormat enum was originally contributed by Joel Jacobson
<joel@compiler.org>, later refactored by Jian He to address various issues, and
further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
struct (commit 2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
Reviewed-by: "Andrey M. Borodin" <x4mmm@yandex-team.ru>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Davin Shearer <davin@apache.org>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml           | 13 +++--
 src/backend/commands/copy.c          | 79 ++++++++++++++++++--------
 src/backend/commands/copyfrom.c      |  6 +-
 src/backend/commands/copyfromparse.c |  7 ++-
 src/backend/commands/copyto.c        | 83 ++++++++++++++++++++++++----
 src/backend/parser/gram.y            |  8 +++
 src/backend/utils/adt/json.c         |  5 +-
 src/bin/psql/tab-complete.in.c       |  2 +-
 src/include/commands/copy.h          | 14 ++++-
 src/include/utils/json.h             |  2 +
 src/test/regress/expected/copy.out   | 74 +++++++++++++++++++++++++
 src/test/regress/sql/copy.sql        | 46 +++++++++++++++
 src/tools/pgindent/typedefs.list     |  1 +
 13 files changed, 288 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..219604ad306 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
 
      <note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      not using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       <command>COPY FROM</command> commands.
      </para>
      <para>
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..213e59cc435 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -521,6 +521,8 @@ ProcessCopyOptions(ParseState *pstate,
 		opts_out = (CopyFormatOptions *) palloc0(sizeof(CopyFormatOptions));
 
 	opts_out->file_encoding = -1;
+	/* default format */
+	opts_out->format = COPY_FORMAT_TEXT;
 
 	/* Extract options from the statement node tree */
 	foreach(option, options)
@@ -535,11 +537,13 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->format = COPY_FORMAT_JSON;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -699,31 +703,47 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->delim)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+				errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->null_print)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+				errmsg("cannot specify %s in JSON mode", "NULL"));
+
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->default_print)
+		ereport(ERROR,
+				errcode(ERRCODE_SYNTAX_ERROR),
+				errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -771,7 +791,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -779,43 +799,48 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
+	if (opts_out->format == COPY_FORMAT_JSON && opts_out->header_line)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("cannot specify %s in JSON mode", "HEADER"));
+
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -829,8 +854,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+												opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -845,8 +870,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -870,7 +895,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -887,6 +912,12 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY %s cannot be used with %s", "FREEZE",
 						"COPY TO")));
 
+	/* Check json format  */
+	if (opts_out->format == COPY_FORMAT_JSON && is_from)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
@@ -906,7 +937,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -923,7 +954,7 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..6c4bd303841 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-	if (opts->csv_mode)
+	if (opts->format == COPY_FORMAT_CSV)
 		return &CopyFromRoutineCSV;
-	else if (opts->binary)
+	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyFromRoutineBinary;
 
 	/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..578e6c0c9a2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
 NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 {
 	return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
-										 cstate->opts.csv_mode);
+										 cstate->opts.format == COPY_FORMAT_CSV);
 }
 
 /*
@@ -774,7 +774,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
 	bool		done = false;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+		   cstate->opts.format == COPY_FORMAT_CSV);
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 67b94b91cae..34b72936bca 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -125,6 +127,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -144,7 +147,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 /*
  * COPY TO routines for built-in formats.
  *
- * CSV and text formats share the same TextLike routines except for the
+ * CSV, text, and json formats share the same TextLike routines except for the
  * one-row callback.
  */
 
@@ -164,6 +167,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
 	.CopyToEnd = CopyToTextLikeEnd,
 };
 
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+	.CopyToStart = CopyToTextLikeStart,
+	.CopyToOutFunc = CopyToTextLikeOutFunc,
+	.CopyToOneRow = CopyToJsonOneRow,
+	.CopyToEnd = CopyToTextLikeEnd,
+};
+
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
 	.CopyToStart = CopyToBinaryStart,
@@ -176,16 +187,18 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-	if (opts->csv_mode)
+	if (opts->format == COPY_FORMAT_CSV)
 		return &CopyToRoutineCSV;
-	else if (opts->binary)
+	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyToRoutineBinary;
+	else if (opts->format == COPY_FORMAT_JSON)
+		return &CopyToRoutineJson;
 
 	/* default is text */
 	return &CopyToRoutineText;
 }
 
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
 static void
 CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 {
@@ -204,6 +217,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 		ListCell   *cur;
 		bool		hdr_delim = false;
 
+		Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
 		foreach(cur, cstate->attnumlist)
 		{
 			int			attnum = lfirst_int(cur);
@@ -215,7 +230,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				CopyAttributeOutCSV(cstate, colname, false);
 			else
 				CopyAttributeOutText(cstate, colname);
@@ -226,7 +241,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 }
 
 /*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
  * the output function data to the given *finfo.
  */
 static void
@@ -299,13 +314,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
 	CopySendTextLikeEndOfRow(cstate);
 }
 
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
 static void
 CopyToTextLikeEnd(CopyToState cstate)
 {
 	/* Nothing to do here */
 }
 
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	Datum		rowdata;
+	StringInfo	result;
+
+	/*
+	 * If the COPY TO source is a query rather than a plain table, we need to
+	 * copy CopyToState->QueryDesc->TupleDesc to slot->tts_tupleDescriptor.
+	 * This is necessary because the slot's TupleDesc may change during query
+	 * execution, and we depend on it when calling composite_to_json.
+	 */
+	if (!cstate->rel)
+	{
+		memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+			   TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+			   cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+		for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+			populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+		BlessTupleDesc(slot->tts_tupleDescriptor);
+	}
+	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+	result = makeStringInfo();
+	composite_to_json(rowdata, result, false);
+
+	CopySendData(cstate, result->data, result->len);
+
+	CopySendTextLikeEndOfRow(cstate);
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
@@ -392,14 +440,25 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (cstate->opts.format != COPY_FORMAT_JSON)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON format is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -499,7 +558,7 @@ CopySendEndOfRow(CopyToState cstate)
 }
 
 /*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
  * line termination and do common appropriate things for the end of row.
  */
 static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 70a0d832a11..de0f5eb4118 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3492,6 +3492,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3574,6 +3578,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+				{
+					$$ = makeDefElem("format", $2, @1);
+				}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 51452755f58..bf69347fa94 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 53e7d35fe98..bd4c03be050 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3308,7 +3308,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
-		COMPLETE_WITH("binary", "csv", "text");
+		COMPLETE_WITH("binary", "csv", "text", "json");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..85aedc267d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -48,6 +48,17 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+	COPY_FORMAT_JSON,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -58,9 +69,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	int			header_line;	/* number of lines to skip or COPY_HEADER_XXX
 								 * value (see the above) */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern void escape_json_with_len(StringInfo buf, const char *str, int len);
 extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..fcb8823e101 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,80 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR:  cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR:  cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR:  cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR:  COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR:  COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+                                              ^
+copy copytest from stdin(format json);
+ERROR:  COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..80bd4a59239 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,52 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 114bdafafdf..fa7d7b9244a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -518,6 +518,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromRoutine
 CopyFromState
-- 
2.34.1
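
For readers following the regression output in the patch above, here is a minimal standalone sketch of the string escaping that the expected JSON lines imply: a quote and a backslash get a backslash prefix, the common control characters use their short forms, and "/" is left alone. This is not PostgreSQL's escape_json(). The function name json_escape_string, the fixed-size buffer interface and the \u00XX fallback for other control characters are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative only: write the JSON string-literal form of src into dst.
 * Returns 0 on success, -1 if dst (capacity dstlen) is too small.
 */
int
json_escape_string(char *dst, size_t dstlen, const char *src)
{
	size_t		used = 0;
	const unsigned char *p;

	/* need room for at least the two quotes and the terminator */
	if (dstlen < 3)
		return -1;
	dst[used++] = '"';

	for (p = (const unsigned char *) src; *p != '\0'; p++)
	{
		char		esc[7];		/* longest escape is \uXXXX plus NUL */
		size_t		len;

		switch (*p)
		{
			case '"':	strcpy(esc, "\\\""); break;
			case '\\':	strcpy(esc, "\\\\"); break;
			case '\b':	strcpy(esc, "\\b"); break;
			case '\f':	strcpy(esc, "\\f"); break;
			case '\n':	strcpy(esc, "\\n"); break;
			case '\r':	strcpy(esc, "\\r"); break;
			case '\t':	strcpy(esc, "\\t"); break;
			default:
				if (*p < 0x20)
					snprintf(esc, sizeof(esc), "\\u%04x", (unsigned int) *p);
				else
				{
					/* ordinary byte, including '/', is copied unchanged */
					esc[0] = (char) *p;
					esc[1] = '\0';
				}
				break;
		}

		len = strlen(esc);
		/* keep room for the closing quote and the terminator */
		if (used + len + 2 > dstlen)
			return -1;
		memcpy(dst + used, esc, len);
		used += len;
	}

	dst[used++] = '"';
	dst[used] = '\0';
	return 0;
}

/* Small self-check mirroring two of the regression inputs. */
void
json_escape_selfcheck(void)
{
	char		buf[64];

	assert(json_escape_string(buf, sizeof(buf), "aaa/bbb") == 0);
	assert(strcmp(buf, "\"aaa/bbb\"") == 0);
	assert(json_escape_string(buf, sizeof(buf), "aaa\tbbb") == 0);
	assert(strcmp(buf, "\"aaa\\tbbb\"") == 0);
}
```

Because every newline inside a value is escaped, each output row stays on one physical line, which is what lets the json format reuse the text-like end-of-row handling.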

#135

jian.universality@gmail.com

5 months ago

In reply to: jian he (#134)

3 attachment(s)

Re: Emitting JSON to file using COPY TO

hi.

Rebased and split into 3 patches.

v18-0001
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
This removes the two boolean fields (binary, csv_mode) from
CopyFormatOptions.

v18-0002 and v18-0003 are refactorings based on the prior patch.
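
To illustrate why a single enum is easier to work with than two booleans, here is a rough standalone sketch. Only the CopyFormat enum comes from the patch. The helpers parse_copy_format and wire_format_code are hypothetical names invented for this example, loosely modelled on the option parsing and the "binary ? 1 : 0" wire code seen in the diffs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same shape as the enum quoted from v18-0001. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: map a FORMAT option string to the enum.
 * Returns false for an unrecognized name and leaves *out untouched.
 */
bool
parse_copy_format(const char *fmt, CopyFormat *out)
{
	if (strcmp(fmt, "text") == 0)
		*out = COPY_FORMAT_TEXT;
	else if (strcmp(fmt, "csv") == 0)
		*out = COPY_FORMAT_CSV;
	else if (strcmp(fmt, "binary") == 0)
		*out = COPY_FORMAT_BINARY;
	else
		return false;
	return true;
}

/* Hypothetical helper: overall wire format code, 1 for binary, else 0. */
int
wire_format_code(CopyFormat format)
{
	return (format == COPY_FORMAT_BINARY) ? 1 : 0;
}

/* Small self-check showing the intended use. */
void
copy_format_selfcheck(void)
{
	CopyFormat	f = COPY_FORMAT_TEXT;

	assert(parse_copy_format("csv", &f) && f == COPY_FORMAT_CSV);
	assert(wire_format_code(f) == 0);
}
```

With two booleans, the state "binary and csv_mode both true" can be represented even though it is meaningless. With the enum, exactly one format is selected, and adding another format later means adding one enum value rather than another flag.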

Attachments:

v18-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (text/x-patch; charset=US-ASCII)

From e9104bcc0a6c4ca96df5ff3fdd3ae659885dd664 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 30 Jul 2025 19:50:41 +0800
Subject: [PATCH v18 3/3] Add option force_array for COPY JSON FORMAT

The force_array option can only be used in COPY TO with JSON format.  It makes
the output behave like a JSON array.  Refactored by Junwang Zhao to adapt to
the newly introduced CopyToRoutine struct (2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 +++++++++++
 src/backend/commands/copy.c        | 13 +++++++++++
 src/backend/commands/copyto.c      | 37 +++++++++++++++++++++++++++++-
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 23 +++++++++++++++++++
 src/test/regress/sql/copy.sql      |  8 +++++++
 7 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 219604ad306..c01927864bd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>json</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>FORCE_QUOTE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 7fd41dba250..0a22272f3fe 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -514,6 +514,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
 	bool		reject_limit_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -670,6 +671,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -925,6 +933,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
 
+	if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 13eb14debbd..6fc1d3e9fee 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,6 +84,10 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+
+	/* need delimiter to start next json array element */
+	bool		json_row_delim_needed;
+
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -128,6 +132,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
 static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -172,7 +177,7 @@ static const CopyToRoutine CopyToRoutineJson = {
 	.CopyToStart = CopyToTextLikeStart,
 	.CopyToOutFunc = CopyToTextLikeOutFunc,
 	.CopyToOneRow = CopyToJsonOneRow,
-	.CopyToEnd = CopyToTextLikeEnd,
+	.CopyToEnd = CopyToJsonEnd,
 };
 
 /* binary format */
@@ -238,6 +243,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 		CopySendTextLikeEndOfRow(cstate);
 	}
+
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified send the
+	 * opening bracket.
+	 */
+	if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, '[');
+		CopySendTextLikeEndOfRow(cstate);
+	}
 }
 
 /*
@@ -349,11 +364,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
 	result = makeStringInfo();
 	composite_to_json(rowdata, result, false);
 
+	if (cstate->json_row_delim_needed && cstate->opts.force_array)
+		CopySendChar(cstate, ',');
+	else if (cstate->opts.force_array)
+	{
+		/* first row needs no delimiter */
+		CopySendChar(cstate, ' ');
+		cstate->json_row_delim_needed = true;
+	}
+
 	CopySendData(cstate, result->data, result->len);
 
 	CopySendTextLikeEndOfRow(cstate);
 }
 
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+	if (cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendTextLikeEndOfRow(cstate);
+	}
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 16db5373c5f..9c65376fc7e 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1212,7 +1212,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
 
 /* COPY TO options */
 #define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
 
 /*
  * These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 85aedc267d6..7274b0d3ca5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
 	List	   *force_notnull;	/* list of column names */
 	bool		force_notnull_all;	/* FORCE_NOT_NULL *? */
 	bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
+	bool		force_array;	/* add JSON array decorations */
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 0fc6e84352c..22626a13ba5 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -112,6 +112,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 ERROR:  COPY json mode cannot be used with COPY FROM
 -- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR:  COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 071986d427a..0f121b48f71 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,14 @@ copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 -- all of the above should yield error
 
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1
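
The array framing added by the patch above (an opening bracket on its own line, a space before the first row, a comma before each later row, and a closing bracket) can be sketched outside the backend as follows. This is only an approximation of the start/one-row/end callbacks. The function format_json_rows, its buffer-based interface and the fixed "\n" line terminator are assumptions for illustration, and the rows are taken to be already-serialized JSON objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append s to out at *used; returns -1 if it does not fit. */
static int
append_str(char *out, size_t outlen, size_t *used, const char *s)
{
	size_t		len = strlen(s);

	if (*used + len + 1 > outlen)
		return -1;
	memcpy(out + *used, s, len + 1);	/* copies the terminator too */
	*used += len;
	return 0;
}

/*
 * Illustrative only: frame nrows serialized JSON objects the way the
 * regression output shows, with or without array decorations.
 * Returns 0 on success, -1 if out is too small.
 */
int
format_json_rows(char *out, size_t outlen,
				 const char *const *rows, int nrows, bool force_array)
{
	size_t		used = 0;
	bool		row_delim_needed = false;
	int			i;

	if (outlen == 0)
		return -1;
	out[0] = '\0';

	/* start callback: opening bracket on its own line */
	if (force_array && append_str(out, outlen, &used, "[\n") < 0)
		return -1;

	/* per-row callback: ' ' before the first row, ',' before the rest */
	for (i = 0; i < nrows; i++)
	{
		if (force_array)
		{
			if (append_str(out, outlen, &used,
						   row_delim_needed ? "," : " ") < 0)
				return -1;
			row_delim_needed = true;
		}
		if (append_str(out, outlen, &used, rows[i]) < 0 ||
			append_str(out, outlen, &used, "\n") < 0)
			return -1;
	}

	/* end callback: closing bracket on its own line */
	if (force_array && append_str(out, outlen, &used, "]\n") < 0)
		return -1;

	return 0;
}
```

Note that in this sketch an empty input with force_array set still yields "[" and "]" lines, so the result is a valid empty JSON array. The patch's start and end callbacks emit the brackets unconditionally in the same way.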

v18-0002-json-format-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From e6bafe5eab9463b253e9697e0fbd82246c316a12 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sun, 10 Aug 2025 23:13:38 +0800
Subject: [PATCH v18 2/3] json format for COPY TO

JSON format is only supported with the COPY TO operation. It is incompatible
with options such as HEADER, DEFAULT, NULL, DELIMITER, and several others. This
has been thoroughly tested in src/test/regress/sql/copy.sql

The CopyFormat enum was originally contributed by Joel Jacobson
joel@compiler.org, later refactored by Jian He to address various issues, and
further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
struct (commit 2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
Reviewed-by: "Andrey M. Borodin" <x4mmm@yandex-team.ru>,
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>,
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>,
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>,
Reviewed-by: Davin Shearer <davin@apache.org>,
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>,
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 13 +++--
 src/backend/commands/copy.c        | 72 +++++++++++++++++++++-------
 src/backend/commands/copyto.c      | 76 ++++++++++++++++++++++++++----
 src/backend/parser/gram.y          |  8 ++++
 src/backend/utils/adt/json.c       |  5 +-
 src/bin/psql/tab-complete.in.c     |  4 +-
 src/include/commands/copy.h        |  1 +
 src/include/utils/json.h           |  2 +
 src/test/regress/expected/copy.out | 76 ++++++++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      | 47 ++++++++++++++++++
 src/tools/pgindent/typedefs.list   |  1 +
 11 files changed, 271 insertions(+), 34 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..219604ad306 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
 
      <note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      not using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       <command>COPY FROM</command> commands.
      </para>
      <para>
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 68f69cfb9df..7fd41dba250 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -542,6 +542,8 @@ ProcessCopyOptions(ParseState *pstate,
 				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->format = COPY_FORMAT_JSON;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -701,21 +703,42 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+	if (opts_out->delim)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+	}
 
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "NULL")));
+	if (opts_out->null_print)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in BINARY mode", "NULL"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in JSON mode", "NULL"));
+	}
 
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+	if (opts_out->default_print)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in BINARY mode", "DEFAULT"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+	}
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
@@ -781,11 +804,18 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
+	if (opts_out->header_line != COPY_HEADER_FALSE)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					errmsg("cannot specify %s in BINARY mode", "HEADER"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot specify %s in JSON mode", "HEADER"));
+	}
 
 	/* Check quote */
 	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -889,6 +919,12 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY %s cannot be used with %s", "FREEZE",
 						"COPY TO")));
 
+	/* Check json format */
+	if (opts_out->format == COPY_FORMAT_JSON && is_from)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e990343bab0..13eb14debbd 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -125,6 +127,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -144,7 +147,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 /*
  * COPY TO routines for built-in formats.
  *
- * CSV and text formats share the same TextLike routines except for the
+ * CSV, text, and json formats share the same TextLike routines except for the
  * one-row callback.
  */
 
@@ -164,6 +167,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
 	.CopyToEnd = CopyToTextLikeEnd,
 };
 
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+	.CopyToStart = CopyToTextLikeStart,
+	.CopyToOutFunc = CopyToTextLikeOutFunc,
+	.CopyToOneRow = CopyToJsonOneRow,
+	.CopyToEnd = CopyToTextLikeEnd,
+};
+
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
 	.CopyToStart = CopyToBinaryStart,
@@ -180,12 +191,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
 		return &CopyToRoutineCSV;
 	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyToRoutineBinary;
+	else if (opts->format == COPY_FORMAT_JSON)
+		return &CopyToRoutineJson;
 
 	/* default is text */
 	return &CopyToRoutineText;
 }
 
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
 static void
 CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 {
@@ -204,6 +217,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 		ListCell   *cur;
 		bool		hdr_delim = false;
 
+		Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
 		foreach(cur, cstate->attnumlist)
 		{
 			int			attnum = lfirst_int(cur);
@@ -226,7 +241,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 }
 
 /*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
  * the output function data to the given *finfo.
  */
 static void
@@ -299,13 +314,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
 	CopySendTextLikeEndOfRow(cstate);
 }
 
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
 static void
 CopyToTextLikeEnd(CopyToState cstate)
 {
 	/* Nothing to do here */
 }
 
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	Datum		rowdata;
+	StringInfo	result;
+
+	/*
+	 * If the COPY TO source is a query rather than a plain table, we need to
+	 * copy cstate->queryDesc->tupDesc into slot->tts_tupleDescriptor.  This is
+	 * necessary because the slot's TupleDesc may change during query
+	 * execution, and we depend on it when calling composite_to_json.
+	 */
+	if (!cstate->rel)
+	{
+		memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+			   TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+			   cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+		for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+			populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+		BlessTupleDesc(slot->tts_tupleDescriptor);
+	}
+	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+	result = makeStringInfo();
+	composite_to_json(rowdata, result, false);
+
+	CopySendData(cstate, result->data, result->len);
+
+	CopySendTextLikeEndOfRow(cstate);
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
@@ -397,9 +445,21 @@ SendCopyBegin(CopyToState cstate)
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (cstate->opts.format != COPY_FORMAT_JSON)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON format is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
+
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -499,7 +559,7 @@ CopySendEndOfRow(CopyToState cstate)
 }
 
 /*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
  * line termination and do common appropriate things for the end of row.
  */
 static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..48e2242327e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3528,6 +3528,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3610,6 +3614,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+				{
+					$$ = makeDefElem("format", $2, @1);
+				}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index e9d370cb3da..e517470bbc7 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1f2ca946fc5..16db5373c5f 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3344,8 +3344,10 @@ match_previous_words(int pattern_id,
 		COMPLETE_WITH(Copy_to_options);
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
-	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
+	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "FORMAT"))
 		COMPLETE_WITH("binary", "csv", "text");
+	else if (Matches("COPY|\\copy", MatchAny, "TO", MatchAny, "WITH", "(", "FORMAT"))
+		COMPLETE_WITH("binary", "csv", "text", "json");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 686653233b2..85aedc267d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -56,6 +56,7 @@ typedef enum CopyFormat
 	COPY_FORMAT_TEXT = 0,
 	COPY_FORMAT_BINARY,
 	COPY_FORMAT_CSV,
+	COPY_FORMAT_JSON,
 } CopyFormat;
 
 /*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern void escape_json_with_len(StringInfo buf, const char *str, int len);
 extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..0fc6e84352c 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,82 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR:  cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR:  cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR:  cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR:  COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR:  COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+                                              ^
+copy copytest from stdin(format json);
+ERROR:  COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..071986d427a 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..374b40d14de 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -515,6 +515,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromRoutine
 CopyFromState
-- 
2.34.1
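
For readers following the expected output in copy.out above, here is a rough, standalone sketch of the string escaping that the JSON rows exhibit. It is not the patch's code path (the patch calls composite_to_json, which relies on the server's own escaping routines); sketch_escape_json is a hypothetical helper written only for illustration, and the \u00XX fallback for other control characters is an assumption based on the JSON grammar rather than something shown in the test output.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: write src into dst as a JSON
 * string literal.  Returns the number of bytes written (excluding the
 * terminating NUL), or -1 if dst is too small.
 */
static int
sketch_escape_json(char *dst, size_t dstlen, const char *src)
{
	size_t		n = 0;

/* Append len bytes, always leaving room for the terminating NUL. */
#define PUT(s, len) \
	do { \
		if (n + (len) >= dstlen) \
			return -1; \
		memcpy(dst + n, (s), (len)); \
		n += (len); \
	} while (0)

	PUT("\"", 1);
	for (const unsigned char *p = (const unsigned char *) src; *p; p++)
	{
		char		buf[8];

		switch (*p)
		{
			case '"':  PUT("\\\"", 2); break;
			case '\\': PUT("\\\\", 2); break;
			case '\b': PUT("\\b", 2); break;
			case '\f': PUT("\\f", 2); break;
			case '\n': PUT("\\n", 2); break;
			case '\r': PUT("\\r", 2); break;
			case '\t': PUT("\\t", 2); break;
			default:
				if (*p < 0x20)
				{
					/* assumption: remaining control characters use \u00XX */
					snprintf(buf, sizeof(buf), "\\u%04x", (unsigned int) *p);
					PUT(buf, 6);
				}
				else
					PUT(p, 1);	/* '/' and single quotes pass through as-is */
				break;
		}
	}
	PUT("\"", 1);
#undef PUT

	dst[n] = '\0';
	return (int) n;
}
```

Calling sketch_escape_json(out, sizeof(out), "aaa/bbb") leaves the slash untouched, which matches the "aaa/bbb" row in the expected output, while a double quote or backslash in the input gains a leading backslash.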

v18-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (text/x-patch; charset=US-ASCII)

From 1b30cb9b34b770cebc0bb20af867baec2f72aedb Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 30 Jul 2025 16:48:56 +0800
Subject: [PATCH v18 1/3] introduce CopyFormat refactor CopyFormatOptions

Currently, COPY command format is determined by two booleans, binary and
csv_mode, within CopyFormatOptions. This approach, while functional, isn't ideal
for future expansion.

To simplify adding new formats, we've introduced an enum CopyFormat.  This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.

The CopyFormat enum was originally contributed by Joel Jacobson
<joel@compiler.org> and later refactored by Jian He to address various issues.

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 src/backend/commands/copy.c          | 50 +++++++++++++++-------------
 src/backend/commands/copyfrom.c      |  6 ++--
 src/backend/commands/copyfromparse.c |  7 ++--
 src/backend/commands/copyto.c        |  8 ++---
 src/include/commands/copy.h          | 13 ++++++--
 5 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..68f69cfb9df 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -521,6 +521,8 @@ ProcessCopyOptions(ParseState *pstate,
 		opts_out = (CopyFormatOptions *) palloc0(sizeof(CopyFormatOptions));
 
 	opts_out->file_encoding = -1;
+	/* default format */
+	opts_out->format = COPY_FORMAT_TEXT;
 
 	/* Extract options from the statement node tree */
 	foreach(option, options)
@@ -535,11 +537,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -699,31 +701,31 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -771,7 +773,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -779,43 +781,43 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -829,8 +831,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+												opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -845,8 +847,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -870,7 +872,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -906,7 +908,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -923,7 +925,7 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..6c4bd303841 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-	if (opts->csv_mode)
+	if (opts->format == COPY_FORMAT_CSV)
 		return &CopyFromRoutineCSV;
-	else if (opts->binary)
+	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyFromRoutineBinary;
 
 	/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..578e6c0c9a2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
 NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 {
 	return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
-										 cstate->opts.csv_mode);
+										 cstate->opts.format == COPY_FORMAT_CSV);
 }
 
 /*
@@ -774,7 +774,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
 	bool		done = false;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+		   cstate->opts.format == COPY_FORMAT_CSV);
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 67b94b91cae..e990343bab0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -176,9 +176,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-	if (opts->csv_mode)
+	if (opts->format == COPY_FORMAT_CSV)
 		return &CopyToRoutineCSV;
-	else if (opts->binary)
+	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyToRoutineBinary;
 
 	/* default is text */
@@ -215,7 +215,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				CopyAttributeOutCSV(cstate, colname, false);
 			else
 				CopyAttributeOutText(cstate, colname);
@@ -392,7 +392,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..686653233b2 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -48,6 +48,16 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -58,9 +68,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	int			header_line;	/* number of lines to skip or COPY_HEADER_XXX
 								 * value (see the above) */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
-- 
2.34.1
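
For readers following the refactoring, the enum-based dispatch in
CopyToGetRoutine above can be modeled outside the backend. The sketch below is
a standalone illustration, not PostgreSQL code: DemoRoutine, demo_get_routine,
and the demo_*_row callbacks are hypothetical names, and the real CopyToRoutine
struct carries more callbacks than the single one shown here.

```c
#include <stddef.h>
#include <stdio.h>

/* Same members as the CopyFormat enum introduced by the patch. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical stand-in for CopyToRoutine, reduced to one callback. */
typedef struct DemoRoutine
{
	const char *name;
	int			(*one_row) (char *dst, size_t dstlen,
							const char *col1, const char *col2);
} DemoRoutine;

/* Text: tab-separated columns (no escaping in this sketch). */
static int
demo_text_row(char *dst, size_t dstlen, const char *col1, const char *col2)
{
	return snprintf(dst, dstlen, "%s\t%s\n", col1, col2);
}

/* CSV: comma-separated columns (no quoting in this sketch). */
static int
demo_csv_row(char *dst, size_t dstlen, const char *col1, const char *col2)
{
	return snprintf(dst, dstlen, "%s,%s\n", col1, col2);
}

/* Binary is not modeled here; report failure instead of writing data. */
static int
demo_binary_row(char *dst, size_t dstlen, const char *col1, const char *col2)
{
	(void) col1;
	(void) col2;
	if (dstlen > 0)
		dst[0] = '\0';
	return -1;
}

static const DemoRoutine demo_routine_text = {"text", demo_text_row};
static const DemoRoutine demo_routine_csv = {"csv", demo_csv_row};
static const DemoRoutine demo_routine_binary = {"binary", demo_binary_row};

/* Same shape as CopyToGetRoutine: test the enum, fall back to text. */
static const DemoRoutine *
demo_get_routine(CopyFormat format)
{
	if (format == COPY_FORMAT_CSV)
		return &demo_routine_csv;
	else if (format == COPY_FORMAT_BINARY)
		return &demo_routine_binary;

	/* default is text */
	return &demo_routine_text;
}
```

A caller picks the routine once and then invokes one_row for each row, so
adding a format means adding one enum member and one routine table rather than
another boolean.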

#136

jian.universality@gmail.com

3 months ago

In reply to: jian he (#135)

3 attachment(s)

Re: Emitting JSON to file using COPY TO

On Sun, Aug 10, 2025 at 11:20 PM jian he <jian.universality@gmail.com> wrote:

v18-0001
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
This patch adds the CopyFormat enum above and removes the two boolean
fields (binary, csv_mode) from CopyFormatOptions.

v18-0002 and v18-0003 are refactored on top of that patch.

Hi,

v19 is attached; it is the same as v18. I am reposting it so that CFbot
can pick up the latest patchset.
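
As background on the 0001 patch: with two independent booleans, nothing at the
type level prevents binary and csv_mode from both being set, whereas a single
enum value can hold only one format at a time. The following is a minimal
standalone sketch of that idea, assuming the format names "text", "csv", and
"binary"; demo_parse_format, demo_is_binary, and demo_is_csv are hypothetical
helpers for illustration and do not exist in the backend.

```c
#include <stdbool.h>
#include <string.h>

/* Same members as the CopyFormat enum in v18-0001. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: map a FORMAT option value to the enum.
 * Returns false for an unrecognized name; the backend would raise
 * an error at that point instead.
 */
static bool
demo_parse_format(const char *fmt, CopyFormat *out)
{
	if (strcmp(fmt, "text") == 0)
		*out = COPY_FORMAT_TEXT;
	else if (strcmp(fmt, "csv") == 0)
		*out = COPY_FORMAT_CSV;
	else if (strcmp(fmt, "binary") == 0)
		*out = COPY_FORMAT_BINARY;
	else
		return false;
	return true;
}

/* The old boolean tests become comparisons against the enum. */
static bool
demo_is_binary(CopyFormat format)
{
	return format == COPY_FORMAT_BINARY;
}

static bool
demo_is_csv(CopyFormat format)
{
	return format == COPY_FORMAT_CSV;
}
```

Every former test of opts->binary or opts->csv_mode maps to one comparison of
this kind, which is the mechanical change visible throughout the 0001 diff.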

Attachments:

v19-0002-json-format-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From 3869884bd47aaf681cc3c2bf96fe59d11069155c Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sun, 10 Aug 2025 23:13:38 +0800
Subject: [PATCH v19 2/3] json format for COPY TO

JSON format is only supported with the COPY TO operation. It is incompatible
with options such as HEADER, DEFAULT, NULL, DELIMITER, and several others. This
has been thoroughly tested in src/test/regress/sql/copy.sql.

The CopyFormat enum was originally contributed by Joel Jacobson
<joel@compiler.org>, later refactored by Jian He to address various issues, and
further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
struct (commit 2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
Reviewed-by: "Andrey M. Borodin" <x4mmm@yandex-team.ru>,
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>,
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>,
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>,
Reviewed-by: Davin Shearer <davin@apache.org>,
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>,
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 13 +++--
 src/backend/commands/copy.c        | 72 +++++++++++++++++++++-------
 src/backend/commands/copyto.c      | 76 ++++++++++++++++++++++++++----
 src/backend/parser/gram.y          |  8 ++++
 src/backend/utils/adt/json.c       |  5 +-
 src/bin/psql/tab-complete.in.c     |  4 +-
 src/include/commands/copy.h        |  1 +
 src/include/utils/json.h           |  2 +
 src/test/regress/expected/copy.out | 76 ++++++++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      | 47 ++++++++++++++++++
 src/tools/pgindent/typedefs.list   |  1 +
 11 files changed, 271 insertions(+), 34 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..219604ad306 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
 
      <note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      not using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       <command>COPY FROM</command> commands.
      </para>
      <para>
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 68f69cfb9df..7fd41dba250 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -542,6 +542,8 @@ ProcessCopyOptions(ParseState *pstate,
 				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->format = COPY_FORMAT_JSON;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -701,21 +703,42 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+	if (opts_out->delim)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+	}
 
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "NULL")));
+	if (opts_out->null_print)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in BINARY mode", "NULL"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in JSON mode", "NULL"));
+	}
 
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+	if (opts_out->default_print)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in BINARY mode", "DEFAULT"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+	}
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
@@ -781,11 +804,18 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
+	if (opts_out->header_line != COPY_HEADER_FALSE)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					errmsg("cannot specify %s in BINARY mode", "HEADER"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot specify %s in JSON mode", "HEADER"));
+	}
 
 	/* Check quote */
 	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -889,6 +919,12 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY %s cannot be used with %s", "FREEZE",
 						"COPY TO")));
 
+	/* Check json format */
+	if (opts_out->format == COPY_FORMAT_JSON && is_from)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e990343bab0..13eb14debbd 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -125,6 +127,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -144,7 +147,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 /*
  * COPY TO routines for built-in formats.
  *
- * CSV and text formats share the same TextLike routines except for the
+ * CSV, text, and json formats share the same TextLike routines except for the
  * one-row callback.
  */
 
@@ -164,6 +167,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
 	.CopyToEnd = CopyToTextLikeEnd,
 };
 
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+	.CopyToStart = CopyToTextLikeStart,
+	.CopyToOutFunc = CopyToTextLikeOutFunc,
+	.CopyToOneRow = CopyToJsonOneRow,
+	.CopyToEnd = CopyToTextLikeEnd,
+};
+
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
 	.CopyToStart = CopyToBinaryStart,
@@ -180,12 +191,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
 		return &CopyToRoutineCSV;
 	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyToRoutineBinary;
+	else if (opts->format == COPY_FORMAT_JSON)
+		return &CopyToRoutineJson;
 
 	/* default is text */
 	return &CopyToRoutineText;
 }
 
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
 static void
 CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 {
@@ -204,6 +217,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 		ListCell   *cur;
 		bool		hdr_delim = false;
 
+		Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
 		foreach(cur, cstate->attnumlist)
 		{
 			int			attnum = lfirst_int(cur);
@@ -226,7 +241,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 }
 
 /*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
  * the output function data to the given *finfo.
  */
 static void
@@ -299,13 +314,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
 	CopySendTextLikeEndOfRow(cstate);
 }
 
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
 static void
 CopyToTextLikeEnd(CopyToState cstate)
 {
 	/* Nothing to do here */
 }
 
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	Datum		rowdata;
+	StringInfo	result;
+
+	/*
+	 * If the COPY TO source is a query rather than a plain table, we need to
+	 * copy cstate->queryDesc->tupDesc into slot->tts_tupleDescriptor.
+	 * This is necessary because the slot's TupleDesc may change during query
+	 * execution, and we depend on it when calling composite_to_json.
+	 */
+	if (!cstate->rel)
+	{
+		memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+			   TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+			   cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+		for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+			populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+		BlessTupleDesc(slot->tts_tupleDescriptor);
+	}
+	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+	result = makeStringInfo();
+	composite_to_json(rowdata, result, false);
+
+	CopySendData(cstate, result->data, result->len);
+
+	CopySendTextLikeEndOfRow(cstate);
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
@@ -397,9 +445,21 @@ SendCopyBegin(CopyToState cstate)
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (cstate->opts.format != COPY_FORMAT_JSON)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON format is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
+
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -499,7 +559,7 @@ CopySendEndOfRow(CopyToState cstate)
 }
 
 /*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
  * line termination and do common appropriate things for the end of row.
  */
 static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index f1def67ac7c..664b0483dbd 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3530,6 +3530,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3612,6 +3616,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+				{
+					$$ = makeDefElem("format", $2, @1);
+				}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index e9d370cb3da..e517470bbc7 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  Datum *vals, bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 6176741d20b..15fe7b37b0e 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3347,8 +3347,10 @@ match_previous_words(int pattern_id,
 		COMPLETE_WITH(Copy_to_options);
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
-	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
+	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "FORMAT"))
 		COMPLETE_WITH("binary", "csv", "text");
+	else if (Matches("COPY|\\copy", MatchAny, "TO", MatchAny, "WITH", "(", "FORMAT"))
+		COMPLETE_WITH("binary", "csv", "text", "json");
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 686653233b2..85aedc267d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -56,6 +56,7 @@ typedef enum CopyFormat
 	COPY_FORMAT_TEXT = 0,
 	COPY_FORMAT_BINARY,
 	COPY_FORMAT_CSV,
+	COPY_FORMAT_JSON,
 } CopyFormat;
 
 /*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern void escape_json_with_len(StringInfo buf, const char *str, int len);
 extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..0fc6e84352c 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,82 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR:  cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR:  cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR:  cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR:  COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR:  COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+                                              ^
+copy copytest from stdin(format json);
+ERROR:  COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..071986d427a 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37f26f6c6b7..d85c2ec3c70 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -515,6 +515,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromRoutine
 CopyFromState
-- 
2.34.1
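
The row framing can be sketched in isolation. In the patch above each row is
serialized by composite_to_json and followed by a line terminator; the
v19-0003 patch that follows adds optional array decorations around those rows.
The standalone model below assumes rows arrive as already-serialized JSON
objects and that the terminator is "\n". DemoJsonWriter and the demo_*
functions are hypothetical names for this illustration, not backend APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical output state.  row_delim_needed plays the role of the
 * json_row_delim_needed flag that v19-0003 adds to CopyToStateData.
 */
typedef struct DemoJsonWriter
{
	char	   *buf;			/* NUL-terminated output buffer */
	size_t		cap;			/* capacity of buf, including the NUL */
	size_t		len;			/* bytes used so far */
	bool		force_array;	/* emit "[", "," and "]" decorations? */
	bool		row_delim_needed;
} DemoJsonWriter;

/* Append a string; input that would overflow the buffer is dropped. */
static void
demo_put(DemoJsonWriter *w, const char *s)
{
	size_t		n = strlen(s);

	if (w->len + n < w->cap)
	{
		memcpy(w->buf + w->len, s, n + 1);
		w->len += n;
	}
}

/* Start of output: opening bracket on its own line, if requested. */
static void
demo_start(DemoJsonWriter *w)
{
	if (w->force_array)
		demo_put(w, "[\n");
}

/* One row: a space before the first element, a comma before later ones. */
static void
demo_one_row(DemoJsonWriter *w, const char *json_object)
{
	if (w->force_array)
	{
		if (w->row_delim_needed)
			demo_put(w, ",");
		else
		{
			demo_put(w, " ");
			w->row_delim_needed = true;
		}
	}
	demo_put(w, json_object);
	demo_put(w, "\n");
}

/* End of output: closing bracket on its own line, if requested. */
static void
demo_end(DemoJsonWriter *w)
{
	if (w->force_array)
		demo_put(w, "]\n");
}
```

Calling demo_start, demo_one_row for each row, and demo_end with force_array
set yields a bracketed, comma-separated list with one element per line; with
it unset the same calls yield one JSON object per line and no decorations.
Putting the comma at the start of each later row means the writer never needs
to know whether another row is coming.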

v19-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (text/x-patch; charset=US-ASCII)

From 2abacfaebc0fd5a37a3b797df92c3b3a85761afa Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 30 Jul 2025 19:50:41 +0800
Subject: [PATCH v19 3/3] Add option force_array for COPY JSON FORMAT

The force_array option can only be used in COPY TO with JSON format.  It makes
the output a single JSON array: square brackets wrap the output and commas
separate the rows.  Refactored by Junwang Zhao to adapt to the newly
introduced CopyToRoutine struct (commit 2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 +++++++++++
 src/backend/commands/copy.c        | 13 +++++++++++
 src/backend/commands/copyto.c      | 37 +++++++++++++++++++++++++++++-
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 23 +++++++++++++++++++
 src/test/regress/sql/copy.sql      |  8 +++++++
 7 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 219604ad306..c01927864bd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>json</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>FORCE_QUOTE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 7fd41dba250..0a22272f3fe 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -514,6 +514,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
 	bool		reject_limit_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -670,6 +671,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -925,6 +933,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
 
+	if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 13eb14debbd..6fc1d3e9fee 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,6 +84,10 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+
+	/* need delimiter to start next json array element */
+	bool		json_row_delim_needed;
+
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -128,6 +132,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
 static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -172,7 +177,7 @@ static const CopyToRoutine CopyToRoutineJson = {
 	.CopyToStart = CopyToTextLikeStart,
 	.CopyToOutFunc = CopyToTextLikeOutFunc,
 	.CopyToOneRow = CopyToJsonOneRow,
-	.CopyToEnd = CopyToTextLikeEnd,
+	.CopyToEnd = CopyToJsonEnd,
 };
 
 /* binary format */
@@ -238,6 +243,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 		CopySendTextLikeEndOfRow(cstate);
 	}
+
+	/*
+	 * If JSON has been requested, and FORCE_ARRAY has been specified send the
+	 * opening bracket.
+	 */
+	if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, '[');
+		CopySendTextLikeEndOfRow(cstate);
+	}
 }
 
 /*
@@ -349,11 +364,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
 	result = makeStringInfo();
 	composite_to_json(rowdata, result, false);
 
+	if (cstate->json_row_delim_needed && cstate->opts.force_array)
+		CopySendChar(cstate, ',');
+	else if (cstate->opts.force_array)
+	{
+		/* first row needs no delimiter */
+		CopySendChar(cstate, ' ');
+		cstate->json_row_delim_needed = true;
+	}
+
 	CopySendData(cstate, result->data, result->len);
 
 	CopySendTextLikeEndOfRow(cstate);
 }
 
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+	if (cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendTextLikeEndOfRow(cstate);
+	}
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 15fe7b37b0e..a42fb6c0740 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1222,7 +1222,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
 
 /* COPY TO options */
 #define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
 
 /*
  * These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 85aedc267d6..7274b0d3ca5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
 	List	   *force_notnull;	/* list of column names */
 	bool		force_notnull_all;	/* FORCE_NOT_NULL *? */
 	bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
+	bool		force_array;	/* add JSON array decorations */
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 0fc6e84352c..22626a13ba5 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -112,6 +112,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 ERROR:  COPY json mode cannot be used with COPY FROM
 -- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR:  COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 071986d427a..0f121b48f71 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,14 @@ copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 -- all of the above should yield error
 
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1
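
For readers following the FORCE_ARRAY logic in the patch above, here is a minimal standalone sketch of the framing it produces: an opening bracket on its own line, a leading space before the first row, a leading comma before each later row, and a closing bracket. This is not the patch code. JsonArrayWriter and the writer_* functions are hypothetical names, and a caller-supplied buffer stands in for the COPY output stream (an assumption made only to keep the example self-contained).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parts of CopyToState used by the patch. */
typedef struct JsonArrayWriter
{
	char	   *buf;				/* caller-supplied output buffer */
	size_t		cap;				/* capacity of buf, including the NUL */
	size_t		len;				/* bytes written so far */
	bool		force_array;		/* mirrors opts.force_array */
	bool		row_delim_needed;	/* mirrors json_row_delim_needed */
} JsonArrayWriter;

/* Append a NUL-terminated string; the buffer must be large enough. */
static void
writer_append(JsonArrayWriter *w, const char *s)
{
	size_t		n = strlen(s);

	assert(w->len + n < w->cap);
	memcpy(w->buf + w->len, s, n + 1);
	w->len += n;
}

/* Start callback: emit "[" on its own line when force_array is set. */
void
writer_start(JsonArrayWriter *w, char *buf, size_t cap, bool force_array)
{
	assert(buf != NULL && cap > 0);
	w->buf = buf;
	w->cap = cap;
	w->len = 0;
	w->force_array = force_array;
	w->row_delim_needed = false;
	buf[0] = '\0';

	if (force_array)
		writer_append(w, "[\n");
}

/* Per-row callback: a space before the first row, a comma before the rest. */
void
writer_row(JsonArrayWriter *w, const char *json_object)
{
	if (w->force_array)
	{
		writer_append(w, w->row_delim_needed ? "," : " ");
		w->row_delim_needed = true;
	}
	writer_append(w, json_object);
	writer_append(w, "\n");
}

/* End callback: emit the closing "]" when force_array is set. */
void
writer_end(JsonArrayWriter *w)
{
	if (w->force_array)
		writer_append(w, "]\n");
}
```

Calling writer_start, then writer_row once per row, then writer_end reproduces the layout shown in the expected regression output. Putting the delimiter in front of each row, rather than after it, means the writer never needs to know whether another row follows.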

v19-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (text/x-patch; charset=US-ASCII)

From 6c3e515ce84bdba4e0c96f46d8d5543481d3e1ca Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 30 Jul 2025 16:48:56 +0800
Subject: [PATCH v19 1/3] introduce CopyFormat refactor CopyFormatOptions

Currently, the COPY command's format is determined by two booleans, binary
and csv_mode, within CopyFormatOptions. This approach works, but it does not
extend well to additional formats.

To simplify adding new formats, we've introduced an enum CopyFormat.  This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.

The CopyFormat enum was originally contributed by Joel Jacobson
joel@compiler.org, later refactored by Jian He to address various issues.

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 src/backend/commands/copy.c          | 50 +++++++++++++++-------------
 src/backend/commands/copyfrom.c      |  6 ++--
 src/backend/commands/copyfromparse.c |  7 ++--
 src/backend/commands/copyto.c        |  8 ++---
 src/include/commands/copy.h          | 13 ++++++--
 5 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..68f69cfb9df 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -521,6 +521,8 @@ ProcessCopyOptions(ParseState *pstate,
 		opts_out = (CopyFormatOptions *) palloc0(sizeof(CopyFormatOptions));
 
 	opts_out->file_encoding = -1;
+	/* default format */
+	opts_out->format = COPY_FORMAT_TEXT;
 
 	/* Extract options from the statement node tree */
 	foreach(option, options)
@@ -535,11 +537,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -699,31 +701,31 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -771,7 +773,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -779,43 +781,43 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -829,8 +831,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+												opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -845,8 +847,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -870,7 +872,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -906,7 +908,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -923,7 +925,7 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..ba31b227d5f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-	if (opts->csv_mode)
+	if (opts->format == COPY_FORMAT_CSV)
 		return &CopyFromRoutineCSV;
-	else if (opts->binary)
+	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyFromRoutineBinary;
 
 	/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..578e6c0c9a2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
 NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 {
 	return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
-										 cstate->opts.csv_mode);
+										 cstate->opts.format == COPY_FORMAT_CSV);
 }
 
 /*
@@ -774,7 +774,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
 	bool		done = false;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+		   cstate->opts.format == COPY_FORMAT_CSV);
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 67b94b91cae..e990343bab0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -176,9 +176,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-	if (opts->csv_mode)
+	if (opts->format == COPY_FORMAT_CSV)
 		return &CopyToRoutineCSV;
-	else if (opts->binary)
+	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyToRoutineBinary;
 
 	/* default is text */
@@ -215,7 +215,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				CopyAttributeOutCSV(cstate, colname, false);
 			else
 				CopyAttributeOutText(cstate, colname);
@@ -392,7 +392,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..686653233b2 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -48,6 +48,16 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -58,9 +68,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	int			header_line;	/* number of lines to skip or COPY_HEADER_XXX
 								 * value (see the above) */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
-- 
2.34.1
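
To illustrate the idea behind the refactoring above, here is a small standalone sketch of format selection through a single enum instead of two booleans. It is a sketch only: the enum values match the patch, but parse_copy_format, default_delimiter and default_null_print are hypothetical helper names that do not exist in PostgreSQL. The defaults mirror the ternaries in ProcessCopyOptions, where every format other than CSV gets the text defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same values as the enum added to copy.h by the patch. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: map a FORMAT option string to the enum.
 * Returns false for an unrecognized name and leaves *out untouched.
 */
bool
parse_copy_format(const char *name, CopyFormat *out)
{
	assert(name != NULL && out != NULL);

	if (strcmp(name, "text") == 0)
		*out = COPY_FORMAT_TEXT;
	else if (strcmp(name, "csv") == 0)
		*out = COPY_FORMAT_CSV;
	else if (strcmp(name, "binary") == 0)
		*out = COPY_FORMAT_BINARY;
	else
		return false;
	return true;
}

/* Hypothetical helper: default DELIMITER, "," for CSV and tab otherwise. */
const char *
default_delimiter(CopyFormat format)
{
	return (format == COPY_FORMAT_CSV) ? "," : "\t";
}

/* Hypothetical helper: default NULL marker, "" for CSV and \N otherwise. */
const char *
default_null_print(CopyFormat format)
{
	return (format == COPY_FORMAT_CSV) ? "" : "\\N";
}
```

With one enum field, a new format becomes one more enum value and one more strcmp branch, and the two old flags can no longer both be set at once.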

#137

jian.universality@gmail.com

2 months ago

In reply to: jian he (#136)

3 attachment(s)

Re: Emitting JSON to file using COPY TO

On Wed, Oct 1, 2025 at 2:16 PM jian he <jian.universality@gmail.com> wrote:

hi.
v19 attached, same as v18.
repost it so that CFbot can pick up the latest patchset.

hi.

new patch attached, rebase only.

Attachments:

v20-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (text/x-patch; charset=US-ASCII)

From 97e63d2b7de1fef820305b279d9e5602c82dab53 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 10 Nov 2025 08:39:36 +0800
Subject: [PATCH v20 1/3] introduce CopyFormat refactor CopyFormatOptions

Currently, the COPY command's format is determined by two booleans, binary
and csv_mode, within CopyFormatOptions. This approach works, but it does not
extend well to additional formats.

To simplify adding new formats, we've introduced an enum CopyFormat.  This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.

The CopyFormat enum was originally contributed by Joel Jacobson
joel@compiler.org, later refactored by Jian He to address various issues.

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 src/backend/commands/copy.c          | 50 +++++++++++++++-------------
 src/backend/commands/copyfrom.c      |  6 ++--
 src/backend/commands/copyfromparse.c |  7 ++--
 src/backend/commands/copyto.c        |  8 ++---
 src/include/commands/copy.h          | 13 ++++++--
 src/tools/pgindent/typedefs.list     |  1 +
 6 files changed, 49 insertions(+), 36 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 28e878c3688..d674ada98e4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -564,6 +564,8 @@ ProcessCopyOptions(ParseState *pstate,
 		opts_out = (CopyFormatOptions *) palloc0(sizeof(CopyFormatOptions));
 
 	opts_out->file_encoding = -1;
+	/* default format */
+	opts_out->format = COPY_FORMAT_TEXT;
 
 	/* Extract options from the statement node tree */
 	foreach(option, options)
@@ -578,11 +580,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
 			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
+				opts_out->format = COPY_FORMAT_TEXT;
 			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
+				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+				opts_out->format = COPY_FORMAT_BINARY;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -742,31 +744,31 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->binary && opts_out->delim)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-	if (opts_out->binary && opts_out->null_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-	if (opts_out->binary && opts_out->default_print)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
-		opts_out->delim = opts_out->csv_mode ? "," : "\t";
+		opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
 
 	if (!opts_out->null_print)
-		opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+		opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
 	opts_out->null_print_len = strlen(opts_out->null_print);
 
-	if (opts_out->csv_mode)
+	if (opts_out->format == COPY_FORMAT_CSV)
 	{
 		if (!opts_out->quote)
 			opts_out->quote = "\"";
@@ -814,7 +816,7 @@ ProcessCopyOptions(ParseState *pstate,
 	 * future-proofing.  Likewise we disallow all digits though only octal
 	 * digits are actually dangerous.
 	 */
-	if (!opts_out->csv_mode &&
+	if (opts_out->format != COPY_FORMAT_CSV &&
 		strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
 			   opts_out->delim[0]) != NULL)
 		ereport(ERROR,
@@ -822,43 +824,43 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
 	/* Check quote */
-	if (!opts_out->csv_mode && opts_out->quote != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single one-byte character")));
 
-	if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+	if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("COPY delimiter and quote must be different")));
 
 	/* Check escape */
-	if (!opts_out->csv_mode && opts_out->escape != NULL)
+	if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
 				 errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-	if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+	if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single one-byte character")));
 
 	/* Check force_quote */
-	if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -872,8 +874,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY FROM")));
 
 	/* Check force_notnull */
-	if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-								opts_out->force_notnull_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+												opts_out->force_notnull_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -888,8 +890,8 @@ ProcessCopyOptions(ParseState *pstate,
 						"COPY TO")));
 
 	/* Check force_null */
-	if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-								opts_out->force_null_all))
+	if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+												opts_out->force_null_all))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -913,7 +915,7 @@ ProcessCopyOptions(ParseState *pstate,
 						"NULL")));
 
 	/* Don't allow the CSV quote char to appear in the null string. */
-	if (opts_out->csv_mode &&
+	if (opts_out->format == COPY_FORMAT_CSV &&
 		strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -949,7 +951,7 @@ ProcessCopyOptions(ParseState *pstate,
 							"DEFAULT")));
 
 		/* Don't allow the CSV quote char to appear in the default string. */
-		if (opts_out->csv_mode &&
+		if (opts_out->format == COPY_FORMAT_CSV &&
 			strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -966,7 +968,7 @@ ProcessCopyOptions(ParseState *pstate,
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
 	/* Check on_error */
-	if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..ba31b227d5f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-	if (opts->csv_mode)
+	if (opts->format == COPY_FORMAT_CSV)
 		return &CopyFromRoutineCSV;
-	else if (opts->binary)
+	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyFromRoutineBinary;
 
 	/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
 				   cstate->cur_relname);
 		return;
 	}
-	if (cstate->opts.binary)
+	if (cstate->opts.format == COPY_FORMAT_BINARY)
 	{
 		/* can't usefully display the data */
 		if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..578e6c0c9a2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
 NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 {
 	return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
-										 cstate->opts.csv_mode);
+										 cstate->opts.format == COPY_FORMAT_CSV);
 }
 
 /*
@@ -774,7 +774,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
 	bool		done = false;
 
 	/* only available for text or csv input */
-	Assert(!cstate->opts.binary);
+	Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+		   cstate->opts.format == COPY_FORMAT_CSV);
 
 	/* on input check that the header line is correct if needed */
 	if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..c97f0460b3e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -181,9 +181,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-	if (opts->csv_mode)
+	if (opts->format == COPY_FORMAT_CSV)
 		return &CopyToRoutineCSV;
-	else if (opts->binary)
+	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyToRoutineBinary;
 
 	/* default is text */
@@ -220,7 +220,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-			if (cstate->opts.csv_mode)
+			if (cstate->opts.format == COPY_FORMAT_CSV)
 				CopyAttributeOutCSV(cstate, colname, false);
 			else
 				CopyAttributeOutText(cstate, colname);
@@ -397,7 +397,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
-	int16		format = (cstate->opts.binary ? 1 : 0);
+	int16		format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
 	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..686653233b2 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -48,6 +48,16 @@ typedef enum CopyLogVerbosityChoice
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+	COPY_FORMAT_TEXT = 0,
+	COPY_FORMAT_BINARY,
+	COPY_FORMAT_CSV,
+} CopyFormat;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -58,9 +68,8 @@ typedef struct CopyFormatOptions
 	/* parameters from the COPY command */
 	int			file_encoding;	/* file or remote side's character encoding,
 								 * -1 if not specified */
-	bool		binary;			/* binary format? */
+	CopyFormat	format;			/* format of the COPY operation */
 	bool		freeze;			/* freeze rows on loading? */
-	bool		csv_mode;		/* Comma Separated Value format? */
 	int			header_line;	/* number of lines to skip or COPY_HEADER_XXX
 								 * value (see the above) */
 	char	   *null_print;		/* NULL marker string (server encoding!) */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 432509277c9..256b5000af4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -516,6 +516,7 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormat
 CopyFormatOptions
 CopyFromRoutine
 CopyFromState
-- 
2.34.1
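
The incompatible-option checks touched by the patch above all follow one pattern: compare the format field against an enum value, then test whether the offending option was supplied. Below is a minimal standalone sketch of that pattern covering three of the checks (DELIMITER in binary mode, QUOTE outside CSV, FORCE_ARRAY outside JSON). CopyOptsSketch, OptCheck and check_copy_options are hypothetical names, the function returns a code rather than calling ereport(), and the position of COPY_FORMAT_JSON in the enum is an assumption, since the patch that adds it is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
	COPY_FORMAT_JSON,			/* assumed position; added by a later patch */
} CopyFormat;

/* Hypothetical, trimmed-down version of CopyFormatOptions. */
typedef struct CopyOptsSketch
{
	CopyFormat	format;
	const char *delim;			/* NULL if DELIMITER was not specified */
	const char *quote;			/* NULL if QUOTE was not specified */
	bool		force_array;	/* FORCE_ARRAY requested? */
} CopyOptsSketch;

/* Hypothetical result codes standing in for the ereport(ERROR, ...) calls. */
typedef enum OptCheck
{
	OPT_OK = 0,
	OPT_DELIM_IN_BINARY,
	OPT_QUOTE_NEEDS_CSV,
	OPT_FORCE_ARRAY_NEEDS_JSON,
} OptCheck;

/* Return the first incompatibility found, in the order listed above. */
OptCheck
check_copy_options(const CopyOptsSketch *opts)
{
	assert(opts != NULL);

	if (opts->format == COPY_FORMAT_BINARY && opts->delim != NULL)
		return OPT_DELIM_IN_BINARY;
	if (opts->format != COPY_FORMAT_CSV && opts->quote != NULL)
		return OPT_QUOTE_NEEDS_CSV;
	if (opts->format != COPY_FORMAT_JSON && opts->force_array)
		return OPT_FORCE_ARRAY_NEEDS_JSON;
	return OPT_OK;
}
```

Note that checks written as "format != COPY_FORMAT_CSV" automatically cover any format added later, which is why a JSON format inherits the QUOTE and ESCAPE restrictions without further changes.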

v20-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (text/x-patch; charset=US-ASCII)

From e42e36118f73fa1a98f698031e3f1f7cbb9150cf Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 30 Jul 2025 19:50:41 +0800
Subject: [PATCH v20 3/3] Add option force_array for COPY JSON FORMAT

The force_array option can only be used in COPY TO with the JSON format.  It
makes the output a single JSON array: square brackets surround the rows and
commas separate them.  Refactored by Junwang Zhao to adapt to the newly
introduced CopyToRoutine struct (2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 14 +++++++++++
 src/backend/commands/copy.c        | 13 +++++++++++
 src/backend/commands/copyto.c      | 37 +++++++++++++++++++++++++++++-
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/test/regress/expected/copy.out | 23 +++++++++++++++++++
 src/test/regress/sql/copy.sql      |  8 +++++++
 7 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 2d6d6802cbd..d8d9fb173b4 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
     QUOTE '<replaceable class="parameter">quote_character</replaceable>'
     ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+    FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>FORCE_ARRAY</literal></term>
+    <listitem>
+     <para>
+      Force output of square brackets as array decorations at the beginning
+      and end of output, and commas between the rows. It is allowed only in
+      <command>COPY TO</command>, and only when using
+      <literal>json</literal> format. The default is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>FORCE_QUOTE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0ec9b22d20f..6f9ae3fbfd7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -557,6 +557,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
 	bool		reject_limit_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -713,6 +714,13 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+				errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -968,6 +976,11 @@ ProcessCopyOptions(ParseState *pstate,
 				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
 
+	if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index accf34e1a60..b58c5bdf987 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -86,6 +86,10 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+
+	/* need delimiter to start next json array element */
+	bool		json_row_delim_needed;
+
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -133,6 +137,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
 static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -177,7 +182,7 @@ static const CopyToRoutine CopyToRoutineJson = {
 	.CopyToStart = CopyToTextLikeStart,
 	.CopyToOutFunc = CopyToTextLikeOutFunc,
 	.CopyToOneRow = CopyToJsonOneRow,
-	.CopyToEnd = CopyToTextLikeEnd,
+	.CopyToEnd = CopyToJsonEnd,
 };
 
 /* binary format */
@@ -243,6 +248,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
 		CopySendTextLikeEndOfRow(cstate);
 	}
+
+	/*
+	 * If JSON has been requested and FORCE_ARRAY has been specified, send
+	 * the opening bracket.
+	 */
+	if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+	{
+		CopySendChar(cstate, '[');
+		CopySendTextLikeEndOfRow(cstate);
+	}
 }
 
 /*
@@ -354,11 +369,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
 	result = makeStringInfo();
 	composite_to_json(rowdata, result, false);
 
+	if (cstate->json_row_delim_needed && cstate->opts.force_array)
+		CopySendChar(cstate, ',');
+	else if (cstate->opts.force_array)
+	{
+		/* first row needs no delimiter */
+		CopySendChar(cstate, ' ');
+		cstate->json_row_delim_needed = true;
+	}
+
 	CopySendData(cstate, result->data, result->len);
 
 	CopySendTextLikeEndOfRow(cstate);
 }
 
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+	if (cstate->opts.force_array)
+	{
+		CopySendChar(cstate, ']');
+		CopySendTextLikeEndOfRow(cstate);
+	}
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 0fd06a31201..e550aa38a25 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1232,7 +1232,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
 
 /* COPY TO options */
 #define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
 
 /*
  * These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 85aedc267d6..7274b0d3ca5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
 	List	   *force_notnull;	/* list of column names */
 	bool		force_notnull_all;	/* FORCE_NOT_NULL *? */
 	bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
+	bool		force_array;	/* add JSON array decorations */
 	List	   *force_null;		/* list of column names */
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 10333357d68..8becc70ee7a 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -112,6 +112,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 ERROR:  COPY json mode cannot be used with COPY FROM
 -- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR:  COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 80799e2ead9..6a14cfc6e68 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,14 @@ copy copytest to stdout (format json, on_error ignore);
 copy copytest from stdin(format json);
 -- all of the above should yield error
 
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
 -- embedded escaped characters
 create temp table copyjsontest (
     id bigserial,
-- 
2.34.1
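
The array framing added by the FORCE_ARRAY patch above can be modelled outside the backend. The following is a minimal standalone sketch, not PostgreSQL code: frame_json_rows() and append() are hypothetical helpers, and the rows are assumed to be already-serialized JSON objects. It only mirrors the order of operations in CopyToTextLikeStart, CopyToJsonOneRow and CopyToJsonEnd (opening bracket, a space before the first row, a comma before each later row, closing bracket).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append src to the NUL-terminated buffer out, truncating on overflow. */
static void
append(char *out, size_t outsz, const char *src)
{
	size_t		used = strlen(out);

	assert(used < outsz);
	strncat(out, src, outsz - used - 1);
}

/*
 * Hypothetical model of the FORCE_ARRAY framing.  Each element of rows is
 * assumed to be one already-serialized JSON object.
 */
static void
frame_json_rows(const char **rows, int nrows, bool force_array,
				char *out, size_t outsz)
{
	bool		delim_needed = false;	/* plays the role of json_row_delim_needed */

	assert(outsz > 0);
	out[0] = '\0';

	/* start callback: opening bracket on its own line */
	if (force_array)
		append(out, outsz, "[\n");

	/* per-row callback: ' ' before the first row, ',' before later rows */
	for (int i = 0; i < nrows; i++)
	{
		if (force_array)
		{
			append(out, outsz, delim_needed ? "," : " ");
			delim_needed = true;
		}
		append(out, outsz, rows[i]);
		append(out, outsz, "\n");
	}

	/* end callback: closing bracket on its own line */
	if (force_array)
		append(out, outsz, "]\n");
}
```

Called with two rows and force_array set, the sketch produces the same line layout as the expected regression output above; with force_array unset it produces one object per line, and with zero rows it produces just the two bracket lines.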

v20-0002-json-format-for-COPY-TO.patch (text/x-patch; charset=US-ASCII)

From 089630f0f4706b31c4c886d033e15ad20761dd6d Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 10 Nov 2025 08:40:26 +0800
Subject: [PATCH v20 2/3] json format for COPY TO

JSON format is only supported with the COPY TO operation. It is incompatible
with options such as HEADER, DEFAULT, NULL, DELIMITER, and several others.
These restrictions are exercised in src/test/regress/sql/copy.sql.

The CopyFormat enum was originally contributed by Joel Jacobson
<joel@compiler.org>, later refactored by Jian He to address various issues,
and further adapted by Junwang Zhao to support the newly introduced
CopyToRoutine struct (commit 2e4127b6d2).

Author: Joe Conway <mail@joeconway.com>
Reviewed-by: "Andrey M. Borodin" <x4mmm@yandex-team.ru>,
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>,
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>,
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>,
Reviewed-by: Davin Shearer <davin@apache.org>,
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>,
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>

discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c34c@joeconway.com
---
 doc/src/sgml/ref/copy.sgml         | 13 +++--
 src/backend/commands/copy.c        | 72 +++++++++++++++++++++-------
 src/backend/commands/copyto.c      | 76 ++++++++++++++++++++++++++----
 src/backend/parser/gram.y          |  8 ++++
 src/backend/utils/adt/json.c       |  5 +-
 src/bin/psql/tab-complete.in.c     |  2 +-
 src/include/commands/copy.h        |  1 +
 src/include/utils/json.h           |  2 +
 src/test/regress/expected/copy.out | 76 ++++++++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      | 47 ++++++++++++++++++
 10 files changed, 268 insertions(+), 34 deletions(-)
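
The option restrictions described in the commit message can be summarised as a small decision function. This is a simplified standalone sketch, not the patch's ProcessCopyOptions: the type and function names (SketchCopyOptions, SketchOptError, check_json_options) are hypothetical, and only the JSON-specific checks added by this patch are modelled, in the order the patch performs them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
	FMT_TEXT,
	FMT_BINARY,
	FMT_CSV,
	FMT_JSON
} SketchFormat;

typedef enum
{
	OPT_OK,
	OPT_ERR_DELIMITER,			/* "cannot specify DELIMITER in JSON mode" */
	OPT_ERR_NULL,				/* "cannot specify NULL in JSON mode" */
	OPT_ERR_DEFAULT,			/* "cannot specify DEFAULT in JSON mode" */
	OPT_ERR_HEADER,				/* "cannot specify HEADER in JSON mode" */
	OPT_ERR_COPY_FROM			/* "COPY json mode cannot be used with COPY FROM" */
} SketchOptError;

/* Hypothetical, flattened stand-in for the relevant CopyFormatOptions fields. */
typedef struct
{
	SketchFormat format;
	bool		is_from;
	bool		has_delimiter;
	bool		has_null;
	bool		has_default;
	bool		has_header;
} SketchCopyOptions;

/* Return the first JSON-related conflict, or OPT_OK if there is none. */
static SketchOptError
check_json_options(const SketchCopyOptions *o)
{
	assert(o != NULL);

	if (o->format != FMT_JSON)
		return OPT_OK;			/* non-JSON checks are out of scope here */
	if (o->has_delimiter)
		return OPT_ERR_DELIMITER;
	if (o->has_null)
		return OPT_ERR_NULL;
	if (o->has_default)
		return OPT_ERR_DEFAULT;
	if (o->has_header)
		return OPT_ERR_HEADER;
	if (o->is_from)
		return OPT_ERR_COPY_FROM;
	return OPT_OK;
}
```

QUOTE, ESCAPE, FORCE_QUOTE, FORCE_NOT_NULL and FORCE_NULL are left out of the sketch because the patch does not add new checks for them; per the regression output, they are rejected by the existing "requires CSV mode" checks.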

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index fdc24b36bb8..2d6d6802cbd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</literal>,
       <literal>csv</literal> (Comma Separated Values),
+      <literal>json</literal> (JavaScript Object Notation),
       or <literal>binary</literal>.
       The default is <literal>text</literal>.
       See <xref linkend="sql-copy-file-formats"/> below for details.
      </para>
+     <para>
+      The <literal>json</literal> option is allowed only in
+      <command>COPY TO</command>.
+     </para>
     </listitem>
    </varlistentry>
 
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       (line) of the file.  The default is a tab character in text format,
       a comma in <literal>CSV</literal> format.
       This must be a single one-byte character.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       string in <literal>CSV</literal> format. You might prefer an
       empty string even in text format for cases where you don't want to
       distinguish nulls from empty strings.
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
 
      <note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       is found in the input file, the default value of the corresponding column
       will be used.
       This option is allowed only in <command>COPY FROM</command>, and only when
-      not using <literal>binary</literal> format.
+      not using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       <command>COPY FROM</command> commands.
      </para>
      <para>
-      This option is not allowed when using <literal>binary</literal> format.
+      This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d674ada98e4..0ec9b22d20f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -585,6 +585,8 @@ ProcessCopyOptions(ParseState *pstate,
 				opts_out->format = COPY_FORMAT_CSV;
 			else if (strcmp(fmt, "binary") == 0)
 				opts_out->format = COPY_FORMAT_BINARY;
+			else if (strcmp(fmt, "json") == 0)
+				opts_out->format = COPY_FORMAT_JSON;
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -744,21 +746,42 @@ ProcessCopyOptions(ParseState *pstate,
 	 * Check for incompatible options (must do these three before inserting
 	 * defaults)
 	 */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+	if (opts_out->delim)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+	}
 
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "NULL")));
+	if (opts_out->null_print)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in BINARY mode", "NULL"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in JSON mode", "NULL"));
+	}
 
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+	if (opts_out->default_print)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in BINARY mode", "DEFAULT"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_SYNTAX_ERROR),
+					errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+	}
 
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
@@ -824,11 +847,18 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
 	/* Check header */
-	if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
-				 errmsg("cannot specify %s in BINARY mode", "HEADER")));
+	if (opts_out->header_line != COPY_HEADER_FALSE)
+	{
+		if (opts_out->format == COPY_FORMAT_BINARY)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+					errmsg("cannot specify %s in BINARY mode", "HEADER"));
+		else if (opts_out->format == COPY_FORMAT_JSON)
+			ereport(ERROR,
+					errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					errmsg("cannot specify %s in JSON mode", "HEADER"));
+	}
 
 	/* Check quote */
 	if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -932,6 +962,12 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("COPY %s cannot be used with %s", "FREEZE",
 						"COPY TO")));
 
+	/* Check json format */
+	if (opts_out->format == COPY_FORMAT_JSON && is_from)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c97f0460b3e..accf34e1a60 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -26,6 +26,7 @@
 #include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -33,6 +34,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -130,6 +132,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
 								 bool is_csv);
 static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
 static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -149,7 +152,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 /*
  * COPY TO routines for built-in formats.
  *
- * CSV and text formats share the same TextLike routines except for the
+ * Text, CSV, and json formats share the same TextLike routines except for the
  * one-row callback.
  */
 
@@ -169,6 +172,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
 	.CopyToEnd = CopyToTextLikeEnd,
 };
 
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+	.CopyToStart = CopyToTextLikeStart,
+	.CopyToOutFunc = CopyToTextLikeOutFunc,
+	.CopyToOneRow = CopyToJsonOneRow,
+	.CopyToEnd = CopyToTextLikeEnd,
+};
+
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
 	.CopyToStart = CopyToBinaryStart,
@@ -185,12 +196,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
 		return &CopyToRoutineCSV;
 	else if (opts->format == COPY_FORMAT_BINARY)
 		return &CopyToRoutineBinary;
+	else if (opts->format == COPY_FORMAT_JSON)
+		return &CopyToRoutineJson;
 
 	/* default is text */
 	return &CopyToRoutineText;
 }
 
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
 static void
 CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 {
@@ -209,6 +222,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 		ListCell   *cur;
 		bool		hdr_delim = false;
 
+		Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
 		foreach(cur, cstate->attnumlist)
 		{
 			int			attnum = lfirst_int(cur);
@@ -231,7 +246,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 }
 
 /*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
  * the output function data to the given *finfo.
  */
 static void
@@ -304,13 +319,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
 	CopySendTextLikeEndOfRow(cstate);
 }
 
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
 static void
 CopyToTextLikeEnd(CopyToState cstate)
 {
 	/* Nothing to do here */
 }
 
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	Datum		rowdata;
+	StringInfo	result;
+
+	/*
+	 * If the COPY TO source is a query rather than a plain table, we need to
+	 * copy CopyToState->queryDesc->tupDesc to slot->tts_tupleDescriptor.
+	 * This is necessary because the slot's TupleDesc may change during query
+	 * execution, and we depend on it when calling composite_to_json.
+	 */
+	if (!cstate->rel)
+	{
+		memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+			   TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+			   cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+		for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+			populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+		BlessTupleDesc(slot->tts_tupleDescriptor);
+	}
+	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+	result = makeStringInfo();
+	composite_to_json(rowdata, result, false);
+
+	CopySendData(cstate, result->data, result->len);
+
+	CopySendTextLikeEndOfRow(cstate);
+}
+
 /*
  * Implementation of the start callback for binary format. Send a header
  * for a binary copy.
@@ -402,9 +450,21 @@ SendCopyBegin(CopyToState cstate)
 
 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
 	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	if (cstate->opts.format != COPY_FORMAT_JSON)
+	{
+		pq_sendint16(&buf, natts);
+		for (i = 0; i < natts; i++)
+			pq_sendint16(&buf, format); /* per-column formats */
+	}
+	else
+	{
+		/*
+		 * JSON format is always one non-binary column
+		 */
+		pq_sendint16(&buf, 1);
+		pq_sendint16(&buf, 0);
+	}
+
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -504,7 +564,7 @@ CopySendEndOfRow(CopyToState cstate)
 }
 
 /*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
  * line termination and do common appropriate things for the end of row.
  */
 static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 57fe0186547..7cbdadc98ac 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3557,6 +3557,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
 				}
+			| JSON
+				{
+					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+				}
 			| HEADER_P
 				{
 					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3639,6 +3643,10 @@ copy_generic_opt_elem:
 				{
 					$$ = makeDefElem($1, $2, @1);
 				}
+			| FORMAT_LA copy_generic_opt_arg
+				{
+					$$ = makeDefElem("format", $2, @1);
+				}
 		;
 
 copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 06dd62f0008..647adafd227 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -86,8 +86,6 @@ typedef struct JsonAggState
 	JsonUniqueBuilderState unique_check;
 } JsonAggState;
 
-static void composite_to_json(Datum composite, StringInfo result,
-							  bool use_line_feeds);
 static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
 							  const Datum *vals, const bool *nulls, int *valcount,
 							  JsonTypeCategory tcategory, Oid outfuncoid,
@@ -517,8 +515,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
 
 /*
  * Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
  */
-static void
+void
 composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 {
 	HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 316a2dafbf1..0fd06a31201 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3376,7 +3376,7 @@ match_previous_words(int pattern_id,
 	/* Complete COPY <sth> FROM|TO [PROGRAM] <sth> WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAnyExcept("PROGRAM"), "WITH", "(", "FORMAT") ||
 			 Matches("COPY|\\copy", MatchAny, "FROM|TO", "PROGRAM", MatchAny, "WITH", "(", "FORMAT"))
-		COMPLETE_WITH("binary", "csv", "text");
+		COMPLETE_WITH("binary", "csv", "text", "json");
 
 	/* Complete COPY <sth> FROM [PROGRAM] filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAnyExcept("PROGRAM"), "WITH", "(", "ON_ERROR") ||
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 686653233b2..85aedc267d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -56,6 +56,7 @@ typedef enum CopyFormat
 	COPY_FORMAT_TEXT = 0,
 	COPY_FORMAT_BINARY,
 	COPY_FORMAT_CSV,
+	COPY_FORMAT_JSON,
 } CopyFormat;
 
 /*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 
 /* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+							  bool use_line_feeds);
 extern void escape_json(StringInfo buf, const char *str);
 extern void escape_json_with_len(StringInfo buf, const char *str, int len);
 extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 24e0f472f14..10333357d68 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,82 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR:  cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR:  cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR:  cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR:  cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR:  COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR:  COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR:  COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR:  COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR:  COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+                                              ^
+copy copytest from stdin(format json);
+ERROR:  COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 676a8b342b5..80799e2ead9 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+    id bigserial,
+    f1 text,
+    f2 timestamptz);
+
+insert into copyjsontest
+  select g.i,
+         CASE WHEN g.i % 2 = 0 THEN
+           'line with '' in it: ' || g.i::text
+         ELSE
+           'line with " in it: ' || g.i::text
+         END,
+         'Mon Feb 10 17:32:01 1997 PST'
+  from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
 create temp table copytest4 (
 	c1 int,
 	"colname with tab: 	" text);
-- 
2.34.1
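
The "embedded escaped characters" test above depends on how string values are escaped when a row is turned into JSON. The patch relies on the backend's existing composite_to_json for this; the following is only a rough standalone approximation of the escaping visible in the expected output. sketch_escape_json() is a hypothetical helper, and the \u00XX handling of other control characters is an assumption that the test above does not exercise.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical approximation of JSON string escaping: writes a quoted,
 * escaped copy of in to out.  Input that does not fit is silently truncated.
 */
static void
sketch_escape_json(const char *in, char *out, size_t outsz)
{
	size_t		n = 0;

	assert(outsz >= 3);			/* room for two quotes and the terminator */
	out[n++] = '"';

	for (const unsigned char *p = (const unsigned char *) in; *p; p++)
	{
		const char *esc = NULL;
		char		ubuf[7];

		switch (*p)
		{
			case '"':  esc = "\\\""; break;
			case '\\': esc = "\\\\"; break;
			case '\b': esc = "\\b"; break;
			case '\f': esc = "\\f"; break;
			case '\n': esc = "\\n"; break;
			case '\r': esc = "\\r"; break;
			case '\t': esc = "\\t"; break;
			default:
				if (*p < 0x20)
				{
					/* assumed form for remaining control characters */
					snprintf(ubuf, sizeof ubuf, "\\u%04x", (unsigned) *p);
					esc = ubuf;
				}
		}

		if (esc != NULL)
		{
			size_t		len = strlen(esc);

			if (n + len + 2 > outsz)
				break;			/* keep room for closing quote and NUL */
			memcpy(out + n, esc, len);
			n += len;
		}
		else
		{
			if (n + 1 + 2 > outsz)
				break;
			out[n++] = (char) *p;	/* '/' and ordinary bytes pass through */
		}
	}

	out[n++] = '"';
	out[n] = '\0';
}
```

As in the expected output, a forward slash is passed through unchanged while quotes, backslashes and the common control characters get two-character escapes.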

#138