[PATCH] Reject ENCODING option for COPY TO FORMAT JSON
Hi hackers,
COPY TO FORMAT JSON silently accepts the ENCODING option but doesn't
perform encoding conversion(?) CopyToJsonOneRow() sends the output of
composite_to_json() via CopySendData() without calling
pg_server_to_any(), unlike the text and CSV paths.
COPY t TO '/tmp/out.json' WITH (FORMAT json, ENCODING 'LATIN1');
On a UTF-8 server this produces UTF-8 output, not LATIN1.
RFC 8259 says JSON text must be UTF-8, so arguably JSON output
should never be converted. But even under that interpretation,
silently accepting the option and ignoring it looks wrong, the user
explicitly asked for LATIN1 and got something else. The same issue
also affects COPY TO STDOUT when client_encoding differs from the
server encoding, since the default file_encoding is the client
encoding and CopyToJsonOneRow never checks need_transcoding.
The attached patch rejects the explicit ENCODING option for JSON
mode, consistent with how DELIMITER, NULL, DEFAULT, and HEADER are
already rejected. The implicit client_encoding case is a separate
design question (should COPY TO JSON always emit UTF-8 regardless
of client_encoding?) that maybe we should address separately and not as
part of v19.
Introduced by 7dadd38cda9 (json format for COPY TO). I've attached a patch
for rejecting the ENCODING option. Thoughts?
Attachments:
0001-Reject-ENCODING-option-for-COPY-TO-FORMAT-JSON.patchapplication/octet-stream; name=0001-Reject-ENCODING-option-for-COPY-TO-FORMAT-JSON.patchDownload+10-1
On 20 Apr 2026, at 08:06, Ayush Tiwari <ayushtiwari.slg01@gmail.com> wrote:
The attached patch rejects the explicit ENCODING option for JSON
mode, consistent with how DELIMITER, NULL, DEFAULT, and HEADER are
already rejected.
Given that we reject other incompatible parameters it makes sense to reject
this one as well, however I think we can expand the comment a little and
explain why.
--
Daniel Gustafsson
Hi,
On Mon, 20 Apr 2026 at 13:56, Daniel Gustafsson <daniel@yesql.se> wrote:
On 20 Apr 2026, at 08:06, Ayush Tiwari <ayushtiwari.slg01@gmail.com>
wrote:
The attached patch rejects the explicit ENCODING option for JSON
mode, consistent with how DELIMITER, NULL, DEFAULT, and HEADER are
already rejected.Given that we reject other incompatible parameters it makes sense to reject
this one as well, however I think we can expand the comment a little and
explain why.
Thanks Daniel. Agreed, v2 attached with an expanded comment
explaining why the option is rejected, I've tried to make it small, because
rest
rejected ones did not have relevant comments.
/*
* Reject ENCODING for JSON format. JSON output is produced as
* a whole by composite_to_json(), so the per-attribute encoding
* conversion done for text and CSV output is not applied.
*/
Regards,
Ayush
Attachments:
v2-0001-Reject-ENCODING-option-for-COPY-TO-FORMAT-JSON.patchapplication/octet-stream; name=v2-0001-Reject-ENCODING-option-for-COPY-TO-FORMAT-JSON.patchDownload+14-1
Ayush Tiwari <ayushtiwari.slg01@gmail.com> writes:
COPY TO FORMAT JSON silently accepts the ENCODING option but doesn't
perform encoding conversion(?) CopyToJsonOneRow() sends the output of
composite_to_json() via CopySendData() without calling
pg_server_to_any(), unlike the text and CSV paths.
COPY t TO '/tmp/out.json' WITH (FORMAT json, ENCODING 'LATIN1');
On a UTF-8 server this produces UTF-8 output, not LATIN1.
Seems to me the correct thing here is to make it work like the other
cases, ie perform pg_server_to_any(). I have exactly no sympathy for
the argument about the RFC saying it must be UTF-8, not least because
that's not in fact what is implemented (what if the server encoding
isn't UTF-8?).
Rejecting this option altogether doesn't improve anything, not
functionally, not specs-compliance-wise, nor according to the
principle of least surprise.
The attached patch rejects the explicit ENCODING option for JSON
mode, consistent with how DELIMITER, NULL, DEFAULT, and HEADER are
already rejected. The implicit client_encoding case is a separate
design question (should COPY TO JSON always emit UTF-8 regardless
of client_encoding?) that maybe we should address separately and not as
part of v19.
No, you don't get to punt this till later. Once we ship v19 there's
going to be a strong expectation of backwards compatibility.
The idea of sending UTF-8 to a client that's set client_encoding to
something else would be risible, if it weren't a security hazard.
regards, tom lane
Hi,
On Mon, 20 Apr 2026 at 19:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ayush Tiwari <ayushtiwari.slg01@gmail.com> writes:
COPY TO FORMAT JSON silently accepts the ENCODING option but doesn't
perform encoding conversion(?) CopyToJsonOneRow() sends the output of
composite_to_json() via CopySendData() without calling
pg_server_to_any(), unlike the text and CSV paths.COPY t TO '/tmp/out.json' WITH (FORMAT json, ENCODING 'LATIN1');
On a UTF-8 server this produces UTF-8 output, not LATIN1.
Seems to me the correct thing here is to make it work like the other
cases, ie perform pg_server_to_any(). I have exactly no sympathy for
the argument about the RFC saying it must be UTF-8, not least because
that's not in fact what is implemented (what if the server encoding
isn't UTF-8?).
Agreed. I initially thought rejecting the option was the safer route
given the RFC, but as you pointed out, we aren't enforcing
UTF-8 strictly on the server side anyway.
Rejecting this option altogether doesn't improve anything, not
functionally, not specs-compliance-wise, nor according to the
principle of least surprise.
Makes sense. Implementing the conversion properly
keeps JSON format consistent with how the text and CSV formats behave.
The attached patch rejects the explicit ENCODING option for JSON
mode, consistent with how DELIMITER, NULL, DEFAULT, and HEADER are
already rejected. The implicit client_encoding case is a separate
design question (should COPY TO JSON always emit UTF-8 regardless
of client_encoding?) that maybe we should address separately and not as
part of v19.No, you don't get to punt this till later. Once we ship v19 there's
going to be a strong expectation of backwards compatibility.The idea of sending UTF-8 to a client that's set client_encoding to
something else would be risible, if it weren't a security hazard.
I agree sending unconverted bytes to a mismatched
client encoding is clearly a security hazard that needs addressing. Did
not consider the backward compatibility part, my bad.
Was trying out adding pg_server_to_any() to the json_buf after
composite_to_json() returns,
correctly covering both explicit ENCODING option specifications and
implicit client_encoding mismatches.
Let me send a patch with code and associated test cases.
Regards,
Ayush
Hi,
On Mon, 20 Apr 2026 at 19:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Seems to me the correct thing here is to make it work like the other
cases, ie perform pg_server_to_any(). I have exactly no sympathy for
the argument about the RFC saying it must be UTF-8, not least because
that's not in fact what is implemented (what if the server encoding
isn't UTF-8?).Agreed. I initially thought rejecting the option was the safer route
given the RFC, but as you pointed out, we aren't enforcing
UTF-8 strictly on the server side anyway.Rejecting this option altogether doesn't improve anything, not
functionally, not specs-compliance-wise, nor according to the
principle of least surprise.Makes sense. Implementing the conversion properly
keeps JSON format consistent with how the text and CSV formats behave.No, you don't get to punt this till later. Once we ship v19 there's
going to be a strong expectation of backwards compatibility.The idea of sending UTF-8 to a client that's set client_encoding to
something else would be risible, if it weren't a security hazard.I agree sending unconverted bytes to a mismatched
client encoding is clearly a security hazard that needs addressing. Did
not consider the backward compatibility part, my bad.Was trying out adding pg_server_to_any() to the json_buf after
composite_to_json() returns,
correctly covering both explicit ENCODING option specifications and
implicit client_encoding mismatches.Let me send a patch with code and associated test cases.
Attached patch with round trip test case. Please review and let me
know if it's in the right direction.
Regards,
Ayush