Reject HEADER with binary and json COPY formats by option presence

Started by Chao Li24 days ago4 messageshackers
Jump to latest
#1Chao Li
li.evan.chao@gmail.com

Hi,

While testing “file_fdw: Support multi-line HEADER option”, I noticed a small issue.

The doc says that the HEADER option cannot be used with the “binary" or “json" format:
```
<varlistentry id="sql-copy-params-header">
<term><literal>HEADER</literal></term>
<listitem>
<para>
On output, if this option is set to <literal>true</literal>
(or an equivalent Boolean value), the first line of the output will
contain the column names from the table.
Integer values <literal>0</literal> and <literal>1</literal> are
accepted as Boolean values, but other integers are not allowed for
<command>COPY TO</command> commands.
</para>
<para>
On input, if this option is set to <literal>true</literal>
(or an equivalent Boolean value), the first line of the input is
discarded. If set to a non-negative integer, that number of
lines are discarded. If set to <literal>MATCH</literal>, the first line
is discarded, and it must contain column names that exactly match the
table's columns, in both number and order; otherwise, an error is raised.
The <literal>MATCH</literal> value is only valid for
<command>COPY FROM</command> commands.
</para>
<para>
This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
```

However, when I specified "header ‘0", the command did not fail. That means the current behavior depends on the value of the “header" option, not on presence:
```
evantest=# create foreign table ft (i int) server fs options (format 'binary', filename '/tmp/ft.bin', header '1');
ERROR: cannot specify HEADER in BINARY mode
evantest=# create foreign table ft (i int) server fs options (format 'binary', filename '/tmp/ft.bin', header '0');
CREATE FOREIGN TABLE
```

As we can see, "header 1" fails, but header 0" is silently accepted. I don't think this behavior matches what the documentation describes.

For comparison, VACUUM has a similar option. “BUFFER_USAGE_LIMIT" is not allowed with "VACUUM FULL", and a value of 0 means disabling the buffer access strategy:
```
<varlistentry>
<term><literal>BUFFER_USAGE_LIMIT</literal></term>
<listitem>
<para>
Specifies the
<glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>
ring buffer size for <command>VACUUM</command>. This size is used to
calculate the number of shared buffers which will be reused as part of
this strategy. <literal>0</literal> disables use of a
<literal>Buffer Access Strategy</literal>. If <option>ANALYZE</option>
is also specified, the <option>BUFFER_USAGE_LIMIT</option> value is used
for both the vacuum and analyze stages. This option can't be used with
the <option>FULL</option> option except if <option>ANALYZE</option> is
also specified. When this option is not specified,
<command>VACUUM</command> uses the value from
<xref linkend="guc-vacuum-buffer-usage-limit"/>. Higher settings can
allow <command>VACUUM</command> to run more quickly, but having too
large a setting may cause too many other useful pages to be evicted from
shared buffers. The minimum value is <literal>128 kB</literal> and the
maximum value is <literal>16 GB</literal>.
</para>
</listitem>
</varlistentry>
```

Using BUFFER_USAGE_LIMIT 0 with FULL is still rejected:
```
evantest=# vacuum (full, BUFFER_USAGE_LIMIT 0) t;
ERROR: BUFFER_USAGE_LIMIT cannot be specified for VACUUM FULL
```

So VACUUM rejects BUFFER_USAGE_LIMIT based on the presence of the option, not its value. I think we should keep the behavior consistent here, and VACUUM's behavior better matches the documentation. Otherwise, I am afraid this could encourage more inconsistencies in the future.

The fix is straightforward. Since we already have the “header_specified" variable to indicate whether the option is present, we can check “header_specified" instead.

I reported a similar issue for the COPY command earlier in thread [1]/messages/by-id/C1D2509E-E5D1-46B0-932C-B57AA7B963A1@gmail.com. If this patch is accepted, then that one may be worth considering as well.

[1]: /messages/by-id/C1D2509E-E5D1-46B0-932C-B57AA7B963A1@gmail.com

--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v1-0001-Reject-HEADER-with-binary-and-json-COPY-formats-b.patchapplication/octet-stream; name=v1-0001-Reject-HEADER-with-binary-and-json-COPY-formats-b.patch; x-unix-mode=0644Download+4-2
#2Tingchuan Sun
suntingchuan1996@163.com
In reply to: Chao Li (#1)
Re:Reject HEADER with binary and json COPY formats by option presence

At 2026-06-03 11:11:29, "Chao Li" <li.evan.chao@gmail.com> wrote:

Hi,

While testing “file_fdw: Support multi-line HEADER option”, I noticed a small issue.

The doc says that the HEADER option cannot be used with the “binary" or “json" format:
```
<varlistentry id="sql-copy-params-header">
<term><literal>HEADER</literal></term>
<listitem>
<para>
On output, if this option is set to <literal>true</literal>
(or an equivalent Boolean value), the first line of the output will
contain the column names from the table.
Integer values <literal>0</literal> and <literal>1</literal> are
accepted as Boolean values, but other integers are not allowed for
<command>COPY TO</command> commands.
</para>
<para>
On input, if this option is set to <literal>true</literal>
(or an equivalent Boolean value), the first line of the input is
discarded. If set to a non-negative integer, that number of
lines are discarded. If set to <literal>MATCH</literal>, the first line
is discarded, and it must contain column names that exactly match the
table's columns, in both number and order; otherwise, an error is raised.
The <literal>MATCH</literal> value is only valid for
<command>COPY FROM</command> commands.
</para>
<para>
This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
```

However, when I specified "header ‘0", the command did not fail. That means the current behavior depends on the value of the “header" option, not on presence:
```
evantest=# create foreign table ft (i int) server fs options (format 'binary', filename '/tmp/ft.bin', header '1');
ERROR: cannot specify HEADER in BINARY mode
evantest=# create foreign table ft (i int) server fs options (format 'binary', filename '/tmp/ft.bin', header '0');
CREATE FOREIGN TABLE
```

As we can see, "header 1" fails, but header 0" is silently accepted. I don't think this behavior matches what the documentation describes.

For comparison, VACUUM has a similar option. “BUFFER_USAGE_LIMIT" is not allowed with "VACUUM FULL", and a value of 0 means disabling the buffer access strategy:
```
<varlistentry>
<term><literal>BUFFER_USAGE_LIMIT</literal></term>
<listitem>
<para>
Specifies the
<glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>
ring buffer size for <command>VACUUM</command>. This size is used to
calculate the number of shared buffers which will be reused as part of
this strategy. <literal>0</literal> disables use of a
<literal>Buffer Access Strategy</literal>. If <option>ANALYZE</option>
is also specified, the <option>BUFFER_USAGE_LIMIT</option> value is used
for both the vacuum and analyze stages. This option can't be used with
the <option>FULL</option> option except if <option>ANALYZE</option> is
also specified. When this option is not specified,
<command>VACUUM</command> uses the value from
<xref linkend="guc-vacuum-buffer-usage-limit"/>. Higher settings can
allow <command>VACUUM</command> to run more quickly, but having too
large a setting may cause too many other useful pages to be evicted from
shared buffers. The minimum value is <literal>128 kB</literal> and the
maximum value is <literal>16 GB</literal>.
</para>
</listitem>
</varlistentry>
```

Using BUFFER_USAGE_LIMIT 0 with FULL is still rejected:
```
evantest=# vacuum (full, BUFFER_USAGE_LIMIT 0) t;
ERROR: BUFFER_USAGE_LIMIT cannot be specified for VACUUM FULL
```

So VACUUM rejects BUFFER_USAGE_LIMIT based on the presence of the option, not its value. I think we should keep the behavior consistent here, and VACUUM's behavior better matches the documentation. Otherwise, I am afraid this could encourage more inconsistencies in the future.

The fix is straightforward. Since we already have the “header_specified" variable to indicate whether the option is present, we can check “header_specified" instead.

I reported a similar issue for the COPY command earlier in thread [1]. If this patch is accepted, then that one may be worth considering as well.

[1] /messages/by-id/C1D2509E-E5D1-46B0-932C-B57AA7B963A1@gmail.com

--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

+1 for using presence over value.

The patch looks good to me.
Regards,
Tingchuan Sun

#3Shinya Kato
shinya11.kato@gmail.com
In reply to: Chao Li (#1)
Re: Reject HEADER with binary and json COPY formats by option presence

On Sun, May 31, 2026 at 10:57 AM Chao Li <li.evan.chao@gmail.com> wrote:

While testing “file_fdw: Support multi-line HEADER option”, I noticed a small issue.

Thanks for reporting this!

The fix is straightforward. Since we already have the “header_specified" variable to indicate whether the option is present, we can check “header_specified" instead.

Overall, the patch LGTM. Could you add COPY test cases like the following:

copy t to '/tmp/test.json' (format json, header 0);
copy t to '/tmp/test.bin' (format binary, header 0);

--
Best regards,
Shinya Kato
NTT OSS Center

#4Fujii Masao
masao.fujii@gmail.com
In reply to: Shinya Kato (#3)
Re: Reject HEADER with binary and json COPY formats by option presence

On Wed, Jun 3, 2026 at 2:24 PM Shinya Kato <shinya11.kato@gmail.com> wrote:

On Sun, May 31, 2026 at 10:57 AM Chao Li <li.evan.chao@gmail.com> wrote:

While testing “file_fdw: Support multi-line HEADER option”, I noticed a small issue.

Thanks for reporting this!

I'm not yet convinced that this needs a code change. We already seem to have
both styles today: some options are rejected based on their presence,
while others are effectively judged based on whether the specified value
actually enables the feature. I don't think we have a clear rule that
everything should be presence-based.

Regards,

--
Fujii Masao