[BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

Started by Joel Jacobsonover 1 year ago12 messageshackers
Jump to latest
#1Joel Jacobson
joel@compiler.org

Hi hackers,

Here is a patch that fixes a minor problem in copy.c's ProcessCopyOptions,
as well as fixing thinko in its tests, as well as adding new tests for
the bugfix.

[PATCH 1/2] Fix thinko in tests for COPY options force_not_null
and force_null.

Use COPY FROM for the negative tests that check that FORMAT text
cannot be used for these options, since if testing COPY TO,
which is invalid for these two options, we're testing two
invalid options at the same time, which doesn't seem intentional,
since the other tests seems to be testing invalid options one by one.

In passing, consistently use "stdin" for COPY FROM and "stdout" for COPY TO,
even though it has no effect on the tests per se, it seems
better to be consistent, to avoid confusion.

[PATCH 2/2] Fix validation of FORCE_NOT_NULL/FORCE_NULL for
all-columns case.

Add missing checks for FORCE_NOT_NULL and FORCE_NULL when applied to
all columns via "*". These options now correctly require CSV mode and
are disallowed in COPY TO as appropriate. Adjusted regression
tests to verify correct behavior for the all-columns case.

/Joel

Attachments:

0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patchapplication/octet-stream; name="=?UTF-8?Q?0001-Fix-thinko-in-tests-for-COPY-options-force=5Fnot=5Fnull-.?= =?UTF-8?Q?patch?="Download+18-19
0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patchapplication/octet-stream; name="=?UTF-8?Q?0002-Fix-validation-of-FORCE=5FNOT=5FNULL-FORCE=5FNULL-for-all?= =?UTF-8?Q?-.patch?="Download+25-5
#2Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#1)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

On Sat, Oct 12, 2024, at 01:48, Joel Jacobson wrote:

Hi hackers,

Here is a patch that fixes a minor problem in copy.c's ProcessCopyOptions,
as well as fixing thinko in its tests, as well as adding new tests for
the bugfix.

Rebase only.

/Joel

Attachments:

v2-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patchapplication/octet-stream; name="=?UTF-8?Q?v2-0001-Fix-thinko-in-tests-for-COPY-options-force=5Fnot=5Fnul?= =?UTF-8?Q?l-.patch?="Download+18-19
v2-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patchapplication/octet-stream; name="=?UTF-8?Q?v2-0002-Fix-validation-of-FORCE=5FNOT=5FNULL-FORCE=5FNULL-for-?= =?UTF-8?Q?all-.patch?="Download+25-5
#3Zhang Mingli
zmlpostgres@gmail.com
In reply to: Joel Jacobson (#1)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

Hi,

Zhang Mingli
www.hashdata.xyz
On Oct 12, 2024 at 07:48 +0800, Joel Jacobson <joel@compiler.org>, wrote:

Add missing checks for FORCE_NOT_NULL and FORCE_NULL when applied to
all columns via "*". These options now correctly require CSV mode and
are disallowed in COPY TO as appropriate.

Right.

we introduce FORCE_NOT_NULL/FORCE_NULL for all columns at commit f6d4c9c, but forget to check csv mode as force_notnull list does.
It could be possible user selects some columns,  then opts_out->force_notnull != NIL in this case.
Or user selects all columns, then  opts_out->force_notnull_all is true.
Both should be checked with csv mode.

Patch[2/2] make sense to me, thanks!

#4Michael Paquier
michael@paquier.xyz
In reply to: Joel Jacobson (#1)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

On Sat, Oct 12, 2024 at 01:48:15AM +0200, Joel Jacobson wrote:

In passing, consistently use "stdin" for COPY FROM and "stdout" for COPY TO,
even though it has no effect on the tests per se, it seems
better to be consistent, to avoid confusion.

(Looking at v2.)

-COPY x to stdin (format BINARY, delimiter ',');
+COPY x to stdout (format BINARY, delimiter ',')

Right. It does not matter because the option combinations are checked
for TO and FROM in the same ProcessCopyOptions().

[PATCH 2/2] Fix validation of FORCE_NOT_NULL/FORCE_NULL for
all-columns case.

Add missing checks for FORCE_NOT_NULL and FORCE_NULL when applied to
all columns via "*". These options now correctly require CSV mode and
are disallowed in COPY TO as appropriate. Adjusted regression
tests to verify correct behavior for the all-columns case.

You are right. f6d4c9cf162b got that wrong. Will fix and backpatch
with the extra tests.
--
Michael

#5Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#4)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

On Wed, Oct 16, 2024 at 02:50:53PM +0900, Michael Paquier wrote:

You are right. f6d4c9cf162b got that wrong. Will fix and backpatch
with the extra tests.

And done down to 17 for 0002, down to 16 for 0001, with tweaks in 0001
to limit the use of COPY TO in the queries where we want to force
error patterns linked to the TO clause.

The two tests for the all-column cases with force_quote were not
strictly required for the bug, still are useful to have for coverage
purposes.
--
Michael

#6Joel Jacobson
joel@compiler.org
In reply to: Michael Paquier (#5)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

On Thu, Oct 17, 2024, at 01:50, Michael Paquier wrote:

On Wed, Oct 16, 2024 at 02:50:53PM +0900, Michael Paquier wrote:

You are right. f6d4c9cf162b got that wrong. Will fix and backpatch
with the extra tests.

And done down to 17 for 0002, down to 16 for 0001, with tweaks in 0001
to limit the use of COPY TO in the queries where we want to force
error patterns linked to the TO clause.

The two tests for the all-column cases with force_quote were not
strictly required for the bug, still are useful to have for coverage
purposes.

Thanks for fixing.
I noticed a small mistake in the changes made in commit 03bf0d9:

In my v2-0001 patch, the correction was:

    -COPY x to stdin (format TEXT, force_quote(a));
    +COPY x to stdout (format TEXT, force_quote(a));

However, in commit 03bf0d9, the corresponding change was:

    -COPY x to stdin (format TEXT, force_quote(a));
    +COPY x from stdin (format TEXT, force_quote(a));

I believe the correction should be to use COPY TO stdout instead of
COPY FROM stdin, since FORCE_QUOTE is only applicable with COPY TO.
This way, the test correctly verifies the disallowed combination of FORCE_QUOTE
with a non-CSV format in COPY TO.

I see how this is easy to miss, since the tests for the three FORCE_* options
look very similar, but they are different in that it's only FORCE_QUOTE that
is allowed for COPY TO, whereas the two other FORCE_NOT_NULL
and FORCE_NULL are only allowed for COPY FROM.

/Joel

Attachments:

0001-Correct-negative-tests-for-the-COPY-option-FORCE_QUO.patchapplication/octet-stream; name="=?UTF-8?Q?0001-Correct-negative-tests-for-the-COPY-option-FORCE=5FQUO.pa?= =?UTF-8?Q?tch?="Download+4-5
#7Michael Paquier
michael@paquier.xyz
In reply to: Joel Jacobson (#6)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

On Thu, Oct 17, 2024 at 10:21:54AM +0200, Joel Jacobson wrote:

I believe the correction should be to use COPY TO stdout instead of
COPY FROM stdin, since FORCE_QUOTE is only applicable with COPY TO.
This way, the test correctly verifies the disallowed combination of FORCE_QUOTE
with a non-CSV format in COPY TO.

I see how this is easy to miss, since the tests for the three FORCE_* options
look very similar, but they are different in that it's only FORCE_QUOTE that
is allowed for COPY TO, whereas the two other FORCE_NOT_NULL
and FORCE_NULL are only allowed for COPY FROM.

This one is intentional. I was looking at the full picture of these
queries yesterday, and decided to maximize the number of COPY FROM for
all the queries that do not check option interactions that rely on TO
or FROM to exist. Hence, the tests are now shaped so as COPY TO is
only used where we expect it to be set, while the sequence is lossy
only with COPY FROM.
--
Michael

#8Joel Jacobson
joel@compiler.org
In reply to: Michael Paquier (#7)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

On Fri, Oct 18, 2024, at 00:52, Michael Paquier wrote:

On Thu, Oct 17, 2024 at 10:21:54AM +0200, Joel Jacobson wrote:

I believe the correction should be to use COPY TO stdout instead of
COPY FROM stdin, since FORCE_QUOTE is only applicable with COPY TO.
This way, the test correctly verifies the disallowed combination of FORCE_QUOTE
with a non-CSV format in COPY TO.

I see how this is easy to miss, since the tests for the three FORCE_* options
look very similar, but they are different in that it's only FORCE_QUOTE that
is allowed for COPY TO, whereas the two other FORCE_NOT_NULL
and FORCE_NULL are only allowed for COPY FROM.

This one is intentional. I was looking at the full picture of these
queries yesterday, and decided to maximize the number of COPY FROM for
all the queries that do not check option interactions that rely on TO
or FROM to exist. Hence, the tests are now shaped so as COPY TO is
only used where we expect it to be set, while the sequence is lossy
only with COPY FROM.

Thank you for your response and for refining the tests.

I believe we should use COPY TO specifically when testing that the FORCE_QUOTE
option requires CSV mode. Using COPY TO isolates this validation without relying
on implementation details like the current order of checks in ProcessCopyOptions,
which seems unnecessary.

While it's beneficial to maximize the use of COPY FROM in tests where the
direction doesn't matter, in this case, COPY TO or COPY FROM does matter because
FORCE_QUOTE is only valid with COPY TO. This differs from options like QUOTE,
ESCAPE, ENCODING, and NULL, which are valid for both directions.

I think the effort to standardize overshot in this instance, as FORCE_QUOTE
is the _only_ option that is only valid for COPY TO.

Therefore, in the incorrect options tests, we should use COPY TO when testing
FORCE_QUOTE, except for the test that checks using FORCE_QUOTE with COPY FROM
is an error.

As the tests are currently written, the expected error messages are produced
only by coincidence, relying on the current order of checks:

/* Check force_quote */
if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
/* ERROR: COPY FORCE_QUOTE requires CSV mode */
if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
/* ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM */

First, it checks for CSV mode,
then it checks if FORCE_QUOTE is used with COPY FROM.

Designing a test that combines two incorrect options, where only one error is
reported, seems unnecessary. If both errors were reported, we might eliminate
the need for separate tests, but that's not the case here.

I believe test specificity and clarity are important. Using COPY FROM in this
context is confusing and makes the tests harder to maintain, as testing two
things at once, deviates from the pattern established by other tests in
this section.

To improve test specificity and maintainability,
I suggest changing the tests as follows:

     -- incorrect options
     COPY x from stdin (format BINARY, delimiter ',');
     COPY x from stdin (format BINARY, null 'x');
     COPY x from stdin (format BINARY, on_error ignore);
     COPY x from stdin (on_error unsupported);
     -COPY x from stdin (format TEXT, force_quote(a));
     -COPY x from stdin (format TEXT, force_quote *);
     +COPY x to stdout (format TEXT, force_quote(a));
     +COPY x to stdout (format TEXT, force_quote *);
     COPY x from stdin (format CSV, force_quote(a));
     COPY x from stdin (format CSV, force_quote *);
     COPY x from stdin (format TEXT, force_not_null(a));
     COPY x from stdin (format TEXT, force_not_null *);
     COPY x to stdout (format CSV, force_not_null(a));
     COPY x to stdout (format CSV, force_not_null *);
     COPY x from stdin (format TEXT, force_null(a));
     COPY x from stdin (format TEXT, force_null *);
     COPY x to stdout (format CSV, force_null(a));
     COPY x to stdout (format CSV, force_null *);
     COPY x to stdout (format BINARY, on_error unsupported);
     COPY x from stdin (log_verbosity unsupported);
     COPY x from stdin with (reject_limit 1);
     COPY x from stdin with (on_error ignore, reject_limit 0);

Please let me know your thoughts on this.

/Joel

#9Michael Paquier
michael@paquier.xyz
In reply to: Joel Jacobson (#8)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

On Fri, Oct 18, 2024 at 08:27:44AM +0200, Joel Jacobson wrote:

I believe test specificity and clarity are important. Using COPY FROM in this
context is confusing and makes the tests harder to maintain, as testing two
things at once, deviates from the pattern established by other tests in
this section.

To improve test specificity and maintainability,

If this area of the code is refactored so as a different error is
triggered for these two specific queries, we'd still be alerted that
something is wrong the same way for HEAD or what you are suggesting.
I can see your argument, but it does not really matter because the
errors triggered by these option combinations don't link specifically
to COPY TO or COPY FROM. At the end, I'm OK with leaving things the
way they are on HEAD, and let that be.
--
Michael

#10Joel Jacobson
joel@compiler.org
In reply to: Michael Paquier (#9)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

On Sat, Oct 19, 2024, at 03:32, Michael Paquier wrote:

If this area of the code is refactored so as a different error is
triggered for these two specific queries, we'd still be alerted that
something is wrong the same way for HEAD or what you are suggesting.
I can see your argument, but it does not really matter because the
errors triggered by these option combinations don't link specifically
to COPY TO or COPY FROM. At the end, I'm OK with leaving things the
way they are on HEAD, and let that be.

OK, I understand your point of view, and agree there is no problem from a
safety perspective, as the tests would still alert us, just with a suboptimal
error message.

I can see how such a small change might not be worth doing in a single commit.

However, since my last email, I've found some other problems in this area,
and think we should do a more ambitious improvement, by rearranging the
incorrect options tests into three categories:

1. incorrect COPY {FROM|TO} options
2. incorrect COPY FROM options
3. incorrect COPY TO options

Also, I've found two new problems:

1. Incorrect options tests are missing for the QUOTE and ESCPAE options.
This was discovered by Jian He in a different thread.

2. One of the ON_ERROR incorrect options tests also depend on the order
of checks in ProcessCopyOptions.

To explain my current focus on the COPY tests, it's because we're currently
working on the new raw format for COPY, and it would be nice to cleanup these
tests as a preparatory step first.

New patch attached.

/Joel

Attachments:

0001-Improve-incorrect-COPY-options-tests.patchapplication/octet-stream; name="=?UTF-8?Q?0001-Improve-incorrect-COPY-options-tests.patch?="Download+48-32
#11Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#10)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

On Sat, Oct 19, 2024, at 09:52, Joel Jacobson wrote:

However, since my last email, I've found some other problems in this area,
and think we should do a more ambitious improvement, by rearranging the
incorrect options tests into three categories:

1. incorrect COPY {FROM|TO} options
2. incorrect COPY FROM options
3. incorrect COPY TO options

Also, I've found two new problems:

1. Incorrect options tests are missing for the QUOTE and ESCPAE options.
This was discovered by Jian He in a different thread.

2. One of the ON_ERROR incorrect options tests also depend on the order
of checks in ProcessCopyOptions.

To explain my current focus on the COPY tests, it's because we're currently
working on the new raw format for COPY, and it would be nice to cleanup these
tests as a preparatory step first.

I realize these suggested changes are out of scope for $subject, that was about a bug fix.
Therefore closing this patch and marking it as Committed.

/Joel

#12Michael Paquier
michael@paquier.xyz
In reply to: Joel Jacobson (#11)
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

On Thu, Oct 24, 2024 at 09:41:03AM +0300, Joel Jacobson wrote:

Therefore closing this patch and marking it as Committed.

Thanks. I have managed to miss the CF entry.
--
Michael