Small fix on COPY ON_ERROR document

Started by Yugo NAGATAalmost 2 years ago20 messages

nagata@sraoss.co.jp

almost 2 years ago

1 attachment(s)

Hi,

I found that the documentation of COPY ON_ERROR said
COPY stops operation at the first error when
"ON_ERROR is not specified.", but it also stop when
ON_ERROR is specified to the default value, "stop".

I attached a very small patch to fix this just for
making the description more accurate.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

Attachments:

small_fix_on_copy_doc.patchtext/x-diff; name=small_fix_on_copy_doc.patchDownload

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..14c8ceab06 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -577,8 +577,8 @@ COPY <replaceable class="parameter">count</replaceable>
 
    <para>
     <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
+    <literal>ON_ERROR</literal> is not specified or <literal>stop</literal>.
+    This should not lead to problems in the event of a <command>COPY
     TO</command>, but the target table will already have received
     earlier rows in a <command>COPY FROM</command>. These rows will not
     be visible or accessible, but they still occupy disk space. This might

Masahiko Sawada

sawada.mshk@gmail.com

almost 2 years ago

In reply to: Yugo NAGATA (#1)

Re: Small fix on COPY ON_ERROR document

On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hi,

I found that the documentation of COPY ON_ERROR said
COPY stops operation at the first error when
"ON_ERROR is not specified.", but it also stop when
ON_ERROR is specified to the default value, "stop".

I attached a very small patch to fix this just for
making the description more accurate.

Thank you for the patch!

+1 to fix it.

-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
+    <literal>ON_ERROR</literal> is not specified or <literal>stop</literal>.
+    This should not lead to problems in the event of a <command>COPY

How about the followings for consistency with the description of the
ON_ERROR option?

COPY stops operation at the first error if the stop value is specified
to the ON_ERROR option. This should not lead to ...

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Yugo NAGATA

nagata@sraoss.co.jp

almost 2 years ago

In reply to: Masahiko Sawada (#2)

1 attachment(s)

Re: Small fix on COPY ON_ERROR document

On Fri, 26 Jan 2024 13:59:09 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hi,

I found that the documentation of COPY ON_ERROR said
COPY stops operation at the first error when
"ON_ERROR is not specified.", but it also stop when
ON_ERROR is specified to the default value, "stop".

I attached a very small patch to fix this just for
making the description more accurate.

Thank you for the patch!

+1 to fix it.
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
+    <literal>ON_ERROR</literal> is not specified or <literal>stop</literal>.
+    This should not lead to problems in the event of a <command>COPY
How about the followings for consistency with the description of the
ON_ERROR option?

COPY stops operation at the first error if the stop value is specified
to the ON_ERROR option. This should not lead to ...

Thank you for you review. However, after posting the previous patch,
I noticed that I overlooked ON_ERROR works only for errors due to data
type incompatibility (that is mentioned as "malformed data" in the
ON_ERROR description, though).

So, how about rewriting this like:

COPY stops operation at the first error unless the error is due to data
type incompatibility and a value other than stop is specified to the
ON_ERROR option.

I attached the updated patch.

Regards,
Yugo Nagata

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

--
Yugo NAGATA <nagata@sraoss.co.jp>

Attachments:

v2_small_fix_on_copy_doc.patchtext/x-diff; name=v2_small_fix_on_copy_doc.patchDownload

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..3676bf0e87 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -576,9 +576,10 @@ COPY <replaceable class="parameter">count</replaceable>
    </para>
 
    <para>
-    <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
+    <command>COPY</command> stops operation at the first error unless the error
+    is due to data type incompatibility and a value other than
+    <literal>stop</literal> is specified to the <literal>ON_ERROR</literal> option.
+    This should not lead to problems in the event of a <command>COPY
     TO</command>, but the target table will already have received
     earlier rows in a <command>COPY FROM</command>. These rows will not
     be visible or accessible, but they still occupy disk space. This might

Masahiko Sawada

sawada.mshk@gmail.com

almost 2 years ago

In reply to: Yugo NAGATA (#3)

Re: Small fix on COPY ON_ERROR document

On Fri, Jan 26, 2024 at 2:40 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Fri, 26 Jan 2024 13:59:09 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hi,

I found that the documentation of COPY ON_ERROR said
COPY stops operation at the first error when
"ON_ERROR is not specified.", but it also stop when
ON_ERROR is specified to the default value, "stop".

I attached a very small patch to fix this just for
making the description more accurate.

Thank you for the patch!

+1 to fix it.
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
+    <literal>ON_ERROR</literal> is not specified or <literal>stop</literal>.
+    This should not lead to problems in the event of a <command>COPY
How about the followings for consistency with the description of the
ON_ERROR option?

COPY stops operation at the first error if the stop value is specified
to the ON_ERROR option. This should not lead to ...
Thank you for you review. However, after posting the previous patch,
I noticed that I overlooked ON_ERROR works only for errors due to data
type incompatibility (that is mentioned as "malformed data" in the
ON_ERROR description, though).

Right.

So, how about rewriting this like:

COPY stops operation at the first error unless the error is due to data
type incompatibility and a value other than stop is specified to the
ON_ERROR option.

Hmm, this sentence seems not very readable to me, especially the "COPY
stops ... unless ... a value other than stop is specified ..." part. I
think we can simplify it:

COPY stops operation at the first data type incompatibility error if
the stop value is specified to the ON_ERROR option.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

David G. Johnston

david.g.johnston@gmail.com

almost 2 years ago

In reply to: Yugo NAGATA (#3)

Re: Small fix on COPY ON_ERROR document

On Thu, Jan 25, 2024 at 10:40 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Fri, 26 Jan 2024 13:59:09 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp>

wrote:

Hi,

I found that the documentation of COPY ON_ERROR said
COPY stops operation at the first error when
"ON_ERROR is not specified.", but it also stop when
ON_ERROR is specified to the default value, "stop".

I'm finding the current wording surrounding ON_ERROR to not fit in with the
rest of the page. I've tried to improve things by being a bit more
invasive.

This first hunk updates the description of the COPY command to include
describing the purpose of the ON_ERROR clause.

I too am concerned about the word "parsed" here, and "malformed" in the
more detailed descriptions; this would need to be modified to better
reflect the circumstances under which ignore happens.

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..5fe4c9f747 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,14 @@ COPY { <replaceable
class="parameter">table_name</replaceable> [ ( <replaceable
     in the <structname>pg_stat_progress_copy</structname> view. See
     <xref linkend="copy-progress-reporting"/> for details.
   </para>
+
+  <para>
+    By default, <command>COPY</command> will abort if it encounters errors.
+    For non-binary format <command>COPY FROM</command> the user can specify
+    to instead ignore input rows that cannot be parsed by specifying
+    the <literal>ignore</literal> option to the <literal>ON_ERROR</literal>
+    clause.
+  </para>
  </refsect1>

The use of the word "Currently" stood out to me when reading the next
hunk. We always document current reality and don't call out that fact. We
do not imply some future feature may change how this all works. On a
related note, right now we have stop and ignore, which are basically enums
just like we have for format (csv, text, binary). Unlike format, though,
this option requires single quotes. Why is that? I did keep with not
specifying the enum in the main syntax block since format doesn't as well;
though writing { stop | ignore } seemed quite appealing.

The rest of the page indicates what the default is in a sentence by itself,
not as a parenthetical next to the option.

The command limitations seemed worthy of a separate paragraph, though it
repeats the description which isn't great. I'm going to sleep on this one.

@@ -379,17 +387,16 @@ COPY { <replaceable
class="parameter">table_name</replaceable> [ ( <replaceable
     <listitem>
      <para>
       Specifies which <replaceable class="parameter">
-      error_action</replaceable> to perform when there is malformed data
in the input.
-      Currently, only <literal>stop</literal> (default) and
<literal>ignore</literal>
-      values are supported.
-      If the <literal>stop</literal> value is specified,
-      <command>COPY</command> stops operation at the first error.
-      If the <literal>ignore</literal> value is specified,
-      <command>COPY</command> skips malformed data and continues copying
data.
-      The option is allowed only in <command>COPY FROM</command>.
-      Only <literal>stop</literal> value is allowed when
-      using <literal>binary</literal> format.
+      error_action</replaceable> to perform when encountering a malformed
input row:
+      <literal>stop</literal> or <literal>ignore</literal>.
+      A value of <literal>stop</literal> means fail the command, while
+      <literal>ignore</literal> means discard the input row and continue
with the next one.
+      The default is <literal>stop</literal>
      </para>
+     <para>
+      The ignore option is applicable only for <command>COPY FROM</command>
+      when the <literal>FORMAT</literal> is <literal>text</literal>
+      or <literal>csv</literal>.
+    </para>
     </listitem>
    </varlistentry>

The following was, IMO, too much text about something not to worry about,
COPY TO and ignore mode. The point is dead tuples on error and running
vacuum. It doesn't really even matter what caused the error, though if the
error was even before rows started to be imported then obviously there
would be no dead tuples.

Oh, and the word "first" really isn't adding anything here that we cannot
reasonably leave to common sense, IMO. We either ignore all errors or stop
on the first one. There isn't a stop mode that is capable of bypassing
errors and the ignore mode doesn't have a "first n" option so one assumes
all errors are ignored.

@@ -576,15 +583,12 @@ COPY <replaceable
class="parameter">count</replaceable>
</para>

    <para>
-    <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
-    TO</command>, but the target table will already have received
-    earlier rows in a <command>COPY FROM</command>. These rows will not
-    be visible or accessible, but they still occupy disk space. This might
-    amount to a considerable amount of wasted disk space if the failure
-    happened well into a large copy operation. You might wish to invoke
-    <command>VACUUM</command> to recover the wasted space.
+    A failed <command>COPY FROM</command> command will have performed
+    physical insertion of all rows prior to the malformed one.
+    While these rows will not be visible or accessible, they still occupy
+    disk space. This might amount to a considerable amount of wasted disk
+    space if the failure happened well into a large copy operation.
+    <command>VACUUM</command> should be used to recover the wasted space.
    </para>

<para>

David J.

Yugo NAGATA

nagata@sraoss.co.jp

almost 2 years ago

In reply to: Masahiko Sawada (#4)

Re: Small fix on COPY ON_ERROR document

On Fri, 26 Jan 2024 15:26:55 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jan 26, 2024 at 2:40 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Fri, 26 Jan 2024 13:59:09 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hi,

I found that the documentation of COPY ON_ERROR said
COPY stops operation at the first error when
"ON_ERROR is not specified.", but it also stop when
ON_ERROR is specified to the default value, "stop".

I attached a very small patch to fix this just for
making the description more accurate.

Thank you for the patch!

+1 to fix it.
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
+    <literal>ON_ERROR</literal> is not specified or <literal>stop</literal>.
+    This should not lead to problems in the event of a <command>COPY
How about the followings for consistency with the description of the
ON_ERROR option?

COPY stops operation at the first error if the stop value is specified
to the ON_ERROR option. This should not lead to ...
Thank you for you review. However, after posting the previous patch,
I noticed that I overlooked ON_ERROR works only for errors due to data
type incompatibility (that is mentioned as "malformed data" in the
ON_ERROR description, though).
Right.

So, how about rewriting this like:

COPY stops operation at the first error unless the error is due to data
type incompatibility and a value other than stop is specified to the
ON_ERROR option.

Hmm, this sentence seems not very readable to me, especially the "COPY
stops ... unless ... a value other than stop is specified ..." part. I
think we can simplify it:

I can agree with your opinion on the readability, but...

COPY stops operation at the first data type incompatibility error if
the stop value is specified to the ON_ERROR option.

This statement doesn't explain COPY also stops when an error other than
data type incompatibility (e.g. constrain violations) occurs.

Maybe, we can separate the sentese to two, for example:

COPY stops operation at the first error. (The exception is if the error
is due to data type incompatibility and a value other than stop is specified
to the ON_ERROR option.)

Regards,
Yugo Nagata

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

--
Yugo NAGATA <nagata@sraoss.co.jp>

David G. Johnston

david.g.johnston@gmail.com

almost 2 years ago

In reply to: Yugo NAGATA (#6)

Re: Small fix on COPY ON_ERROR document

On Thursday, January 25, 2024, Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Maybe, we can separate the sentese to two, for example:

COPY stops operation at the first error. (The exception is if the error
is due to data type incompatibility and a value other than stop is
specified
to the ON_ERROR option.)

I’d lean more toward saying:

Copy from can be instructed to ignore errors that arise from casting input
data to the data types of the target columns by setting the on_error option
to ignore. Skipping the entire input row in the process.

The last part is because another way to ignore would be to set null values
for those columns.

That a command stops on an error is the assumed behavior throughout the
system, writing “copy stops operation at the first error.” just seems
really unnecessary.

I will need to make this tweak and probably a couple others to my own
suggestions in 12 hours or so.

David J.

Yugo NAGATA

nagata@sraoss.co.jp

almost 2 years ago

In reply to: David G. Johnston (#5)

Re: Small fix on COPY ON_ERROR document

On Thu, 25 Jan 2024 23:33:22 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

On Thu, Jan 25, 2024 at 10:40 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Fri, 26 Jan 2024 13:59:09 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp>

wrote:

Hi,

I found that the documentation of COPY ON_ERROR said
COPY stops operation at the first error when
"ON_ERROR is not specified.", but it also stop when
ON_ERROR is specified to the default value, "stop".

I'm finding the current wording surrounding ON_ERROR to not fit in with the
rest of the page. I've tried to improve things by being a bit more
invasive.

Introducing the ON_ERROR option changed the behavior of COPY about error
handling to some extent, so it might be good to rewrite a bit more parts
of the documentation as you suggest.

This first hunk updates the description of the COPY command to include
describing the purpose of the ON_ERROR clause.

I too am concerned about the word "parsed" here, and "malformed" in the
more detailed descriptions; this would need to be modified to better
reflect the circumstances under which ignore happens.

I think "errors due to data type incompatibility" would be nice to describe
the errors to be ignored by ON_ERROR, as used in the NOTICE message output.

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..5fe4c9f747 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,14 @@ COPY { <replaceable
class="parameter">table_name</replaceable> [ ( <replaceable
in the <structname>pg_stat_progress_copy</structname> view. See
<xref linkend="copy-progress-reporting"/> for details.
</para>
+
+  <para>
+    By default, <command>COPY</command> will abort if it encounters errors.
+    For non-binary format <command>COPY FROM</command> the user can specify
+    to instead ignore input rows that cannot be parsed by specifying
+    the <literal>ignore</literal> option to the <literal>ON_ERROR</literal>
+    clause.
+  </para>
</refsect1>

The following was, IMO, too much text about something not to worry about,
COPY TO and ignore mode. The point is dead tuples on error and running
vacuum. It doesn't really even matter what caused the error, though if the
error was even before rows started to be imported then obviously there
would be no dead tuples.

Oh, and the word "first" really isn't adding anything here that we cannot
reasonably leave to common sense, IMO. We either ignore all errors or stop
on the first one. There isn't a stop mode that is capable of bypassing
errors and the ignore mode doesn't have a "first n" option so one assumes
all errors are ignored.

@@ -576,15 +583,12 @@ COPY <replaceable
class="parameter">count</replaceable>
</para>
<para>
-    <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
-    TO</command>, but the target table will already have received
-    earlier rows in a <command>COPY FROM</command>. These rows will not
-    be visible or accessible, but they still occupy disk space. This might
-    amount to a considerable amount of wasted disk space if the failure
-    happened well into a large copy operation. You might wish to invoke
-    <command>VACUUM</command> to recover the wasted space.
+    A failed <command>COPY FROM</command> command will have performed
+    physical insertion of all rows prior to the malformed one.

As you said that it does not matter what caused the error, I don't think
we have to use "malformed" here. Instead, we could say "we will have
performed physical insertion of rows before the failure."

+    While these rows will not be visible or accessible, they still occupy
+    disk space. This might amount to a considerable amount of wasted disk
+    space if the failure happened well into a large copy operation.
+    <command>VACUUM</command> should be used to recover the wasted space.
</para>

<para>

David J.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

Yugo NAGATA

nagata@sraoss.co.jp

almost 2 years ago

In reply to: David G. Johnston (#7)

Re: Small fix on COPY ON_ERROR document

On Fri, 26 Jan 2024 00:00:57 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

On Thursday, January 25, 2024, Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Maybe, we can separate the sentese to two, for example:

COPY stops operation at the first error. (The exception is if the error
is due to data type incompatibility and a value other than stop is
specified
to the ON_ERROR option.)

I’d lean more toward saying:

Copy from can be instructed to ignore errors that arise from casting input
data to the data types of the target columns by setting the on_error option
to ignore. Skipping the entire input row in the process.

The last part is because another way to ignore would be to set null values
for those columns.

That makes sense. Is is a bit ambiguous to just say "skips malformed data";
it might be unclear for users if the data in the problematic column is skipped
(NULL is set) or the entire row is skipped. Also, setting null values for those
columns could be a future feature of ON_ERROR option.

That a command stops on an error is the assumed behavior throughout the
system, writing “copy stops operation at the first error.” just seems
really unnecessary.

I think we need to describe this default behavior explicitly somewhere,
as you suggested in the other post [1]/messages/by-id/CAKFQuwZJZ6uaS2B7qpL2FJzWBsnDdzgtbsW3pH9zuT6vC3fH+g@mail.gmail.com.

[1]: /messages/by-id/CAKFQuwZJZ6uaS2B7qpL2FJzWBsnDdzgtbsW3pH9zuT6vC3fH+g@mail.gmail.com

Regards,
Yugo Nagata

I will need to make this tweak and probably a couple others to my own
suggestions in 12 hours or so.

David J.

--
Yugo NAGATA <nagata@sraoss.co.jp>

#10

David G. Johnston

david.g.johnston@gmail.com

almost 2 years ago

In reply to: Yugo NAGATA (#9)

1 attachment(s)

Re: Small fix on COPY ON_ERROR document

On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Fri, 26 Jan 2024 00:00:57 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

I will need to make this tweak and probably a couple others to my own
suggestions in 12 hours or so.

And here is my v2.

Notably I choose to introduce the verbiage "soft error" and then define in
the ON_ERROR clause the specific soft error that matters here - "invalid
input syntax".

I also note the log message behavior when ignore mode is chosen. I haven't
confirmed that it is accurate but that is readily tweaked if approved of.

David J.

Attachments:

0001-v2-Improve-ON_ERROR-verbiage-in-COPY-documentation.patchtext/x-patch; charset=US-ASCII; name=0001-v2-Improve-ON_ERROR-verbiage-in-COPY-documentation.patchDownload

From 2d656fe0a69ea1349472d88794c18f8a2e2d37e9 Mon Sep 17 00:00:00 2001
From: "David G. Johnston" <david.g.johnston@gmail.com>
Date: Thu, 25 Jan 2024 23:35:44 -0700
Subject: [PATCH] v2 Improve ON_ERROR verbiage in COPY documentation

---
 doc/src/sgml/ref/copy.sgml | 48 +++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..38ce0b0a4c 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     in the <structname>pg_stat_progress_copy</structname> view. See
     <xref linkend="copy-progress-reporting"/> for details.
   </para>
+
+  <para>
+    By default, <command>COPY</command> will fail if it encounters an error
+    during processing. For use cases where a best-effort attempt at loading
+    the entire file is desired, the <literal>ON_ERROR</literal> clause can
+    be used to specify some other behavior.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -378,17 +385,23 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     <term><literal>ON_ERROR</literal></term>
     <listitem>
      <para>
-      Specifies which <replaceable class="parameter">
-      error_action</replaceable> to perform when there is malformed data in the input.
-      Currently, only <literal>stop</literal> (default) and <literal>ignore</literal>
-      values are supported.
-      If the <literal>stop</literal> value is specified,
-      <command>COPY</command> stops operation at the first error.
-      If the <literal>ignore</literal> value is specified,
-      <command>COPY</command> skips malformed data and continues copying data.
-      The option is allowed only in <command>COPY FROM</command>.
-      Only <literal>stop</literal> value is allowed when
-      using <literal>binary</literal> format.
+      Specifies which how to behave when encountering a soft error.
+      An <replaceable class="parameter">error_action</replaceable> value of
+      <literal>stop</literal> means fail the command, while
+      <literal>ignore</literal> means discard the input row and continue with the next one.
+      The default is <literal>stop</literal>
+     </para>
+     <para>
+      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+     </para>
+     <para>
+      The only relevant soft error is "invalid input syntax", which manifests when attempting
+      to create a column value from the text input.
+     </para>
+     <para>
+      An <literal>INFO</literal> level context message containing the ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
      </para>
     </listitem>
    </varlistentry>
@@ -576,15 +589,12 @@ COPY <replaceable class="parameter">count</replaceable>
    </para>
 
    <para>
-    <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
-    TO</command>, but the target table will already have received
-    earlier rows in a <command>COPY FROM</command>. These rows will not
-    be visible or accessible, but they still occupy disk space. This might
+    The <command>COPY FROM</command> command physically inserts input rows
+    into the table as it progresses.  If the command fails these rows are
+    left in a deleted state, still occupying disk space. This might
     amount to a considerable amount of wasted disk space if the failure
-    happened well into a large copy operation. You might wish to invoke
-    <command>VACUUM</command> to recover the wasted space.
+    happened well into a large copy operation. <command>VACUUM</command>
+    should be used to recover the wasted space.
    </para>
 
    <para>
-- 
2.34.1

#11

Yugo NAGATA

nagata@sraoss.co.jp

almost 2 years ago

In reply to: David G. Johnston (#10)

Re: Small fix on COPY ON_ERROR document

On Fri, 26 Jan 2024 08:04:45 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Fri, 26 Jan 2024 00:00:57 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

I will need to make this tweak and probably a couple others to my own
suggestions in 12 hours or so.

And here is my v2.

Notably I choose to introduce the verbiage "soft error" and then define in
the ON_ERROR clause the specific soft error that matters here - "invalid
input syntax".

I am not sure we should use "soft error" without any explanation
because it seems to me that the meaning of words is unclear for users.

This is explained in src/backend/utils/fmgr/README;

* Some callers of datatype input functions (and in future perhaps
other classes of functions) pass an instance of ErrorSaveContext.
This indicates that the caller wishes to handle "soft" errors without
a transaction-terminating exception being thrown: instead, the callee
should store information about the error cause in the ErrorSaveContext
struct and return a dummy result value.

However, this is not mentioned in the documentation anywhere. So, it
would be hard for users to understand what "soft error" is because
they would not read README.

Also, I think "invalid input syntax" is a bit ambiguous. For example,
COPY FROM raises an error when the number of input column does not match
to the table schema, but this error is not ignored by ON_ERROR while
this seems to fall into the category of "invalid input syntax".

I found the description as followings in the documentation in COPY:

The column values themselves are strings (...) acceptable to the input
function, of each attribute's data type.

So, keeping consistency with the existing description, we can say:

"Specifies which how to behave when encountering an error due to
column values unacceptable to the input function of each attribute's
data type."

Currently, ON_ERROR doesn't support other soft errors, so it can explain
it more simply without introducing the new concept, "soft error" to users.

I also note the log message behavior when ignore mode is chosen. I haven't
confirmed that it is accurate but that is readily tweaked if approved of.

+      An <literal>INFO</literal> level context message containing the ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.

The log level is NOTICE not INFO.

    <para>
-    <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
-    TO</command>, but the target table will already have received
-    earlier rows in a <command>COPY FROM</command>. These rows will not
-    be visible or accessible, but they still occupy disk space. This might
+    The <command>COPY FROM</command> command physically inserts input rows
+    into the table as it progresses.  If the command fails these rows are
+    left in a deleted state, still occupying disk space. This might
     amount to a considerable amount of wasted disk space if the failure
-    happened well into a large copy operation. You might wish to invoke
-    <command>VACUUM</command> to recover the wasted space.
+    happened well into a large copy operation. <command>VACUUM</command>
+    should be used to recover the wasted space.
    </para>

I think "left in a deleted state" is also unclear for users because this
explains the internal state but not how looks from user's view.How about
leaving the explanation "These rows will not be visible or accessible" in
the existing statement?

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

#12

torikoshia

torikoshia@oss.nttdata.com

almost 2 years ago

In reply to: David G. Johnston (#10)

Re: Small fix on COPY ON_ERROR document

On 2024-01-27 00:04, David G. Johnston wrote:

On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA <nagata@sraoss.co.jp>
wrote:

On Fri, 26 Jan 2024 00:00:57 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

I will need to make this tweak and probably a couple others to my

own

suggestions in 12 hours or so.

And here is my v2.

Notably I choose to introduce the verbiage "soft error" and then
define in the ON_ERROR clause the specific soft error that matters
here - "invalid input syntax".

I also note the log message behavior when ignore mode is chosen. I
haven't confirmed that it is accurate but that is readily tweaked if
approved of.

David J.

Thanks for refining the doc.

+ Specifies which how to behave when encountering a soft error.

To be consistent with other parts in the manual[1]https://www.postgresql.org/docs/devel/sql-createtype.html[2]https://www.postgresql.org/docs/devel/functions-info.html, should be “soft”
error?

+      An <replaceable class="parameter">error_action</replaceable> 
value of
+      <literal>stop</literal> means fail the command, while
+      <literal>ignore</literal> means discard the input row and 
continue with the next one.
+      The default is <literal>stop</literal>

Is "." required at the end of the line?

+     <para>
+      The only relevant soft error is "invalid input syntax", which 
manifests when attempting
+      to create a column value from the text input.
+     </para>

I think it is not restricted to "invalid input syntax".
We can handle out of range error:

=# create table t1(i int);
CREATE TABLE

=# copy t1 from stdin with(ON_ERROR ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF
signal.

111111111111111111111
\.

NOTICE: 1 row was skipped due to data type incompatibility
COPY 0

Also, I'm a little concerned that users might wonder what soft error is.

Certainly there are already references to "soft" errors in the manual,
but they seem to be for developer, such as creating new TYPE for
PostgreSQL.

It might be better to describe what soft error is like below:

-- src/backend/utils/fmgr/README
An error reported "softly" must be safe, in the sense that there is
no question about our ability to continue normal processing of the
transaction.

[1]: https://www.postgresql.org/docs/devel/sql-createtype.html
[2]: https://www.postgresql.org/docs/devel/functions-info.html

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

#13

David G. Johnston

david.g.johnston@gmail.com

almost 2 years ago

In reply to: Yugo NAGATA (#11)

On Sunday, January 28, 2024, Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Fri, 26 Jan 2024 08:04:45 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Fri, 26 Jan 2024 00:00:57 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

I will need to make this tweak and probably a couple others to my own
suggestions in 12 hours or so.

And here is my v2.

Notably I choose to introduce the verbiage "soft error" and then define

in

the ON_ERROR clause the specific soft error that matters here - "invalid
input syntax".

I am not sure we should use "soft error" without any explanation
because it seems to me that the meaning of words is unclear for users.

Agreed. It needs to be added to the glossary.

Also, I think "invalid input syntax" is a bit ambiguous. For example,
COPY FROM raises an error when the number of input column does not match
to the table schema, but this error is not ignored by ON_ERROR while
this seems to fall into the category of "invalid input syntax".

It is literally the error text that appears if one were not to ignore it.
It isn’t a category of errors. But I’m open to ideas here. But being
explicit with what on actually sees in the system seemed preferable to
inventing new classification terms not otherwise used.

So, keeping consistency with the existing description, we can say:

"Specifies which how to behave when encountering an error due to
column values unacceptable to the input function of each attribute's
data type."

Yeah, I was considering something along those lines as an option as well.
But I’d rather add that wording to the glossary.

Currently, ON_ERROR doesn't support other soft errors, so it can explain
it more simply without introducing the new concept, "soft error" to users.

Good point. Seems we should define what user-facing errors are ignored
anywhere in the system and if we aren’t consistently leveraging these in
all areas/commands make the necessary qualifications in those specific
places.

I also note the log message behavior when ignore mode is chosen. I

haven't

confirmed that it is accurate but that is readily tweaked if approved of.
+      An <literal>INFO</literal> level context message containing the
ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least
one row was discarded.
The log level is NOTICE not INFO.

Makes sense, I hadn’t experimented.

I think "left in a deleted state" is also unclear for users because this
explains the internal state but not how looks from user's view.How about
leaving the explanation "These rows will not be visible or accessible" in
the existing statement?

Just visible then, I don’t like an “or” there and as tuples at least they
are accessible to the system, in vacuum especially. But I expected the
user to understand “as if you deleted it” as their operational concept more
readily than visible. I think this will be read by people who haven’t read
MVCC to fully understand what visible means but know enough to run vacuum
to clean up updated and deleted data as a rule.

David J.

#14

Yugo NAGATA

nagata@sraoss.co.jp

almost 2 years ago

In reply to: David G. Johnston (#13)

Re: Small fix on COPY ON_ERROR document

On Sun, 28 Jan 2024 19:14:58 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

Also, I think "invalid input syntax" is a bit ambiguous. For example,
COPY FROM raises an error when the number of input column does not match
to the table schema, but this error is not ignored by ON_ERROR while
this seems to fall into the category of "invalid input syntax".

It is literally the error text that appears if one were not to ignore it.
It isn’t a category of errors. But I’m open to ideas here. But being
explicit with what on actually sees in the system seemed preferable to
inventing new classification terms not otherwise used.

Thank you for explanation! I understood the words was from the error messages
that users actually see. However, as Torikoshi-san said in [1]/messages/by-id/7f1457497fa3bf9dfe486f162d1c8ec6@oss.nttdata.com, errors other
than valid input syntax (e.g. range error) can be also ignored, therefore it
would be better to describe to be ignored errors more specifically.

[1]: /messages/by-id/7f1457497fa3bf9dfe486f162d1c8ec6@oss.nttdata.com

So, keeping consistency with the existing description, we can say:

"Specifies which how to behave when encountering an error due to
column values unacceptable to the input function of each attribute's
data type."

Yeah, I was considering something along those lines as an option as well.
But I’d rather add that wording to the glossary.

Although I am still be not convinced if we have to introduce the words
"soft error" to the documentation, I don't care it if there are no other
opposite opinions.

Currently, ON_ERROR doesn't support other soft errors, so it can explain
it more simply without introducing the new concept, "soft error" to users.

Good point. Seems we should define what user-facing errors are ignored
anywhere in the system and if we aren’t consistently leveraging these in
all areas/commands make the necessary qualifications in those specific
places.

I think "left in a deleted state" is also unclear for users because this
explains the internal state but not how looks from user's view.How about
leaving the explanation "These rows will not be visible or accessible" in
the existing statement?

Just visible then, I don’t like an “or” there and as tuples at least they
are accessible to the system, in vacuum especially. But I expected the
user to understand “as if you deleted it” as their operational concept more
readily than visible. I think this will be read by people who haven’t read
MVCC to fully understand what visible means but know enough to run vacuum
to clean up updated and deleted data as a rule.

Ok, I agree we can omit "or accessible". How do you like the followings?
Still redundant?

"If the command fails, these rows are left in a deleted state;
these rows will not be visible, but they still occupy disk space. "

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

#15

Yugo NAGATA

nagata@sraoss.co.jp

almost 2 years ago

In reply to: Yugo NAGATA (#14)

1 attachment(s)

Re: Small fix on COPY ON_ERROR document

On Mon, 29 Jan 2024 15:47:25 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Sun, 28 Jan 2024 19:14:58 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

Also, I think "invalid input syntax" is a bit ambiguous. For example,
COPY FROM raises an error when the number of input column does not match
to the table schema, but this error is not ignored by ON_ERROR while
this seems to fall into the category of "invalid input syntax".

It is literally the error text that appears if one were not to ignore it.
It isn’t a category of errors. But I’m open to ideas here. But being
explicit with what on actually sees in the system seemed preferable to
inventing new classification terms not otherwise used.

Thank you for explanation! I understood the words was from the error messages
that users actually see. However, as Torikoshi-san said in [1], errors other
than valid input syntax (e.g. range error) can be also ignored, therefore it
would be better to describe to be ignored errors more specifically.

[1] /messages/by-id/7f1457497fa3bf9dfe486f162d1c8ec6@oss.nttdata.com

So, keeping consistency with the existing description, we can say:

"Specifies which how to behave when encountering an error due to
column values unacceptable to the input function of each attribute's
data type."

Yeah, I was considering something along those lines as an option as well.
But I’d rather add that wording to the glossary.

Although I am still be not convinced if we have to introduce the words
"soft error" to the documentation, I don't care it if there are no other
opposite opinions.

Attached is a updated patch v3, which is a version that uses the above
wording instead of "soft error".

Currently, ON_ERROR doesn't support other soft errors, so it can explain
it more simply without introducing the new concept, "soft error" to users.

Good point. Seems we should define what user-facing errors are ignored
anywhere in the system and if we aren’t consistently leveraging these in
all areas/commands make the necessary qualifications in those specific
places.

I think "left in a deleted state" is also unclear for users because this
explains the internal state but not how looks from user's view.How about
leaving the explanation "These rows will not be visible or accessible" in
the existing statement?

Just visible then, I don’t like an “or” there and as tuples at least they
are accessible to the system, in vacuum especially. But I expected the
user to understand “as if you deleted it” as their operational concept more
readily than visible. I think this will be read by people who haven’t read
MVCC to fully understand what visible means but know enough to run vacuum
to clean up updated and deleted data as a rule.

Ok, I agree we can omit "or accessible". How do you like the followings?
Still redundant?

"If the command fails, these rows are left in a deleted state;
these rows will not be visible, but they still occupy disk space. "

Also, the above statement is used in the patch.

Regards,
Yugo Nagata

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

--
Yugo NAGATA <nagata@sraoss.co.jp>

Attachments:

v3_improve_ON_ERROR_option_in_COPY_doc.patchtext/x-diff; name=v3_improve_ON_ERROR_option_in_COPY_doc.patchDownload

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..10cfc3f0ad 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     in the <structname>pg_stat_progress_copy</structname> view. See
     <xref linkend="copy-progress-reporting"/> for details.
   </para>
+
+  <para>
+    By default, <command>COPY</command> will fail if it encounters an error
+    during processing. For use cases where a best-effort attempt at loading
+    the entire file is desired, the <literal>ON_ERROR</literal> clause can
+    be used to specify some other behavior.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -378,17 +385,20 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     <term><literal>ON_ERROR</literal></term>
     <listitem>
      <para>
-      Specifies which <replaceable class="parameter">
-      error_action</replaceable> to perform when there is malformed data in the input.
-      Currently, only <literal>stop</literal> (default) and <literal>ignore</literal>
-      values are supported.
-      If the <literal>stop</literal> value is specified,
-      <command>COPY</command> stops operation at the first error.
-      If the <literal>ignore</literal> value is specified,
-      <command>COPY</command> skips malformed data and continues copying data.
-      The option is allowed only in <command>COPY FROM</command>.
-      Only <literal>stop</literal> value is allowed when
-      using <literal>binary</literal> format.
+      Specifies which how to behave when encountering an error due to column values
+      unacceptable to the input function of each attribute's data type.
+      An <replaceable class="parameter">error_action</replaceable> value of
+      <literal>stop</literal> means fail the command, while
+      <literal>ignore</literal> means discard the input row and continue with the next one.
+      The default is <literal>stop</literal>
+     </para>
+     <para>
+      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+     </para>
+     <para>
+      An <literal>NOTICE</literal> level context message containing the ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
      </para>
     </listitem>
    </varlistentry>
@@ -576,15 +586,13 @@ COPY <replaceable class="parameter">count</replaceable>
    </para>
 
    <para>
-    <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
-    TO</command>, but the target table will already have received
-    earlier rows in a <command>COPY FROM</command>. These rows will not
-    be visible or accessible, but they still occupy disk space. This might
-    amount to a considerable amount of wasted disk space if the failure
-    happened well into a large copy operation. You might wish to invoke
-    <command>VACUUM</command> to recover the wasted space.
+    The <command>COPY FROM</command> command physically inserts input rows
+    into the table as it progresses.  If the command fails, these rows are
+    left in a deleted state; these rows will not be visible, but still
+    occupying disk space. This might amount to a considerable amount of
+    wasted disk space if the failure happened well into a large copy
+    operation. <command>VACUUM</command> should be used to recover the
+    wasted space.
    </para>
 
    <para>

#16

torikoshia

torikoshia@oss.nttdata.com

almost 2 years ago

In reply to: Yugo NAGATA (#15)

Re: Small fix on COPY ON_ERROR document

On 2024-02-01 15:16, Yugo NAGATA wrote:

On Mon, 29 Jan 2024 15:47:25 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Sun, 28 Jan 2024 19:14:58 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

Also, I think "invalid input syntax" is a bit ambiguous. For example,
COPY FROM raises an error when the number of input column does not match
to the table schema, but this error is not ignored by ON_ERROR while
this seems to fall into the category of "invalid input syntax".

It is literally the error text that appears if one were not to ignore it.
It isn’t a category of errors. But I’m open to ideas here. But being
explicit with what on actually sees in the system seemed preferable to
inventing new classification terms not otherwise used.

Thank you for explanation! I understood the words was from the error
messages
that users actually see. However, as Torikoshi-san said in [1], errors
other
than valid input syntax (e.g. range error) can be also ignored,
therefore it
would be better to describe to be ignored errors more specifically.

[1]
/messages/by-id/7f1457497fa3bf9dfe486f162d1c8ec6@oss.nttdata.com

So, keeping consistency with the existing description, we can say:

"Specifies which how to behave when encountering an error due to
column values unacceptable to the input function of each attribute's
data type."

Yeah, I was considering something along those lines as an option as well.
But I’d rather add that wording to the glossary.

Although I am still be not convinced if we have to introduce the words
"soft error" to the documentation, I don't care it if there are no
other
opposite opinions.

Attached is a updated patch v3, which is a version that uses the above
wording instead of "soft error".

Currently, ON_ERROR doesn't support other soft errors, so it can explain
it more simply without introducing the new concept, "soft error" to users.

Good point. Seems we should define what user-facing errors are ignored
anywhere in the system and if we aren’t consistently leveraging these in
all areas/commands make the necessary qualifications in those specific
places.

I think "left in a deleted state" is also unclear for users because this
explains the internal state but not how looks from user's view.How about
leaving the explanation "These rows will not be visible or accessible" in
the existing statement?

Just visible then, I don’t like an “or” there and as tuples at least they
are accessible to the system, in vacuum especially. But I expected the
user to understand “as if you deleted it” as their operational concept more
readily than visible. I think this will be read by people who haven’t read
MVCC to fully understand what visible means but know enough to run vacuum
to clean up updated and deleted data as a rule.

Ok, I agree we can omit "or accessible". How do you like the
followings?
Still redundant?

"If the command fails, these rows are left in a deleted state;
these rows will not be visible, but they still occupy disk space. "

Also, the above statement is used in the patch.

Thanks for updating the patch!

I like your description which doesn't use the word soft error.

Here are minor comments:

+      <literal>ignore</literal> means discard the input row and 
continue with the next one.
+      The default is <literal>stop</literal>

Is "." required at the end of the line?

An <literal>NOTICE</literal> level context message containing the
ignored row count is

Should 'An' be 'A'?

Also, I wasn't sure the necessity of 'context'.
It might be possible to just say "A NOTICE message containing the
ignored row count.."
considering below existing descriptions:

doc/src/sgml/pltcl.sgml: a <literal>NOTICE</literal> message each
time a supported command is
doc/src/sgml/pltcl.sgml- executed:

doc/src/sgml/plpgsql.sgml: This example trigger simply raises a
<literal>NOTICE</literal> message
doc/src/sgml/plpgsql.sgml- each time a supported command is
executed.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

#17

Yugo NAGATA

nagata@sraoss.co.jp

almost 2 years ago

In reply to: torikoshia (#16)

1 attachment(s)

Re: Small fix on COPY ON_ERROR document

On Fri, 02 Feb 2024 11:29:41 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:

On 2024-02-01 15:16, Yugo NAGATA wrote:

On Mon, 29 Jan 2024 15:47:25 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Sun, 28 Jan 2024 19:14:58 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

Also, I think "invalid input syntax" is a bit ambiguous. For example,
COPY FROM raises an error when the number of input column does not match
to the table schema, but this error is not ignored by ON_ERROR while
this seems to fall into the category of "invalid input syntax".

It is literally the error text that appears if one were not to ignore it.
It isn’t a category of errors. But I’m open to ideas here. But being
explicit with what on actually sees in the system seemed preferable to
inventing new classification terms not otherwise used.

Thank you for explanation! I understood the words was from the error
messages
that users actually see. However, as Torikoshi-san said in [1], errors
other
than valid input syntax (e.g. range error) can be also ignored,
therefore it
would be better to describe to be ignored errors more specifically.

[1]
/messages/by-id/7f1457497fa3bf9dfe486f162d1c8ec6@oss.nttdata.com

So, keeping consistency with the existing description, we can say:

"Specifies which how to behave when encountering an error due to
column values unacceptable to the input function of each attribute's
data type."

Yeah, I was considering something along those lines as an option as well.
But I’d rather add that wording to the glossary.

Although I am still be not convinced if we have to introduce the words
"soft error" to the documentation, I don't care it if there are no
other
opposite opinions.

Attached is a updated patch v3, which is a version that uses the above
wording instead of "soft error".

Currently, ON_ERROR doesn't support other soft errors, so it can explain
it more simply without introducing the new concept, "soft error" to users.

Good point. Seems we should define what user-facing errors are ignored
anywhere in the system and if we aren’t consistently leveraging these in
all areas/commands make the necessary qualifications in those specific
places.

I think "left in a deleted state" is also unclear for users because this
explains the internal state but not how looks from user's view.How about
leaving the explanation "These rows will not be visible or accessible" in
the existing statement?

Just visible then, I don’t like an “or” there and as tuples at least they
are accessible to the system, in vacuum especially. But I expected the
user to understand “as if you deleted it” as their operational concept more
readily than visible. I think this will be read by people who haven’t read
MVCC to fully understand what visible means but know enough to run vacuum
to clean up updated and deleted data as a rule.

Ok, I agree we can omit "or accessible". How do you like the
followings?
Still redundant?

"If the command fails, these rows are left in a deleted state;
these rows will not be visible, but they still occupy disk space. "

Also, the above statement is used in the patch.

Thanks for updating the patch!

I like your description which doesn't use the word soft error.

Thank you for your comments!

Here are minor comments:
+      <literal>ignore</literal> means discard the input row and 
continue with the next one.
+      The default is <literal>stop</literal>
Is "." required at the end of the line?

An <literal>NOTICE</literal> level context message containing the
ignored row count is

Should 'An' be 'A'?

Also, I wasn't sure the necessity of 'context'.
It might be possible to just say "A NOTICE message containing the
ignored row count.."
considering below existing descriptions:

doc/src/sgml/pltcl.sgml: a <literal>NOTICE</literal> message each
time a supported command is
doc/src/sgml/pltcl.sgml- executed:

doc/src/sgml/plpgsql.sgml: This example trigger simply raises a
<literal>NOTICE</literal> message
doc/src/sgml/plpgsql.sgml- each time a supported command is
executed.

I attached a updated patch including fixes you pointed out above.

Regards,
Yugo Nagata

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

--
Yugo NAGATA <nagata@sraoss.co.jp>

Attachments:

v4_improve_ON_ERROR_option_in_COPY_doc.patchtext/x-diff; name=v4_improve_ON_ERROR_option_in_COPY_doc.patchDownload

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..3c2feaa11a 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     in the <structname>pg_stat_progress_copy</structname> view. See
     <xref linkend="copy-progress-reporting"/> for details.
   </para>
+
+  <para>
+    By default, <command>COPY</command> will fail if it encounters an error
+    during processing. For use cases where a best-effort attempt at loading
+    the entire file is desired, the <literal>ON_ERROR</literal> clause can
+    be used to specify some other behavior.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -378,17 +385,20 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     <term><literal>ON_ERROR</literal></term>
     <listitem>
      <para>
-      Specifies which <replaceable class="parameter">
-      error_action</replaceable> to perform when there is malformed data in the input.
-      Currently, only <literal>stop</literal> (default) and <literal>ignore</literal>
-      values are supported.
-      If the <literal>stop</literal> value is specified,
-      <command>COPY</command> stops operation at the first error.
-      If the <literal>ignore</literal> value is specified,
-      <command>COPY</command> skips malformed data and continues copying data.
-      The option is allowed only in <command>COPY FROM</command>.
-      Only <literal>stop</literal> value is allowed when
-      using <literal>binary</literal> format.
+      Specifies which how to behave when encountering an error due to column values
+      unacceptable to the input function of each attribute's data type.
+      An <replaceable class="parameter">error_action</replaceable> value of
+      <literal>stop</literal> means fail the command, while
+      <literal>ignore</literal> means discard the input row and continue with the next one.
+      The default is <literal>stop</literal>.
+     </para>
+     <para>
+      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+     </para>
+     <para>
+      A <literal>NOTICE</literal> message containing the ignored row count is emitted at the end
+      of the <command>COPY FROM</command> if at least one row was discarded.
      </para>
     </listitem>
    </varlistentry>
@@ -576,15 +586,13 @@ COPY <replaceable class="parameter">count</replaceable>
    </para>
 
    <para>
-    <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
-    TO</command>, but the target table will already have received
-    earlier rows in a <command>COPY FROM</command>. These rows will not
-    be visible or accessible, but they still occupy disk space. This might
-    amount to a considerable amount of wasted disk space if the failure
-    happened well into a large copy operation. You might wish to invoke
-    <command>VACUUM</command> to recover the wasted space.
+    The <command>COPY FROM</command> command physically inserts input rows
+    into the table as it progresses.  If the command fails, these rows are
+    left in a deleted state; these rows will not be visible, but still
+    occupying disk space. This might amount to a considerable amount of
+    wasted disk space if the failure happened well into a large copy
+    operation. <command>VACUUM</command> should be used to recover the
+    wasted space.
    </para>
 
    <para>

#18

David G. Johnston

david.g.johnston@gmail.com

almost 2 years ago

In reply to: Yugo NAGATA (#17)

Re: Small fix on COPY ON_ERROR document

On Thu, Feb 1, 2024 at 11:59 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

I attached a updated patch including fixes you pointed out above.

Removed "which"; changed "occupying" to "occupy"
Removed on of the two "amounts"
Changed "unacceptable to the input function" to just "converting" as that
is what the average reader is more likely to be thinking.
The rest was good.

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 3c2feaa11a..55764fc1f2 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -385,8 +385,8 @@ COPY { <replaceable
class="parameter">table_name</replaceable> [ ( <replaceable
     <term><literal>ON_ERROR</literal></term>
     <listitem>
      <para>
-      Specifies which how to behave when encountering an error due to
column values
-      unacceptable to the input function of each attribute's data type.
+      Specifies how to behave when encountering an error converting a
column's
+      input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
       <literal>stop</literal> means fail the command, while
       <literal>ignore</literal> means discard the input row and continue
with the next one.
@@ -589,7 +589,7 @@ COPY <replaceable class="parameter">count</replaceable>
     The <command>COPY FROM</command> command physically inserts input rows
     into the table as it progresses.  If the command fails, these rows are
     left in a deleted state; these rows will not be visible, but still
-    occupying disk space. This might amount to a considerable amount of
+    occupy disk space. This might amount to considerable
     wasted disk space if the failure happened well into a large copy
     operation. <command>VACUUM</command> should be used to recover the
     wasted space.

David J.

#19

Alexander Korotkov

aekorotkov@gmail.com

almost 2 years ago

In reply to: David G. Johnston (#18)

Re: Small fix on COPY ON_ERROR document

On Fri, Feb 2, 2024 at 10:07 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Thu, Feb 1, 2024 at 11:59 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

I attached a updated patch including fixes you pointed out above.

Removed "which"; changed "occupying" to "occupy"
Removed on of the two "amounts"
Changed "unacceptable to the input function" to just "converting" as that is what the average reader is more likely to be thinking.
The rest was good.

Thanks to everybody in the thread.
Pushed.

------
Regards,
Alexander Korotkov

#20

Yugo NAGATA

nagata@sraoss.co.jp

almost 2 years ago

In reply to: Alexander Korotkov (#19)

Re: Small fix on COPY ON_ERROR document

On Sat, 3 Feb 2024 01:51:56 +0200
Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Fri, Feb 2, 2024 at 10:07 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Thu, Feb 1, 2024 at 11:59 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

I attached a updated patch including fixes you pointed out above.

Removed "which"; changed "occupying" to "occupy"
Removed on of the two "amounts"
Changed "unacceptable to the input function" to just "converting" as that is what the average reader is more likely to be thinking.
The rest was good.

Thanks to everybody in the thread.
Pushed.

Thanks!

------
Regards,
Alexander Korotkov

--
Yugo NAGATA <nagata@sraoss.co.jp>