Add on_error and log_verbosity options to file_fdw
Hi,
With the current file_fdw, if even one line of data conversion fails,
the contents of the file cannot be referenced at all:
=# \! cat data/test.data
1,a
2,b
a,c
=# create foreign table f_fdw_test_1 (i int, t text) server f_fdw
options (filename 'test.data', format 'csv');
CREATE FOREIGN TABLE
=# table f_fdw_test_1;
ERROR: invalid input syntax for type integer: "a"
CONTEXT: COPY f_fdw_test, line 3, column i: "a"
Since we'll support ON_ERROR option which tolerates data conversion
errors in COPY FROM and LOG_VERBOSITY option at v17[1]https://www.postgresql.org/docs/devel/sql-copy.html, how about
supporting them on file_fdw?
This idea comes from Fujii-san[2]https://x.com/fujii_masao/status/1808178032219509041, and I think it'd be useful when
reading a bit dirty data.
Attached PoC patch works like below:
=# create foreign table f_fdw_test_2 (i int, t text) server f_fdw
options (filename 'test.data', format 'csv', on_error 'ignore');
CREATE FOREIGN TABLE
=# table f_fdw_test_2;
NOTICE: 1 row was skipped due to data type incompatibility
i | t
---+---
1 | a
2 | b
(2 rows)
=# create foreign table f_fdw_test_3 (i int, t text) server f_fdw
options (filename 'test.data', format 'csv', on_error 'ignore',
log_verbosity 'verbose');
CREATE FOREIGN TABLE
=# table f_fdw_test_3 ;
NOTICE: skipping row due to data type incompatibility at line 3 for
column i: "a"
NOTICE: 1 row was skipped due to data type incompatibility
i | t
---+---
1 | a
2 | b
(2 rows)
I'm going to continue developing the patch(e.g. add doc, measure
performance degradation) when people also think this feature is worth
adding.
What do you think?
[1]: https://www.postgresql.org/docs/devel/sql-copy.html
[2]: https://x.com/fujii_masao/status/1808178032219509041
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachments:
v1-0001-PoC-patch-for-adding-on_error-and-log_verbosity-o.patchtext/x-diff; name=v1-0001-PoC-patch-for-adding-on_error-and-log_verbosity-o.patchDownload
From b6ec598bfdd64833e0bffc889a11addc5d677b51 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Fri, 5 Jul 2024 00:07:26 +0900
Subject: [PATCH v1] PoC patch for adding on_error and log_verbosity options to
file_fdw
---
contrib/file_fdw/expected/file_fdw.out | 20 +++++++++
contrib/file_fdw/file_fdw.c | 58 +++++++++++++++++++++++---
contrib/file_fdw/sql/file_fdw.sql | 6 +++
3 files changed, 78 insertions(+), 6 deletions(-)
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..1af79af20f 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,26 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
SELECT * FROM agg_bad; -- ERROR
ERROR: invalid input syntax for type real: "aaa"
CONTEXT: COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE: 1 row was skipped due to data type incompatibility
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'verbose');
+SELECT * FROM agg_bad;
+NOTICE: skipping row due to data type incompatibility at line 3 for column b: "aaa"
+NOTICE: 1 row was skipped due to data type incompatibility
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a0..86fb655df1 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,8 +22,10 @@
#include "catalog/pg_authid.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+#include "commands/progress.h"
#include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
@@ -34,6 +36,7 @@
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
#include "utils/acl.h"
+#include "utils/backend_progress.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/sampling.h"
@@ -74,6 +77,8 @@ static const struct FileFdwOption valid_options[] = {
{"null", ForeignTableRelationId},
{"default", ForeignTableRelationId},
{"encoding", ForeignTableRelationId},
+ {"on_error", ForeignTableRelationId},
+ {"log_verbosity", ForeignTableRelationId},
{"force_not_null", AttributeRelationId},
{"force_null", AttributeRelationId},
@@ -724,12 +729,13 @@ fileIterateForeignScan(ForeignScanState *node)
ExprContext *econtext;
MemoryContext oldcontext;
TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
- bool found;
+ CopyFromState cstate = festate->cstate;
+ int64 skipped = 0;
ErrorContextCallback errcallback;
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
- errcallback.arg = (void *) festate->cstate;
+ errcallback.arg = (void *) cstate;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
@@ -750,10 +756,40 @@ fileIterateForeignScan(ForeignScanState *node)
* switch in case we are doing that.
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
- if (found)
+
+ for(;;)
+ {
+ if (!NextCopyFrom(cstate, econtext,
+ slot->tts_values, slot->tts_isnull))
+ break;
+
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and deal with error
+ * information according to ON_ERROR.
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we don't set details_wanted and error_data is not to
+ * be filled, just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Report that this tuple was skipped by the ON_ERROR clause */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ ++skipped);
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ continue;
+ }
+
ExecStoreVirtualTuple(slot);
+ break;
+ }
/* Switch back to original memory context */
MemoryContextSwitchTo(oldcontext);
@@ -795,7 +831,17 @@ fileEndForeignScan(ForeignScanState *node)
FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
/* if festate is NULL, we are in EXPLAIN; nothing to do */
- if (festate)
+ if (!festate)
+ return;
+
+ if (festate->cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ festate->cstate->num_errors > 0)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) festate->cstate->num_errors,
+ (unsigned long long) festate->cstate->num_errors));
+
EndCopyFrom(festate->cstate);
}
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..5eae01d0f2 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,12 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
-- error context report tests
SELECT * FROM agg_bad; -- ERROR
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'verbose');
+SELECT * FROM agg_bad;
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
base-commit: 2ef575c7803a55101352c0f6cb8f745af063a66c
--
2.39.2
On 2024-07-05 00:27, torikoshia wrote:
Hi,
With the current file_fdw, if even one line of data conversion fails,
the contents of the file cannot be referenced at all:=# \! cat data/test.data
1,a
2,b
a,c
=# create foreign table f_fdw_test_1 (i int, t text) server f_fdw
options (filename 'test.data', format 'csv');
CREATE FOREIGN TABLE=# table f_fdw_test_1;
ERROR: invalid input syntax for type integer: "a"
CONTEXT: COPY f_fdw_test, line 3, column i: "a"Since we'll support ON_ERROR option which tolerates data conversion
errors in COPY FROM and LOG_VERBOSITY option at v17[1], how about
supporting them on file_fdw?This idea comes from Fujii-san[2], and I think it'd be useful when
reading a bit dirty data.Attached PoC patch works like below:
=# create foreign table f_fdw_test_2 (i int, t text) server f_fdw
options (filename 'test.data', format 'csv', on_error 'ignore');
CREATE FOREIGN TABLE=# table f_fdw_test_2;
NOTICE: 1 row was skipped due to data type incompatibility
i | t
---+---
1 | a
2 | b
(2 rows)=# create foreign table f_fdw_test_3 (i int, t text) server f_fdw
options (filename 'test.data', format 'csv', on_error 'ignore',
log_verbosity 'verbose');
CREATE FOREIGN TABLE=# table f_fdw_test_3 ;
NOTICE: skipping row due to data type incompatibility at line 3 for
column i: "a"
NOTICE: 1 row was skipped due to data type incompatibility
i | t
---+---
1 | a
2 | b
(2 rows)I'm going to continue developing the patch(e.g. add doc, measure
performance degradation) when people also think this feature is worth
adding.What do you think?
[1] https://www.postgresql.org/docs/devel/sql-copy.html
[2] https://x.com/fujii_masao/status/1808178032219509041
Update the patch since v1 patch caused compiler warning.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachments:
v2-0001-PoC-patch-for-adding-on_error-and-log_verbosity-o.patchtext/x-diff; name=v2-0001-PoC-patch-for-adding-on_error-and-log_verbosity-o.patchDownload
From b6295590977479ffb601c9848c6194a51d75e932 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Fri, 19 Jul 2024 10:12:20 +0900
Subject: [PATCH v2] Add on_error and log_verbosity options to file_fdw
9e2d870, b725b7e and f5a2278 introduced on_error and log_verbosity to COPY.
Since these options seems useful to file_fdw when accessing dirty data,
this patch adds them to file_fdw.
---
contrib/file_fdw/expected/file_fdw.out | 20 +++++++++
contrib/file_fdw/file_fdw.c | 60 +++++++++++++++++++++++---
contrib/file_fdw/sql/file_fdw.sql | 6 +++
3 files changed, 79 insertions(+), 7 deletions(-)
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..1af79af20f 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,26 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
SELECT * FROM agg_bad; -- ERROR
ERROR: invalid input syntax for type real: "aaa"
CONTEXT: COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE: 1 row was skipped due to data type incompatibility
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'verbose');
+SELECT * FROM agg_bad;
+NOTICE: skipping row due to data type incompatibility at line 3 for column b: "aaa"
+NOTICE: 1 row was skipped due to data type incompatibility
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a0..2dc2744393 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,8 +22,10 @@
#include "catalog/pg_authid.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+#include "commands/progress.h"
#include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
@@ -34,6 +36,7 @@
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
#include "utils/acl.h"
+#include "utils/backend_progress.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/sampling.h"
@@ -74,6 +77,8 @@ static const struct FileFdwOption valid_options[] = {
{"null", ForeignTableRelationId},
{"default", ForeignTableRelationId},
{"encoding", ForeignTableRelationId},
+ {"on_error", ForeignTableRelationId},
+ {"log_verbosity", ForeignTableRelationId},
{"force_not_null", AttributeRelationId},
{"force_null", AttributeRelationId},
@@ -724,12 +729,13 @@ fileIterateForeignScan(ForeignScanState *node)
ExprContext *econtext;
MemoryContext oldcontext;
TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
- bool found;
+ CopyFromState cstate = festate->cstate;
+ int64 skipped = 0;
ErrorContextCallback errcallback;
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
- errcallback.arg = (void *) festate->cstate;
+ errcallback.arg = (void *) cstate;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
@@ -750,10 +756,40 @@ fileIterateForeignScan(ForeignScanState *node)
* switch in case we are doing that.
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
- if (found)
+
+ for(;;)
+ {
+ if (!NextCopyFrom(cstate, econtext,
+ slot->tts_values, slot->tts_isnull))
+ break;
+
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and deal with error
+ * information according to ON_ERROR.
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we don't set details_wanted and error_data is not to
+ * be filled, just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Report that this tuple was skipped by the ON_ERROR clause */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ ++skipped);
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ continue;
+ }
+
ExecStoreVirtualTuple(slot);
+ break;
+ }
/* Switch back to original memory context */
MemoryContextSwitchTo(oldcontext);
@@ -795,8 +831,18 @@ fileEndForeignScan(ForeignScanState *node)
FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
/* if festate is NULL, we are in EXPLAIN; nothing to do */
- if (festate)
- EndCopyFrom(festate->cstate);
+ if (!festate)
+ return;
+
+ if (festate->cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ festate->cstate->num_errors > 0)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) festate->cstate->num_errors,
+ (unsigned long long) festate->cstate->num_errors));
+
+ EndCopyFrom(festate->cstate);
}
/*
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..5eae01d0f2 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,12 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
-- error context report tests
SELECT * FROM agg_bad; -- ERROR
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'verbose');
+SELECT * FROM agg_bad;
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
--
2.39.2
Hi,
On Thu, Jul 18, 2024 at 6:38 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-07-05 00:27, torikoshia wrote:
Hi,
With the current file_fdw, if even one line of data conversion fails,
the contents of the file cannot be referenced at all:=# \! cat data/test.data
1,a
2,b
a,c
=# create foreign table f_fdw_test_1 (i int, t text) server f_fdw
options (filename 'test.data', format 'csv');
CREATE FOREIGN TABLE=# table f_fdw_test_1;
ERROR: invalid input syntax for type integer: "a"
CONTEXT: COPY f_fdw_test, line 3, column i: "a"Since we'll support ON_ERROR option which tolerates data conversion
errors in COPY FROM and LOG_VERBOSITY option at v17[1], how about
supporting them on file_fdw?
+1
This idea comes from Fujii-san[2], and I think it'd be useful when
reading a bit dirty data.Attached PoC patch works like below:
=# create foreign table f_fdw_test_2 (i int, t text) server f_fdw
options (filename 'test.data', format 'csv', on_error 'ignore');
CREATE FOREIGN TABLE=# table f_fdw_test_2;
NOTICE: 1 row was skipped due to data type incompatibility
i | t
---+---
1 | a
2 | b
(2 rows)
I'm slightly concerned that users might not want to see the NOTICE
message for every scan. Unlike COPY FROM, scanning a file via file_fdw
could be frequent. An alternative idea of place to write the
information of the number of malformed rows would be the EXPLAIN
command as follow:
QUERY PLAN
----------------------------------------------------------------
Foreign Scan on public.test (cost=0.00..1.10 rows=1 width=12)
Output: a, b, c
Foreign File: test.csv
Foreign File Size: 12 b
Skipped Rows: 10
=# create foreign table f_fdw_test_3 (i int, t text) server f_fdw
options (filename 'test.data', format 'csv', on_error 'ignore',
log_verbosity 'verbose');
CREATE FOREIGN TABLE=# table f_fdw_test_3 ;
NOTICE: skipping row due to data type incompatibility at line 3 for
column i: "a"
NOTICE: 1 row was skipped due to data type incompatibility
i | t
---+---
1 | a
2 | b
(2 rows)
IIUC we have to execute ALTER FOREIGN TABLE to change the
log_verbosity value and which requires to be the owner. Which seems
not to be user-friendly.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Mon, Jul 22, 2024 at 03:07:46PM -0700, Masahiko Sawada wrote:
I'm slightly concerned that users might not want to see the NOTICE
message for every scan. Unlike COPY FROM, scanning a file via file_fdw
could be frequent. An alternative idea of place to write the
information of the number of malformed rows would be the EXPLAIN
command as follow:
Yeah, I also have some concerns regarding the noise that this could
produce if called on a foreign table on a regular basis. The verbose
mode is disabled by default so I don't see why we should not allow it
if the relation owner wants to show it.
Perhaps we should first do a silence mode for log_verbosity to skip
the NOTICE produced at the end of the COPY FROM summarizing the whole?
It would be confusing to have different defaults between COPY and
file_fdw, but having the option to silence that completely is also
appealing from the user point of view.
QUERY PLAN
----------------------------------------------------------------
Foreign Scan on public.test (cost=0.00..1.10 rows=1 width=12)
Output: a, b, c
Foreign File: test.csv
Foreign File Size: 12 b
Skipped Rows: 10
Interesting idea linked to the idea of pushing the error state to
something else than the logs. Sounds like a separate feature.
IIUC we have to execute ALTER FOREIGN TABLE to change the
log_verbosity value and which requires to be the owner. Which seems
not to be user-friendly.
I am not sure about allowing scans to force an option to be a
different thing at runtime vs what's been defined in the relation
itself with CREATE/ALTER.
--
Michael
On 2024-07-23 08:57, Michael Paquier wrote:
On Mon, Jul 22, 2024 at 03:07:46PM -0700, Masahiko Sawada wrote:
I'm slightly concerned that users might not want to see the NOTICE
message for every scan. Unlike COPY FROM, scanning a file via file_fdw
could be frequent.
Agreed.
Yeah, I also have some concerns regarding the noise that this could
produce if called on a foreign table on a regular basis. The verbose
mode is disabled by default so I don't see why we should not allow it
if the relation owner wants to show it.Perhaps we should first do a silence mode for log_verbosity to skip
the NOTICE produced at the end of the COPY FROM summarizing the whole?
I like this idea.
If there are no objections, I'm going to make a patch for this.
It would be confusing to have different defaults between COPY and
file_fdw, but having the option to silence that completely is also
appealing from the user point of view.
I'm not sure we should change the defaults.
If the default of file_fdw is silence mode, I am a little concerned that
there may be cases where people think they have no errors, but in fact
they have.
QUERY PLAN
----------------------------------------------------------------
Foreign Scan on public.test (cost=0.00..1.10 rows=1 width=12)
Output: a, b, c
Foreign File: test.csv
Foreign File Size: 12 b
Skipped Rows: 10Interesting idea linked to the idea of pushing the error state to
something else than the logs. Sounds like a separate feature.
+1
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
On 2024-07-24 19:43, torikoshia wrote:
On 2024-07-23 08:57, Michael Paquier wrote:
On Mon, Jul 22, 2024 at 03:07:46PM -0700, Masahiko Sawada wrote:
I'm slightly concerned that users might not want to see the NOTICE
message for every scan. Unlike COPY FROM, scanning a file via
file_fdw
could be frequent.Agreed.
Yeah, I also have some concerns regarding the noise that this could
produce if called on a foreign table on a regular basis. The verbose
mode is disabled by default so I don't see why we should not allow it
if the relation owner wants to show it.Perhaps we should first do a silence mode for log_verbosity to skip
the NOTICE produced at the end of the COPY FROM summarizing the whole?I like this idea.
If there are no objections, I'm going to make a patch for this.
Attached patches.
0001 adds new option 'silent' to log_verbosity and 0002 adds on_error
and log_verbosity options to file_fdw.
I'm going to continue developing the patch(e.g. add doc, measure
performance degradation) when people also think this feature is worth
adding.
Here is a quick performance test result.
I loaded 1,000,000 rows using pgbench_accounts to a file and counted the
number of rows using file_fdw on different conditions and compared the
execution times on my laptop.
The changed conditions are below:
- source code: HEAD/patch applied
- data: no error data/all row occur data conversion error at the 1st
column
- file_fdw options: on_error=stop/on_error=ignore
There seems no significant difference in performance between HEAD and
the patch applied with on_error option specified as either ignore/stop
when data has no error.
OTOH when all rows occur data conversion error, it is significantly
faster than other cases:
# HAED(e950fe58bd0)
## data:no error
=# create foreign table t1 (a int, b int, c int, t text) server f_fdw
options (filename 'pgb_ac', format 'csv');
=# select count(*) from t1;
1567.569 ms
1675.112 ms
1555.782 ms
1547.676 ms
1660.221 ms
# patch applied
## data:no error, on_error:stop
=# create foreign table t1 (a int, b int, c int, t text) server f_fdw
options (filename 'pgb_ac', format 'csv', on_error 'stop');
=# select count(*) from t1;
1580.656 ms
1623.784 ms
1596.947 ms
1652.307 ms
1613.607 ms
## data:no error, on_error:ignore
=# create foreign table t1 (a int, b int, c int, t text) server f_fdw
options (filename 'pgb_ac', format 'csv', on_error 'ignore');
=# select count(*) from t1;
1575.718 ms
1597.464 ms
1596.540 ms
1665.818 ms
1595.453 ms
#### data:all rows contain error, on_error:ignore
=# create foreign table t1 (a int, b int, c int, t text) server f_fdw
options (filename 'pgb_ac', format 'csv', on_error 'ignore');
=# select count(*) from t1;
914.537 ms
907.506 ms
912.768 ms
913.769 ms
914.327 ms
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachments:
v3-0001-Add-log_verbosity-to-silent.patchtext/x-diff; name=v3-0001-Add-log_verbosity-to-silent.patchDownload
From c32e24d09679814469a31beedb99e37c7ad0e21a Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Thu, 8 Aug 2024 15:49:51 +0900
Subject: [PATCH v3 1/2] Add log_verbosity to 'silent'
Currently when on_error is set to ignore, COPY logs a NOTICE message
containing the ignored row count.
When using on_error option to file_fdw, it'd be better to omit this
message in some cases, such as frequent access to a malformed file
via the same foreign table.
As a preliminary step to add on_error option to file_fdw, this patch
adds new value 'silent' to log_verbosity and enables to silence
the message.
---
doc/src/sgml/ref/copy.sgml | 8 ++++++--
src/backend/commands/copy.c | 4 +++-
src/backend/commands/copyfrom.c | 3 ++-
src/include/commands/copy.h | 5 +++--
src/test/regress/expected/copy2.out | 4 +++-
src/test/regress/sql/copy2.sql | 4 ++++
6 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..3dbb1f7453 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -407,6 +407,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<literal>verbose</literal>, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
conversion has failed is emitted for each discarded row.
+ When it is set to <literal>silent</literal>, no message is emitted
+ regarding ignored rows.
</para>
</listitem>
</varlistentry>
@@ -428,8 +430,10 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<listitem>
<para>
Specify the amount of messages emitted by a <command>COPY</command>
- command: <literal>default</literal> or <literal>verbose</literal>. If
- <literal>verbose</literal> is specified, additional messages are emitted
+ command: <literal>silent</literal>, <literal>default</literal>, or
+ <literal>verbose</literal>.
+ <literal>silent</literal> excludes verbose messages.
+ If <literal>verbose</literal> is specified, additional messages are emitted
during processing.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index df7a4a21c9..439bdbd83e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -424,9 +424,11 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
char *sval;
/*
- * Allow "default", or "verbose" values.
+ * Allow "silent", "default", or "verbose" values.
*/
sval = defGetString(def);
+ if (pg_strcasecmp(sval, "silent") == 0)
+ return COPY_LOG_VERBOSITY_SILENT;
if (pg_strcasecmp(sval, "default") == 0)
return COPY_LOG_VERBOSITY_DEFAULT;
if (pg_strcasecmp(sval, "verbose") == 0)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c42a5621d5..37c5776ed0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1318,7 +1318,8 @@ CopyFrom(CopyFromState cstate)
error_context_stack = errcallback.previous;
if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0)
+ cstate->num_errors > 0 &&
+ cstate->opts.log_verbosity != COPY_LOG_VERBOSITY_SILENT)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
"%llu rows were skipped due to data type incompatibility",
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..0a7ef34376 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -45,8 +45,9 @@ typedef enum CopyOnErrorChoice
*/
typedef enum CopyLogVerbosityChoice
{
- COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
- COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
+ COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
+ COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+ COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 931542f268..be3ddda03a 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -760,6 +760,7 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 2 for column l: null input
CONTEXT: COPY check_ign_err2
NOTICE: 1 row was skipped due to data type incompatibility
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
-- reset context choice
\set SHOW_CONTEXT errors
SELECT * FROM check_ign_err;
@@ -774,7 +775,8 @@ SELECT * FROM check_ign_err2;
n | m | k | l
---+-----+---+-------
1 | {1} | 1 | 'foo'
-(1 row)
+ 3 | {3} | 3 | 'bar'
+(2 rows)
-- test datatype error that can't be handled as soft: should fail
CREATE TABLE hard_err(foo widget);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..fa6aa17344 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -533,6 +533,10 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
1 {1} 1 'foo'
2 {2} 2 \N
\.
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
+3 {3} 3 'bar'
+4 {4} 4 \N
+\.
-- reset context choice
\set SHOW_CONTEXT errors
--
2.39.2
v3-0002-Add-on_error-and-log_verbosity-options-to-file_fd.patchtext/x-diff; name=v3-0002-Add-on_error-and-log_verbosity-options-to-file_fd.patchDownload
From e950fe58bd043f8e378ff6d75773bb52855504f4 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Thu, 8 Aug 2024 15:51:35 +0900
Subject: [PATCH v3 2/2] Add on_error and log_verbosity options to file_fdw
9e2d870, b725b7e and f5a2278 introduced on_error and log_verbosity to COPY.
Since these options seems useful to file_fdw when accessing dirty data,
this patch adds them to file_fdw.
---
contrib/file_fdw/expected/file_fdw.out | 18 ++++++++
contrib/file_fdw/file_fdw.c | 61 +++++++++++++++++++++++---
contrib/file_fdw/sql/file_fdw.sql | 6 +++
doc/src/sgml/file-fdw.sgml | 23 ++++++++++
4 files changed, 101 insertions(+), 7 deletions(-)
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..2ea0deeb10 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,24 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
SELECT * FROM agg_bad; -- ERROR
ERROR: invalid input syntax for type real: "aaa"
CONTEXT: COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE: 1 row was skipped due to data type incompatibility
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a0..11fc75b4d7 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,8 +22,10 @@
#include "catalog/pg_authid.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+#include "commands/progress.h"
#include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
@@ -34,6 +36,7 @@
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
#include "utils/acl.h"
+#include "utils/backend_progress.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/sampling.h"
@@ -74,6 +77,8 @@ static const struct FileFdwOption valid_options[] = {
{"null", ForeignTableRelationId},
{"default", ForeignTableRelationId},
{"encoding", ForeignTableRelationId},
+ {"on_error", ForeignTableRelationId},
+ {"log_verbosity", ForeignTableRelationId},
{"force_not_null", AttributeRelationId},
{"force_null", AttributeRelationId},
@@ -724,12 +729,13 @@ fileIterateForeignScan(ForeignScanState *node)
ExprContext *econtext;
MemoryContext oldcontext;
TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
- bool found;
+ CopyFromState cstate = festate->cstate;
+ int64 skipped = 0;
ErrorContextCallback errcallback;
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
- errcallback.arg = (void *) festate->cstate;
+ errcallback.arg = (void *) cstate;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
@@ -750,10 +756,40 @@ fileIterateForeignScan(ForeignScanState *node)
* switch in case we are doing that.
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
- if (found)
+
+ for(;;)
+ {
+ if (!NextCopyFrom(cstate, econtext,
+ slot->tts_values, slot->tts_isnull))
+ break;
+
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and deal with error
+ * information according to ON_ERROR.
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we don't set details_wanted and error_data is not to
+ * be filled, just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Report that this tuple was skipped by the ON_ERROR clause */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ ++skipped);
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ continue;
+ }
+
ExecStoreVirtualTuple(slot);
+ break;
+ }
/* Switch back to original memory context */
MemoryContextSwitchTo(oldcontext);
@@ -795,8 +831,19 @@ fileEndForeignScan(ForeignScanState *node)
FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
/* if festate is NULL, we are in EXPLAIN; nothing to do */
- if (festate)
- EndCopyFrom(festate->cstate);
+ if (!festate)
+ return;
+
+ if (festate->cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ festate->cstate->num_errors > 0 &&
+ festate->cstate->opts.log_verbosity != COPY_LOG_VERBOSITY_SILENT)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) festate->cstate->num_errors,
+ (unsigned long long) festate->cstate->num_errors));
+
+ EndCopyFrom(festate->cstate);
}
/*
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..a417067e54 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,12 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
-- error context report tests
SELECT * FROM agg_bad; -- ERROR
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index f2f2af9a59..bb3579b077 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -126,6 +126,29 @@
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>on_error</literal></term>
+
+ <listitem>
+ <para>
+ Specifies how to behave when encountering an error converting a column's
+ input value into its data type,
+ the same as <command>COPY</command>'s <literal>ON_ERROR</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>log_verbosity</literal></term>
+
+ <listitem>
+ <para>
+ Specifies the amount of messages emitted by <literal>file_fdw</literal>,
+ the same as <command>COPY</command>'s <literal>LOG_VERBOSITY</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<para>
--
2.39.2
On 2024/08/08 16:36, torikoshia wrote:
Attached patches.
0001 adds new option 'silent' to log_verbosity and 0002 adds on_error and log_verbosity options to file_fdw.
Thanks for the patches!
Here are the review comments for 0001 patch.
+ <literal>silent</literal> excludes verbose messages.
This should clarify that in silent mode, not only verbose messages but also
default ones are suppressed?
+ cstate->opts.log_verbosity != COPY_LOG_VERBOSITY_SILENT)
I think using "cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT" instead
might improve readability.
- COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
- COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
+ COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
+ COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+ COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
Why do we need to assign specific numbers like -1 or 0 in this enum definition?
Here are the review comments for 0002 patch.
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ ++skipped);
The skipped tuple count isn't accurate because fileIterateForeignScan() resets
"skipped" to 0.
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and deal with error
+ * information according to ON_ERROR.
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
If COPY_ON_ERROR_IGNORE indicates tuple skipping, shouldn’t we not only reset
error_occurred but also call "pgstat_progress_update_param" and continue within
this block?
+ for(;;)
+ {
Using "goto" here might improve readability instead of using a "for" loop.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2024-09-11 23:50, Fujii Masao wrote:
Thanks for your comments!
Here are the review comments for 0001 patch.
+ <literal>silent</literal> excludes verbose messages.
This should clarify that in silent mode, not only verbose messages but
also
default ones are suppressed?
Agreed.
+ cstate->opts.log_verbosity !=
COPY_LOG_VERBOSITY_SILENT)I think using "cstate->opts.log_verbosity >=
COPY_LOG_VERBOSITY_DEFAULT" instead
might improve readability.
Agreed.
I've also modified a similar code in fileEndForeignScan() in 0002 patch.
- COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */ - COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */ + COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */ + COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */ + COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */Why do we need to assign specific numbers like -1 or 0 in this enum
definition?
CopyFormatOptions is initialized by palloc0() at the beginning of
ProcessCopyOptions().
The reason to assign specific numbers here is to assign
COPY_LOG_VERBOSITY_DEFAULT to 0 as default value and sort elements of
enum according to the amount of logging.
Should we assign 0 to COPY_LOG_VERBOSITY_SILENT and change
opts_out->log_verbosity after the initialization?
Here are the review comments for 0002 patch.
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED, + ++skipped);The skipped tuple count isn't accurate because fileIterateForeignScan()
resets
"skipped" to 0.
Ugh. Fixed to use cstate->num_errors.
BTW CopyFrom() also uses local variable skipped. It isn't reset like
file_fdw, but using local variable seems not necessary since we can use
cstate->num_errors here as well.
I'm going to discuss it in another thread.
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP && + cstate->escontext->error_occurred) + { + /* + * Soft error occurred, skip this tuple and deal with error + * information according to ON_ERROR. + */ + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)If COPY_ON_ERROR_IGNORE indicates tuple skipping, shouldn’t we not only
reset
error_occurred but also call "pgstat_progress_update_param" and
continue within
this block?
I may misunderstand your comment, but I thought it to behave as you
expect in the below codes:
```
+ /* Report that this tuple was skipped by the ON_ERROR clause
*/
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ ++skipped);
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ continue;
```
+ for(;;) + { Using "goto" here might improve readability instead of using a "for" loop.
Hmm, AFAIU we need to return a slot here before the end of file is
reached.
```
--src/backend/executor/execMain.c [ExecutePlan()]
/*
* if the tuple is null, then we assume there is nothing more
to
* process so we just end the loop...
*/
if (TupIsNull(slot))
break;
```
When ignoring errors, we have to keep calling NextCopyFrom() until we
find a non error tuple or EOF and to do so calling NextCopyFrom() in for
loop seems straight forward.
Replacing the "for" loop using "goto" as follows is possible, but seems
not so readable because of the upward "goto":
```
start_of_getting_tuple:
if (!NextCopyFrom(cstate, econtext,
slot->tts_values, slot->tts_isnull))
goto end_of_fileread;
if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
cstate->escontext->error_occurred)
{
/*
* Soft error occurred, skip this tuple and deal with error
* information according to ON_ERROR.
*/
if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
/*
* Just make ErrorSaveContext ready for the next
NextCopyFrom.
* Since we don't set details_wanted and error_data is not
to
* be filled, just resetting error_occurred is enough.
*/
cstate->escontext->error_occurred = false;
/* Report that this tuple was skipped by the ON_ERROR clause */
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
cstate->num_errors);
goto start_of_fileread;
}
ExecStoreVirtualTuple(slot);
end_of_getting_tuple:
```
Attached v4 patches reflected these comments.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachments:
v4-0001-Add-log_verbosity-to-silent.patchtext/x-diff; name=v4-0001-Add-log_verbosity-to-silent.patchDownload
From 8643f8e97305bf2a979fe0b5460917c454d63911 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Thu, 19 Sep 2024 00:16:37 +0900
Subject: [PATCH v4 1/2] Add log_verbosity to 'silent'
When on_error is set to ignore, COPY currently logs a NOTICE message
containing the ignored row count.
When using on_error option to file_fdw, it'd be better to omit this
message in some cases, such as frequent access to a malformed file
via the same foreign table.
As a preliminary step to add on_error option to file_fdw, this patch
adds new value 'silent' to log_verbosity and enables to silence
the message.
---
doc/src/sgml/ref/copy.sgml | 10 +++++++---
src/backend/commands/copy.c | 4 +++-
src/backend/commands/copyfrom.c | 3 ++-
src/include/commands/copy.h | 6 ++++--
src/test/regress/expected/copy2.out | 4 +++-
src/test/regress/sql/copy2.sql | 4 ++++
6 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..d87684a5be 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -407,6 +407,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<literal>verbose</literal>, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
conversion has failed is emitted for each discarded row.
+ When it is set to <literal>silent</literal>, no message is emitted
+ regarding ignored rows.
</para>
</listitem>
</varlistentry>
@@ -428,9 +430,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<listitem>
<para>
Specify the amount of messages emitted by a <command>COPY</command>
- command: <literal>default</literal> or <literal>verbose</literal>. If
- <literal>verbose</literal> is specified, additional messages are emitted
- during processing.
+ command: <literal>default</literal>, <literal>verbose</literal>, or
+ <literal>silent</literal>.
+ If <literal>verbose</literal> is specified, additional messages are
+ emitted during processing.
+ <literal>silent</literal> suppresses both verbose and default messages.
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..03eb7a4eba 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,9 +427,11 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
char *sval;
/*
- * Allow "default", or "verbose" values.
+ * Allow "silent", "default", or "verbose" values.
*/
sval = defGetString(def);
+ if (pg_strcasecmp(sval, "silent") == 0)
+ return COPY_LOG_VERBOSITY_SILENT;
if (pg_strcasecmp(sval, "default") == 0)
return COPY_LOG_VERBOSITY_DEFAULT;
if (pg_strcasecmp(sval, "verbose") == 0)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..47879994f7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1320,7 +1320,8 @@ CopyFrom(CopyFromState cstate)
error_context_stack = errcallback.previous;
if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0)
+ cstate->num_errors > 0 &&
+ cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
"%llu rows were skipped due to data type incompatibility",
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..9fb13e952d 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -45,8 +45,10 @@ typedef enum CopyOnErrorChoice
*/
typedef enum CopyLogVerbosityChoice
{
- COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
- COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
+ COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
+ COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is
+ the default, assign 0 */
+ COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 61a19cdc4c..4e752977b5 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -760,6 +760,7 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null input
CONTEXT: COPY check_ign_err2
NOTICE: 1 row was skipped due to data type incompatibility
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
-- reset context choice
\set SHOW_CONTEXT errors
SELECT * FROM check_ign_err;
@@ -774,7 +775,8 @@ SELECT * FROM check_ign_err2;
n | m | k | l
---+-----+---+-------
1 | {1} | 1 | 'foo'
-(1 row)
+ 3 | {3} | 3 | 'bar'
+(2 rows)
-- test datatype error that can't be handled as soft: should fail
CREATE TABLE hard_err(foo widget);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..fa6aa17344 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -533,6 +533,10 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
1 {1} 1 'foo'
2 {2} 2 \N
\.
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
+3 {3} 3 'bar'
+4 {4} 4 \N
+\.
-- reset context choice
\set SHOW_CONTEXT errors
base-commit: 95d6e9af07d2e5af2fdd272e72b5b552bad3ea0a
--
2.39.2
v4-0002-Add-on_error-and-log_verbosity-options-to-file_fd.patchtext/x-diff; name=v4-0002-Add-on_error-and-log_verbosity-options-to-file_fd.patchDownload
From 64bc3c53605d2e71dab816aad31097644b63e6d6 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Thu, 19 Sep 2024 00:18:23 +0900
Subject: [PATCH v4 2/2] Add on_error and log_verbosity options to file_fdw
9e2d870, b725b7e and f5a2278 introduced on_error and log_verbosity to COPY.
Since these options seems useful to file_fdw when accessing dirty data,
this patch adds them to file_fdw.
---
contrib/file_fdw/expected/file_fdw.out | 18 ++++++++
contrib/file_fdw/file_fdw.c | 60 +++++++++++++++++++++++---
contrib/file_fdw/sql/file_fdw.sql | 6 +++
doc/src/sgml/file-fdw.sgml | 23 ++++++++++
4 files changed, 100 insertions(+), 7 deletions(-)
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..2ea0deeb10 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,24 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
SELECT * FROM agg_bad; -- ERROR
ERROR: invalid input syntax for type real: "aaa"
CONTEXT: COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE: 1 row was skipped due to data type incompatibility
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d16821f8e1..82fd9cbe55 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,8 +22,10 @@
#include "catalog/pg_authid.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+#include "commands/progress.h"
#include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
@@ -34,6 +36,7 @@
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
#include "utils/acl.h"
+#include "utils/backend_progress.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/sampling.h"
@@ -74,6 +77,8 @@ static const struct FileFdwOption valid_options[] = {
{"null", ForeignTableRelationId},
{"default", ForeignTableRelationId},
{"encoding", ForeignTableRelationId},
+ {"on_error", ForeignTableRelationId},
+ {"log_verbosity", ForeignTableRelationId},
{"force_not_null", AttributeRelationId},
{"force_null", AttributeRelationId},
@@ -725,12 +730,12 @@ fileIterateForeignScan(ForeignScanState *node)
ExprContext *econtext;
MemoryContext oldcontext;
TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
- bool found;
+ CopyFromState cstate = festate->cstate;
ErrorContextCallback errcallback;
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
- errcallback.arg = (void *) festate->cstate;
+ errcallback.arg = (void *) cstate;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
@@ -751,10 +756,40 @@ fileIterateForeignScan(ForeignScanState *node)
* switch in case we are doing that.
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
- if (found)
+
+ for(;;)
+ {
+ if (!NextCopyFrom(cstate, econtext,
+ slot->tts_values, slot->tts_isnull))
+ break;
+
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and deal with error
+ * information according to ON_ERROR.
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we don't set details_wanted and error_data is not to
+ * be filled, just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Report that this tuple was skipped by the ON_ERROR clause */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ cstate->num_errors);
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ continue;
+ }
+
ExecStoreVirtualTuple(slot);
+ break;
+ }
/* Switch back to original memory context */
MemoryContextSwitchTo(oldcontext);
@@ -796,8 +831,19 @@ fileEndForeignScan(ForeignScanState *node)
FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
/* if festate is NULL, we are in EXPLAIN; nothing to do */
- if (festate)
- EndCopyFrom(festate->cstate);
+ if (!festate)
+ return;
+
+ if (festate->cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ festate->cstate->num_errors > 0 &&
+ festate->cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) festate->cstate->num_errors,
+ (unsigned long long) festate->cstate->num_errors));
+
+ EndCopyFrom(festate->cstate);
}
/*
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..a417067e54 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,12 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
-- error context report tests
SELECT * FROM agg_bad; -- ERROR
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index f2f2af9a59..bb3579b077 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -126,6 +126,29 @@
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>on_error</literal></term>
+
+ <listitem>
+ <para>
+ Specifies how to behave when encountering an error converting a column's
+ input value into its data type,
+ the same as <command>COPY</command>'s <literal>ON_ERROR</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>log_verbosity</literal></term>
+
+ <listitem>
+ <para>
+ Specifies the amount of messages emitted by <literal>file_fdw</literal>,
+ the same as <command>COPY</command>'s <literal>LOG_VERBOSITY</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<para>
--
2.39.2
On 2024/09/19 23:16, torikoshia wrote:
- COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */ - COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */ + COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */ + COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */ + COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */Why do we need to assign specific numbers like -1 or 0 in this enum definition?
CopyFormatOptions is initialized by palloc0() at the beginning of ProcessCopyOptions().
The reason to assign specific numbers here is to assign COPY_LOG_VERBOSITY_DEFAULT to 0 as default value and sort elements of enum according to the amount of logging.
Understood.
BTW CopyFrom() also uses local variable skipped. It isn't reset like file_fdw, but using local variable seems not necessary since we can use cstate->num_errors here as well.
Sounds reasonable to me.
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP && + cstate->escontext->error_occurred) + { + /* + * Soft error occurred, skip this tuple and deal with error + * information according to ON_ERROR. + */ + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)If COPY_ON_ERROR_IGNORE indicates tuple skipping, shouldn’t we not only reset
error_occurred but also call "pgstat_progress_update_param" and continue within
this block?I may misunderstand your comment, but I thought it to behave as you expect in the below codes:
The "on_error == COPY_ON_ERROR_IGNORE" condition isn't needed since
"on_error != COPY_ON_ERROR_STOP" is already checked, and on_error accepts
only two values "ignore" and "stop." I assume you added it with
a future option in mind, like "set_to_null" (as discussed in another thread).
However, I’m not sure how much this helps such future changes.
So, how about simplifying the code by replacing "on_error != COPY_ON_ERROR_STOP"
with "on_error == COPY_ON_ERROR_IGNORE" at the top and removing
the "on_error == COPY_ON_ERROR_IGNORE" check in the middle?
It could improve readability.
+ for(;;) + { Using "goto" here might improve readability instead of using a "for" loop.Hmm, AFAIU we need to return a slot here before the end of file is reached.
```
--src/backend/executor/execMain.c [ExecutePlan()]
/*
* if the tuple is null, then we assume there is nothing more to
* process so we just end the loop...
*/
if (TupIsNull(slot))
break;
```When ignoring errors, we have to keep calling NextCopyFrom() until we find a non error tuple or EOF and to do so calling NextCopyFrom() in for loop seems straight forward.
Replacing the "for" loop using "goto" as follows is possible, but seems not so readable because of the upward "goto":
Understood.
Attached v4 patches reflected these comments.
Thanks for updating the patches!
The tab-completion needs to be updated to support the "silent" option?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2024-09-20 11:27, Fujii Masao wrote:
Thanks for your comments!
On 2024/09/19 23:16, torikoshia wrote:
- COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */ - COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */ + COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */ + COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */ + COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */Why do we need to assign specific numbers like -1 or 0 in this enum
definition?CopyFormatOptions is initialized by palloc0() at the beginning of
ProcessCopyOptions().
The reason to assign specific numbers here is to assign
COPY_LOG_VERBOSITY_DEFAULT to 0 as default value and sort elements of
enum according to the amount of logging.Understood.
BTW CopyFrom() also uses local variable skipped. It isn't reset like
file_fdw, but using local variable seems not necessary since we can
use cstate->num_errors here as well.Sounds reasonable to me.
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP && + cstate->escontext->error_occurred) + { + /* + * Soft error occurred, skip this tuple and deal with error + * information according to ON_ERROR. + */ + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)If COPY_ON_ERROR_IGNORE indicates tuple skipping, shouldn’t we not
only reset
error_occurred but also call "pgstat_progress_update_param" and
continue within
this block?I may misunderstand your comment, but I thought it to behave as you
expect in the below codes:The "on_error == COPY_ON_ERROR_IGNORE" condition isn't needed since
"on_error != COPY_ON_ERROR_STOP" is already checked, and on_error
accepts
only two values "ignore" and "stop." I assume you added it with
a future option in mind, like "set_to_null" (as discussed in another
thread).
However, I’m not sure how much this helps such future changes.
So, how about simplifying the code by replacing "on_error !=
COPY_ON_ERROR_STOP"
with "on_error == COPY_ON_ERROR_IGNORE" at the top and removing
the "on_error == COPY_ON_ERROR_IGNORE" check in the middle?
It could improve readability.
Thanks for the explanation and suggestion.
Since there is almost the same code in copyfrom.c, attached 0003 patch
for refactoring both.
+ for(;;) + { Using "goto" here might improve readability instead of using a "for" loop.Hmm, AFAIU we need to return a slot here before the end of file is
reached.```
--src/backend/executor/execMain.c [ExecutePlan()]
/*
* if the tuple is null, then we assume there is nothing
more to
* process so we just end the loop...
*/
if (TupIsNull(slot))
break;
```When ignoring errors, we have to keep calling NextCopyFrom() until we
find a non error tuple or EOF and to do so calling NextCopyFrom() in
for loop seems straight forward.Replacing the "for" loop using "goto" as follows is possible, but
seems not so readable because of the upward "goto":Understood.
Attached v4 patches reflected these comments.
Thanks for updating the patches!
The tab-completion needs to be updated to support the "silent" option?
Yes, updated 0002 patch.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachments:
v4-0001-Add-log_verbosity-to-silent.patchtext/x-diff; name=v4-0001-Add-log_verbosity-to-silent.patchDownload
From 89a97461eab2e14ab84778d93f0b4e91999606df Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 24 Sep 2024 14:32:54 +0900
Subject: [PATCH v4 1/3] Add log_verbosity to 'silent'
Currently when on_error is set to ignore, COPY logs a NOTICE message
containing the ignored row count.
When using on_error option to file_fdw, it'd be better to omit this
message in some cases, such as frequent access to a malformed file
via the same foreign table.
As a preliminary step to add on_error option to file_fdw, this patch
adds new value 'silent' to log_verbosity and enables to silence
the message.
---
doc/src/sgml/ref/copy.sgml | 10 +++++++---
src/backend/commands/copy.c | 4 +++-
src/backend/commands/copyfrom.c | 3 ++-
src/include/commands/copy.h | 6 ++++--
src/test/regress/expected/copy2.out | 4 +++-
src/test/regress/sql/copy2.sql | 4 ++++
6 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..d87684a5be 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -407,6 +407,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<literal>verbose</literal>, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
conversion has failed is emitted for each discarded row.
+ When it is set to <literal>silent</literal>, no message is emitted
+ regarding ignored rows.
</para>
</listitem>
</varlistentry>
@@ -428,9 +430,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<listitem>
<para>
Specify the amount of messages emitted by a <command>COPY</command>
- command: <literal>default</literal> or <literal>verbose</literal>. If
- <literal>verbose</literal> is specified, additional messages are emitted
- during processing.
+ command: <literal>default</literal>, <literal>verbose</literal>, or
+ <literal>silent</literal>.
+ If <literal>verbose</literal> is specified, additional messages are
+ emitted during processing.
+ <literal>silent</literal> suppresses both verbose and default messages.
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..03eb7a4eba 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,9 +427,11 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
char *sval;
/*
- * Allow "default", or "verbose" values.
+ * Allow "silent", "default", or "verbose" values.
*/
sval = defGetString(def);
+ if (pg_strcasecmp(sval, "silent") == 0)
+ return COPY_LOG_VERBOSITY_SILENT;
if (pg_strcasecmp(sval, "default") == 0)
return COPY_LOG_VERBOSITY_DEFAULT;
if (pg_strcasecmp(sval, "verbose") == 0)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..47879994f7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1320,7 +1320,8 @@ CopyFrom(CopyFromState cstate)
error_context_stack = errcallback.previous;
if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0)
+ cstate->num_errors > 0 &&
+ cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
"%llu rows were skipped due to data type incompatibility",
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..9fb13e952d 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -45,8 +45,10 @@ typedef enum CopyOnErrorChoice
*/
typedef enum CopyLogVerbosityChoice
{
- COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
- COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
+ COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
+ COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is
+ the default, assign 0 */
+ COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 61a19cdc4c..4e752977b5 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -760,6 +760,7 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null input
CONTEXT: COPY check_ign_err2
NOTICE: 1 row was skipped due to data type incompatibility
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
-- reset context choice
\set SHOW_CONTEXT errors
SELECT * FROM check_ign_err;
@@ -774,7 +775,8 @@ SELECT * FROM check_ign_err2;
n | m | k | l
---+-----+---+-------
1 | {1} | 1 | 'foo'
-(1 row)
+ 3 | {3} | 3 | 'bar'
+(2 rows)
-- test datatype error that can't be handled as soft: should fail
CREATE TABLE hard_err(foo widget);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..fa6aa17344 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -533,6 +533,10 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
1 {1} 1 'foo'
2 {2} 2 \N
\.
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
+3 {3} 3 'bar'
+4 {4} 4 \N
+\.
-- reset context choice
\set SHOW_CONTEXT errors
base-commit: bbba59e69a56e1622e270f5e47b402c3a904cefc
--
2.39.2
v4-0002-Add-on_error-and-log_verbosity-options-to-file_fd.patchtext/x-diff; name=v4-0002-Add-on_error-and-log_verbosity-options-to-file_fd.patchDownload
From c3ad3211c6660f74abda30b0607da6cf9f7fb2b0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 24 Sep 2024 14:36:17 +0900
Subject: [PATCH v4 2/3] Add on_error and log_verbosity options to file_fdw
9e2d870, b725b7e and f5a2278 introduced on_error and log_verbosity to COPY.
Since these options seem useful to file_fdw when accessing dirty data,
this patch adds them to file_fdw.
---
contrib/file_fdw/expected/file_fdw.out | 18 ++++++++
contrib/file_fdw/file_fdw.c | 60 +++++++++++++++++++++++---
contrib/file_fdw/sql/file_fdw.sql | 6 +++
doc/src/sgml/file-fdw.sgml | 23 ++++++++++
src/bin/psql/tab-complete.c | 2 +-
5 files changed, 101 insertions(+), 8 deletions(-)
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..2ea0deeb10 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,24 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
SELECT * FROM agg_bad; -- ERROR
ERROR: invalid input syntax for type real: "aaa"
CONTEXT: COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE: 1 row was skipped due to data type incompatibility
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d16821f8e1..82fd9cbe55 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,8 +22,10 @@
#include "catalog/pg_authid.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+#include "commands/progress.h"
#include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
@@ -34,6 +36,7 @@
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
#include "utils/acl.h"
+#include "utils/backend_progress.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/sampling.h"
@@ -74,6 +77,8 @@ static const struct FileFdwOption valid_options[] = {
{"null", ForeignTableRelationId},
{"default", ForeignTableRelationId},
{"encoding", ForeignTableRelationId},
+ {"on_error", ForeignTableRelationId},
+ {"log_verbosity", ForeignTableRelationId},
{"force_not_null", AttributeRelationId},
{"force_null", AttributeRelationId},
@@ -725,12 +730,12 @@ fileIterateForeignScan(ForeignScanState *node)
ExprContext *econtext;
MemoryContext oldcontext;
TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
- bool found;
+ CopyFromState cstate = festate->cstate;
ErrorContextCallback errcallback;
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
- errcallback.arg = (void *) festate->cstate;
+ errcallback.arg = (void *) cstate;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
@@ -751,10 +756,40 @@ fileIterateForeignScan(ForeignScanState *node)
* switch in case we are doing that.
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
- if (found)
+
+ for(;;)
+ {
+ if (!NextCopyFrom(cstate, econtext,
+ slot->tts_values, slot->tts_isnull))
+ break;
+
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and deal with error
+ * information according to ON_ERROR.
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we don't set details_wanted and error_data is not to
+ * be filled, just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Report that this tuple was skipped by the ON_ERROR clause */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ cstate->num_errors);
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ continue;
+ }
+
ExecStoreVirtualTuple(slot);
+ break;
+ }
/* Switch back to original memory context */
MemoryContextSwitchTo(oldcontext);
@@ -796,8 +831,19 @@ fileEndForeignScan(ForeignScanState *node)
FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
/* if festate is NULL, we are in EXPLAIN; nothing to do */
- if (festate)
- EndCopyFrom(festate->cstate);
+ if (!festate)
+ return;
+
+ if (festate->cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ festate->cstate->num_errors > 0 &&
+ festate->cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) festate->cstate->num_errors,
+ (unsigned long long) festate->cstate->num_errors));
+
+ EndCopyFrom(festate->cstate);
}
/*
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..a417067e54 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,12 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
-- error context report tests
SELECT * FROM agg_bad; -- ERROR
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index f2f2af9a59..bb3579b077 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -126,6 +126,29 @@
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>on_error</literal></term>
+
+ <listitem>
+ <para>
+ Specifies how to behave when encountering an error converting a column's
+ input value into its data type,
+ the same as <command>COPY</command>'s <literal>ON_ERROR</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>log_verbosity</literal></term>
+
+ <listitem>
+ <para>
+ Specifies the amount of messages emitted by <literal>file_fdw</literal>,
+ the same as <command>COPY</command>'s <literal>LOG_VERBOSITY</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<para>
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a7ccde6d7d..6530b0f1ce 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2916,7 +2916,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
- COMPLETE_WITH("default", "verbose");
+ COMPLETE_WITH("silent", "default", "verbose");
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
--
2.39.2
v4-0003-Refactor-CopyOnErrorChoice-option.patchtext/x-diff; name=v4-0003-Refactor-CopyOnErrorChoice-option.patchDownload
From 096e22c705b165cc1c8f6850fee1e6460a2b3a9d Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 24 Sep 2024 14:40:56 +0900
Subject: [PATCH v4 3/3] Refactor CopyOnErrorChoice option handling
Currently we assume that the possible values of CopyOnErrorChoice will
increase in the futue, but it decreases the readability.
This patch simplifies the logic using the condition that CopyOnErrorChoice
has only 2 options now.
---
contrib/file_fdw/file_fdw.c | 18 ++++++------------
src/backend/commands/copyfrom.c | 17 ++++++-----------
2 files changed, 12 insertions(+), 23 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 82fd9cbe55..2cf1f2cc85 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -763,27 +763,21 @@ fileIterateForeignScan(ForeignScanState *node)
slot->tts_values, slot->tts_isnull))
break;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->escontext->error_occurred)
{
/*
- * Soft error occurred, skip this tuple and deal with error
- * information according to ON_ERROR.
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we don't
+ * set details_wanted and error_data is not to be filled, just
+ * resetting error_occurred is enough.
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
-
- /*
- * Just make ErrorSaveContext ready for the next NextCopyFrom.
- * Since we don't set details_wanted and error_data is not to
- * be filled, just resetting error_occurred is enough.
- */
- cstate->escontext->error_occurred = false;
+ cstate->escontext->error_occurred = false;
/* Report that this tuple was skipped by the ON_ERROR clause */
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
cstate->num_errors);
- /* Repeat NextCopyFrom() until no soft error occurs */
continue;
}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 47879994f7..6b5192cbed 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1004,21 +1004,16 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->escontext->error_occurred)
{
/*
- * Soft error occurred, skip this tuple and deal with error
- * information according to ON_ERROR.
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we don't
+ * set details_wanted and error_data is not to be filled, just
+ * resetting error_occurred is enough.
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
-
- /*
- * Just make ErrorSaveContext ready for the next NextCopyFrom.
- * Since we don't set details_wanted and error_data is not to
- * be filled, just resetting error_occurred is enough.
- */
- cstate->escontext->error_occurred = false;
+ cstate->escontext->error_occurred = false;
/* Report that this tuple was skipped by the ON_ERROR clause */
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
--
2.39.2
On 2024/09/24 20:08, torikoshia wrote:
Thanks for the explanation and suggestion.
Since there is almost the same code in copyfrom.c, attached 0003 patch for refactoring both.
Thanks for updating the patches!
Regarding 0002.patch, I think it’s better to include the refactored code
from the start rather than adding redundant code intentionally.
How about leaving just the refactor in copyfrom.c to 0003.patch?
If that works, as a refactoring, you could also replace "skipped" with
"cstate->num_errors" in that patch, as you suggested earlier.
While reviewing again, I noticed that running ANALYZE on a file_fdw
foreign table also calls NextCopyFrom(), but it doesn’t seem to
skip erroneous rows when on_error is set to "ignore." This could lead
to inaccurate statistics. Shouldn’t ANALYZE on file_fdw foreign tables
with on_error=ignore also skip erroneous rows?
The tab-completion needs to be updated to support the "silent" option?
Yes, updated 0002 patch.
Thanks! Also, this should be part of 0001.patch since "silent" is
introduced there, right?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2024-09-25 00:46, Fujii Masao wrote:
Thanks for the comments!
On 2024/09/24 20:08, torikoshia wrote:
Thanks for the explanation and suggestion.
Since there is almost the same code in copyfrom.c, attached 0003 patch
for refactoring both.Thanks for updating the patches!
Regarding 0002.patch, I think it’s better to include the refactored
code
from the start rather than adding redundant code intentionally.
How about leaving just the refactor in copyfrom.c to 0003.patch?
If that works, as a refactoring, you could also replace "skipped" with
"cstate->num_errors" in that patch, as you suggested earlier.
Agreed.
While reviewing again, I noticed that running ANALYZE on a file_fdw
foreign table also calls NextCopyFrom(), but it doesn’t seem to
skip erroneous rows when on_error is set to "ignore." This could lead
to inaccurate statistics. Shouldn’t ANALYZE on file_fdw foreign tables
with on_error=ignore also skip erroneous rows?
Thanks, it seems the right thing to do.
The tab-completion needs to be updated to support the "silent"
option?Yes, updated 0002 patch.
Thanks! Also, this should be part of 0001.patch since "silent" is
introduced there, right?
Agreed.
Updated the patches.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachments:
v5-0001-Add-log_verbosity-to-silent.patchtext/x-diff; name=v5-0001-Add-log_verbosity-to-silent.patchDownload
From 494447fcf99abab141c27e35171dc6d63f64c994 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:28:15 +0900
Subject: [PATCH v5 1/3] Add log_verbosity to 'silent'
Currently when on_error is set to ignore, COPY logs a NOTICE message
containing the ignored row count.
When using on_error option to file_fdw, it'd be better to omit this
message in some cases, such as frequent access to a malformed file
via the same foreign table.
As a preliminary step to add on_error option to file_fdw, this patch
adds new value 'silent' to log_verbosity and enables to silence
the message.
---
doc/src/sgml/ref/copy.sgml | 10 +++++++---
src/backend/commands/copy.c | 4 +++-
src/backend/commands/copyfrom.c | 3 ++-
src/bin/psql/tab-complete.c | 2 +-
src/include/commands/copy.h | 6 ++++--
src/test/regress/expected/copy2.out | 4 +++-
src/test/regress/sql/copy2.sql | 4 ++++
7 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..d87684a5be 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -407,6 +407,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<literal>verbose</literal>, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
conversion has failed is emitted for each discarded row.
+ When it is set to <literal>silent</literal>, no message is emitted
+ regarding ignored rows.
</para>
</listitem>
</varlistentry>
@@ -428,9 +430,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<listitem>
<para>
Specify the amount of messages emitted by a <command>COPY</command>
- command: <literal>default</literal> or <literal>verbose</literal>. If
- <literal>verbose</literal> is specified, additional messages are emitted
- during processing.
+ command: <literal>default</literal>, <literal>verbose</literal>, or
+ <literal>silent</literal>.
+ If <literal>verbose</literal> is specified, additional messages are
+ emitted during processing.
+ <literal>silent</literal> suppresses both verbose and default messages.
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..03eb7a4eba 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,9 +427,11 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
char *sval;
/*
- * Allow "default", or "verbose" values.
+ * Allow "silent", "default", or "verbose" values.
*/
sval = defGetString(def);
+ if (pg_strcasecmp(sval, "silent") == 0)
+ return COPY_LOG_VERBOSITY_SILENT;
if (pg_strcasecmp(sval, "default") == 0)
return COPY_LOG_VERBOSITY_DEFAULT;
if (pg_strcasecmp(sval, "verbose") == 0)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..47879994f7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1320,7 +1320,8 @@ CopyFrom(CopyFromState cstate)
error_context_stack = errcallback.previous;
if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0)
+ cstate->num_errors > 0 &&
+ cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
"%llu rows were skipped due to data type incompatibility",
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a7ccde6d7d..6530b0f1ce 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2916,7 +2916,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
- COMPLETE_WITH("default", "verbose");
+ COMPLETE_WITH("silent", "default", "verbose");
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..9fb13e952d 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -45,8 +45,10 @@ typedef enum CopyOnErrorChoice
*/
typedef enum CopyLogVerbosityChoice
{
- COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
- COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
+ COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
+ COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is
+ the default, assign 0 */
+ COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 61a19cdc4c..4e752977b5 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -760,6 +760,7 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null input
CONTEXT: COPY check_ign_err2
NOTICE: 1 row was skipped due to data type incompatibility
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
-- reset context choice
\set SHOW_CONTEXT errors
SELECT * FROM check_ign_err;
@@ -774,7 +775,8 @@ SELECT * FROM check_ign_err2;
n | m | k | l
---+-----+---+-------
1 | {1} | 1 | 'foo'
-(1 row)
+ 3 | {3} | 3 | 'bar'
+(2 rows)
-- test datatype error that can't be handled as soft: should fail
CREATE TABLE hard_err(foo widget);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..fa6aa17344 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -533,6 +533,10 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
1 {1} 1 'foo'
2 {2} 2 \N
\.
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
+3 {3} 3 'bar'
+4 {4} 4 \N
+\.
-- reset context choice
\set SHOW_CONTEXT errors
base-commit: 1ab67c9dfaadda86059f405e5746efb6ddb9fe21
--
2.39.2
v5-0002-Add-on_error-and-log_verbosity-options-to-file_fd.patchtext/x-diff; name=v5-0002-Add-on_error-and-log_verbosity-options-to-file_fd.patchDownload
From dd8f20716524982e3687ea3e6a44a11e5f0b886c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:29:48 +0900
Subject: [PATCH v5 2/3] Add on_error and log_verbosity options to file_fdw
9e2d870, b725b7e and f5a2278 introduced on_error and log_verbosity to COPY.
Since these options seem useful to file_fdw when accessing dirty data,
this patch adds them to file_fdw.
---
contrib/file_fdw/expected/file_fdw.out | 18 +++++++
contrib/file_fdw/file_fdw.c | 74 +++++++++++++++++++++++---
contrib/file_fdw/sql/file_fdw.sql | 6 +++
doc/src/sgml/file-fdw.sgml | 23 ++++++++
4 files changed, 115 insertions(+), 6 deletions(-)
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..2ea0deeb10 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,24 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
SELECT * FROM agg_bad; -- ERROR
ERROR: invalid input syntax for type real: "aaa"
CONTEXT: COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE: 1 row was skipped due to data type incompatibility
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d16821f8e1..543b1ca4f9 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,8 +22,10 @@
#include "catalog/pg_authid.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+#include "commands/progress.h"
#include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
@@ -34,6 +36,7 @@
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
#include "utils/acl.h"
+#include "utils/backend_progress.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/sampling.h"
@@ -74,6 +77,8 @@ static const struct FileFdwOption valid_options[] = {
{"null", ForeignTableRelationId},
{"default", ForeignTableRelationId},
{"encoding", ForeignTableRelationId},
+ {"on_error", ForeignTableRelationId},
+ {"log_verbosity", ForeignTableRelationId},
{"force_not_null", AttributeRelationId},
{"force_null", AttributeRelationId},
@@ -726,11 +731,12 @@ fileIterateForeignScan(ForeignScanState *node)
MemoryContext oldcontext;
TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
bool found;
+ CopyFromState cstate = festate->cstate;
ErrorContextCallback errcallback;
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
- errcallback.arg = (void *) festate->cstate;
+ errcallback.arg = (void *) cstate;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
@@ -751,10 +757,36 @@ fileIterateForeignScan(ForeignScanState *node)
* switch in case we are doing that.
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
- if (found)
+
+ for(;;)
+ {
+ found = NextCopyFrom(cstate, econtext,
+ slot->tts_values, slot->tts_isnull);
+ if (!found)
+ break;
+
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we don't
+ * set details_wanted and error_data is not to be filled, just
+ * resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Report that this tuple was skipped by the ON_ERROR clause */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ cstate->num_errors);
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ continue;
+ }
+
ExecStoreVirtualTuple(slot);
+ break;
+ }
/* Switch back to original memory context */
MemoryContextSwitchTo(oldcontext);
@@ -796,8 +828,19 @@ fileEndForeignScan(ForeignScanState *node)
FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
/* if festate is NULL, we are in EXPLAIN; nothing to do */
- if (festate)
- EndCopyFrom(festate->cstate);
+ if (!festate)
+ return;
+
+ if (festate->cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ festate->cstate->num_errors > 0 &&
+ festate->cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) festate->cstate->num_errors,
+ (unsigned long long) festate->cstate->num_errors));
+
+ EndCopyFrom(festate->cstate);
}
/*
@@ -1191,6 +1234,25 @@ file_acquire_sample_rows(Relation onerel, int elevel,
if (!found)
break;
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we don't
+ * set details_wanted and error_data is not to be filled, just
+ * resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Report that this tuple was skipped by the ON_ERROR clause */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ cstate->num_errors);
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ continue;
+ }
+
/*
* The first targrows sample rows are simply copied into the
* reservoir. Then we start replacing tuples in the sample until we
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..a417067e54 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,12 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
-- error context report tests
SELECT * FROM agg_bad; -- ERROR
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index f2f2af9a59..bb3579b077 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -126,6 +126,29 @@
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>on_error</literal></term>
+
+ <listitem>
+ <para>
+ Specifies how to behave when encountering an error converting a column's
+ input value into its data type,
+ the same as <command>COPY</command>'s <literal>ON_ERROR</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>log_verbosity</literal></term>
+
+ <listitem>
+ <para>
+ Specifies the amount of messages emitted by <literal>file_fdw</literal>,
+ the same as <command>COPY</command>'s <literal>LOG_VERBOSITY</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<para>
--
2.39.2
v5-0003-Refactor-copyfrom.c.patchtext/x-diff; name=v5-0003-Refactor-copyfrom.c.patchDownload
From 4cc913a0965ddc07413b256ab979c50cbbfa8b1a Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:30:26 +0900
Subject: [PATCH v5 3/3] Refactor copyfrom.c regarding ON_ERROR
Refactor copyfrom.c For the consistency with 0001 and 0002 patch:
- currently local variable 'skipped' is defined, but remove it because
we've already counted it as cstate->num_errors
- simplify the logic using the condtion that CopyOnErrorChoice only accepts
two values
- add the same comment
---
src/backend/commands/copyfrom.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 47879994f7..1cd004ffbc 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -657,7 +657,6 @@ CopyFrom(CopyFromState cstate)
CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
int64 processed = 0;
int64 excluded = 0;
- int64 skipped = 0;
bool has_before_insert_row_trig;
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
@@ -1004,26 +1003,22 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->escontext->error_occurred)
{
/*
- * Soft error occurred, skip this tuple and deal with error
- * information according to ON_ERROR.
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we don't
+ * set details_wanted and error_data is not to be filled, just
+ * resetting error_occurred is enough.
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
-
- /*
- * Just make ErrorSaveContext ready for the next NextCopyFrom.
- * Since we don't set details_wanted and error_data is not to
- * be filled, just resetting error_occurred is enough.
- */
- cstate->escontext->error_occurred = false;
+ cstate->escontext->error_occurred = false;
/* Report that this tuple was skipped by the ON_ERROR clause */
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
- ++skipped);
+ cstate->num_errors);
+ /* Repeat NextCopyFrom() until no soft error occurs */
continue;
}
--
2.39.2
On 2024/09/26 21:57, torikoshia wrote:
Updated the patches.
Thanks for updating the patches! I’ve made some changes based on your work, which are attached.
Barring any objections, I'm thinking to push these patches.
For patches 0001 and 0003, I ran pgindent and updated the commit message.
Regarding patch 0002:
- I updated the regression test to run ANALYZE on the file_fdw foreign table
since the on_error option also affects the ANALYZE command. To ensure test
stability, the test now runs ANALYZE with log_verbosity = 'silent'.
- I removed the code that updated the count of skipped rows for
the pg_stat_progress_copy view. As far as I know, file_fdw doesn’t
currently support tracking pg_stat_progress_copy.tuples_processed.
Supporting only tuples_skipped seems inconsistent, so I suggest creating
a separate patch to extend file_fdw to track both tuples_processed and
tuples_skipped in this view.
- I refactored the for-loop handling on_error = 'ignore' in fileIterateForeignScan()
by replacing it with a goto statement for improved readability.
- I modified file_fdw to log a NOTICE message about skipped rows at the end of
ANALYZE if any rows are skipped due to the on_error = 'ignore' setting.
Regarding the "file contains XXX rows" message reported by the ANALYZE VERBOSE
command on the file_fdw foreign table, what number should be reflected in XXX,
especially when some rows are skipped due to on_error = 'ignore'?
Currently, the count only includes valid rows, excluding any skipped rows.
I haven't modified this code yet. Should we instead count all rows
(valid and erroneous) and report that total?
I noticed the code for reporting the number of skipped rows due to
on_error = 'ignore' appears in three places. I’m considering creating
a common function for this reporting to eliminate redundancy but haven’t
implemented it yet.
- I’ve updated the commit message and run pgindent.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
v6-0001-Add-log_verbosity-silent-support-to-COPY-command.patchtext/plain; charset=UTF-8; name=v6-0001-Add-log_verbosity-silent-support-to-COPY-command.patchDownload
From 6bcf56dc0556b3e9ded7200229c05c69e9c4fd6a Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:28:15 +0900
Subject: [PATCH v6 1/3] Add log_verbosity = 'silent' support to COPY command.
Previously, when the on_error option was set to ignore, the COPY command
would always log NOTICE messages for input rows discarded due to
data type incompatibility. Users had no way to suppress these messages.
This commit introduces a new log_verbosity setting, 'silent',
which prevents the COPY command from emitting NOTICE messages
when on_error = 'ignore' is used, even if rows are discarded.
This feature is particularly useful when processing malformed files
frequently, where a flood of NOTICE messages can be undesirable.
For example, when frequently loading malformed files via the COPY command
or querying foreign tables using file_fdw (with an upcoming patch to
add on_error support for file_fdw), users may prefer to suppress
these messages to reduce log noise and improve clarity.
Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24cfd@oss.nttdata.com
---
doc/src/sgml/ref/copy.sgml | 10 +++++++---
src/backend/commands/copy.c | 4 +++-
src/backend/commands/copyfrom.c | 3 ++-
src/bin/psql/tab-complete.c | 2 +-
src/include/commands/copy.h | 4 +++-
src/test/regress/expected/copy2.out | 4 +++-
src/test/regress/sql/copy2.sql | 4 ++++
7 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..d87684a5be 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -407,6 +407,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<literal>verbose</literal>, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
conversion has failed is emitted for each discarded row.
+ When it is set to <literal>silent</literal>, no message is emitted
+ regarding ignored rows.
</para>
</listitem>
</varlistentry>
@@ -428,9 +430,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<listitem>
<para>
Specify the amount of messages emitted by a <command>COPY</command>
- command: <literal>default</literal> or <literal>verbose</literal>. If
- <literal>verbose</literal> is specified, additional messages are emitted
- during processing.
+ command: <literal>default</literal>, <literal>verbose</literal>, or
+ <literal>silent</literal>.
+ If <literal>verbose</literal> is specified, additional messages are
+ emitted during processing.
+ <literal>silent</literal> suppresses both verbose and default messages.
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..03eb7a4eba 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,9 +427,11 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
char *sval;
/*
- * Allow "default", or "verbose" values.
+ * Allow "silent", "default", or "verbose" values.
*/
sval = defGetString(def);
+ if (pg_strcasecmp(sval, "silent") == 0)
+ return COPY_LOG_VERBOSITY_SILENT;
if (pg_strcasecmp(sval, "default") == 0)
return COPY_LOG_VERBOSITY_DEFAULT;
if (pg_strcasecmp(sval, "verbose") == 0)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..47879994f7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1320,7 +1320,8 @@ CopyFrom(CopyFromState cstate)
error_context_stack = errcallback.previous;
if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0)
+ cstate->num_errors > 0 &&
+ cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
"%llu rows were skipped due to data type incompatibility",
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a7ccde6d7d..6530b0f1ce 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2916,7 +2916,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
- COMPLETE_WITH("default", "verbose");
+ COMPLETE_WITH("silent", "default", "verbose");
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..6f64d97fdd 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -45,7 +45,9 @@ typedef enum CopyOnErrorChoice
*/
typedef enum CopyLogVerbosityChoice
{
- COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+ COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
+ COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is
+ * the default, assign 0 */
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 61a19cdc4c..4e752977b5 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -760,6 +760,7 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null input
CONTEXT: COPY check_ign_err2
NOTICE: 1 row was skipped due to data type incompatibility
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
-- reset context choice
\set SHOW_CONTEXT errors
SELECT * FROM check_ign_err;
@@ -774,7 +775,8 @@ SELECT * FROM check_ign_err2;
n | m | k | l
---+-----+---+-------
1 | {1} | 1 | 'foo'
-(1 row)
+ 3 | {3} | 3 | 'bar'
+(2 rows)
-- test datatype error that can't be handled as soft: should fail
CREATE TABLE hard_err(foo widget);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..fa6aa17344 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -533,6 +533,10 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
1 {1} 1 'foo'
2 {2} 2 \N
\.
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
+3 {3} 3 'bar'
+4 {4} 4 \N
+\.
-- reset context choice
\set SHOW_CONTEXT errors
--
2.45.2
v6-0002-file_fdw-Add-on_error-and-log_verbosity-options-t.patchtext/plain; charset=UTF-8; name=v6-0002-file_fdw-Add-on_error-and-log_verbosity-options-t.patchDownload
From 1fb0d7a1d4af8961edb56e8d05400a118e9a584c Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Mon, 30 Sep 2024 23:05:26 +0900
Subject: [PATCH v6 2/3] file_fdw: Add on_error and log_verbosity options to
file_fdw.
In v17, the on_error and log_verbosity options were introduced for
the COPY command. This commit extends support for these options
to file_fdw.
Setting on_error = 'ignore' for a file_fdw foreign table allows users
to query it without errors, even when the input file contains
malformed rows, by skipping the problematic rows.
Both on_error and log_verbosity options apply to SELECT and ANALYZE
operations on file_fdw foreign tables.
Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24cfd@oss.nttdata.com
---
contrib/file_fdw/expected/file_fdw.out | 19 +++++++
contrib/file_fdw/file_fdw.c | 72 +++++++++++++++++++++++---
contrib/file_fdw/sql/file_fdw.sql | 7 +++
doc/src/sgml/file-fdw.sgml | 23 ++++++++
4 files changed, 113 insertions(+), 8 deletions(-)
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..593fdc782e 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,25 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
SELECT * FROM agg_bad; -- ERROR
ERROR: invalid input syntax for type real: "aaa"
CONTEXT: COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE: 1 row was skipped due to data type incompatibility
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
+ANALYZE agg_bad;
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d16821f8e1..1e28c20797 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,6 +22,7 @@
#include "catalog/pg_authid.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
#include "commands/defrem.h"
#include "commands/explain.h"
#include "commands/vacuum.h"
@@ -74,6 +75,8 @@ static const struct FileFdwOption valid_options[] = {
{"null", ForeignTableRelationId},
{"default", ForeignTableRelationId},
{"encoding", ForeignTableRelationId},
+ {"on_error", ForeignTableRelationId},
+ {"log_verbosity", ForeignTableRelationId},
{"force_not_null", AttributeRelationId},
{"force_null", AttributeRelationId},
@@ -725,12 +728,12 @@ fileIterateForeignScan(ForeignScanState *node)
ExprContext *econtext;
MemoryContext oldcontext;
TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
- bool found;
+ CopyFromState cstate = festate->cstate;
ErrorContextCallback errcallback;
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
- errcallback.arg = (void *) festate->cstate;
+ errcallback.arg = (void *) cstate;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
@@ -751,10 +754,27 @@ fileIterateForeignScan(ForeignScanState *node)
* switch in case we are doing that.
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
- if (found)
+
+retry:
+ if (NextCopyFrom(cstate, econtext, slot->tts_values, slot->tts_isnull))
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we
+ * don't set details_wanted and error_data is not to be filled,
+ * just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ goto retry;
+ }
+
ExecStoreVirtualTuple(slot);
+ }
/* Switch back to original memory context */
MemoryContextSwitchTo(oldcontext);
@@ -796,8 +816,19 @@ fileEndForeignScan(ForeignScanState *node)
FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
/* if festate is NULL, we are in EXPLAIN; nothing to do */
- if (festate)
- EndCopyFrom(festate->cstate);
+ if (!festate)
+ return;
+
+ if (festate->cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ festate->cstate->num_errors > 0 &&
+ festate->cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) festate->cstate->num_errors,
+ (unsigned long long) festate->cstate->num_errors));
+
+ EndCopyFrom(festate->cstate);
}
/*
@@ -1113,7 +1144,8 @@ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
* which must have at least targrows entries.
* The actual number of rows selected is returned as the function result.
* We also count the total number of rows in the file and return it into
- * *totalrows. Note that *totaldeadrows is always set to 0.
+ * *totalrows. Rows skipped due to on_error = 'ignore' are not included
+ * in this count. Note that *totaldeadrows is always set to 0.
*
* Note that the returned list of rows is not always in order by physical
* position in the file. Therefore, correlation estimates derived later
@@ -1191,6 +1223,21 @@ file_acquire_sample_rows(Relation onerel, int elevel,
if (!found)
break;
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we
+ * don't set details_wanted and error_data is not to be filled,
+ * just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ continue;
+ }
+
/*
* The first targrows sample rows are simply copied into the
* reservoir. Then we start replacing tuples in the sample until we
@@ -1236,6 +1283,15 @@ file_acquire_sample_rows(Relation onerel, int elevel,
/* Clean up. */
MemoryContextDelete(tupcontext);
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ cstate->num_errors > 0 &&
+ cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+
EndCopyFrom(cstate);
pfree(values);
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..edd77c5cd2 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,13 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
-- error context report tests
SELECT * FROM agg_bad; -- ERROR
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ANALYZE agg_bad;
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index f2f2af9a59..bb3579b077 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -126,6 +126,29 @@
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>on_error</literal></term>
+
+ <listitem>
+ <para>
+ Specifies how to behave when encountering an error converting a column's
+ input value into its data type,
+ the same as <command>COPY</command>'s <literal>ON_ERROR</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>log_verbosity</literal></term>
+
+ <listitem>
+ <para>
+ Specifies the amount of messages emitted by <literal>file_fdw</literal>,
+ the same as <command>COPY</command>'s <literal>LOG_VERBOSITY</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<para>
--
2.45.2
v6-0003-Refactor-CopyFrom-in-copyfrom.c.patchtext/plain; charset=UTF-8; name=v6-0003-Refactor-CopyFrom-in-copyfrom.c.patchDownload
From 68663c230bbfc54e8bd730258e3a1a420eb0a92e Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:30:26 +0900
Subject: [PATCH v6 3/3] Refactor CopyFrom() in copyfrom.c.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This commit simplifies CopyFrom() by removing the unnecessary local variable
'skipped', which tracked the number of rows skipped due to on_error = 'ignore'.
That count is already handled by cstate->num_errors, so the 'skipped' variable
was redundant.
Additionally, the condition on_error != COPY_ON_ERROR_STOP is removed.
Since on_error == COPY_ON_ERROR_IGNORE is already checked, and on_error
only has two values (ignore and stop), the additional check was redundant
and made the logic harder to read. Seemingly this was introduced
in preparation for a future patch, but the current checks don’t offer
clear value and have been removed to improve readability.
Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24cfd@oss.nttdata.com
---
src/backend/commands/copyfrom.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 47879994f7..9139a40785 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -657,7 +657,6 @@ CopyFrom(CopyFromState cstate)
CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
int64 processed = 0;
int64 excluded = 0;
- int64 skipped = 0;
bool has_before_insert_row_trig;
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
@@ -1004,26 +1003,22 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->escontext->error_occurred)
{
/*
- * Soft error occurred, skip this tuple and deal with error
- * information according to ON_ERROR.
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we
+ * don't set details_wanted and error_data is not to be filled,
+ * just resetting error_occurred is enough.
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
-
- /*
- * Just make ErrorSaveContext ready for the next NextCopyFrom.
- * Since we don't set details_wanted and error_data is not to
- * be filled, just resetting error_occurred is enough.
- */
- cstate->escontext->error_occurred = false;
+ cstate->escontext->error_occurred = false;
/* Report that this tuple was skipped by the ON_ERROR clause */
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
- ++skipped);
+ cstate->num_errors);
+ /* Repeat NextCopyFrom() until no soft error occurs */
continue;
}
--
2.45.2
Hi,
On Mon, Sep 30, 2024 at 8:36 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2024/09/26 21:57, torikoshia wrote:
Updated the patches.
Thanks for updating the patches! I’ve made some changes based on your work, which are attached.
Barring any objections, I'm thinking to push these patches.For patches 0001 and 0003, I ran pgindent and updated the commit message.
Regarding patch 0002:
- I updated the regression test to run ANALYZE on the file_fdw foreign table
since the on_error option also affects the ANALYZE command. To ensure test
stability, the test now runs ANALYZE with log_verbosity = 'silent'.- I removed the code that updated the count of skipped rows for
the pg_stat_progress_copy view. As far as I know, file_fdw doesn’t
currently support tracking pg_stat_progress_copy.tuples_processed.
Supporting only tuples_skipped seems inconsistent, so I suggest creating
a separate patch to extend file_fdw to track both tuples_processed and
tuples_skipped in this view.- I refactored the for-loop handling on_error = 'ignore' in fileIterateForeignScan()
by replacing it with a goto statement for improved readability.- I modified file_fdw to log a NOTICE message about skipped rows at the end of
ANALYZE if any rows are skipped due to the on_error = 'ignore' setting.Regarding the "file contains XXX rows" message reported by the ANALYZE VERBOSE
command on the file_fdw foreign table, what number should be reflected in XXX,
especially when some rows are skipped due to on_error = 'ignore'?
Currently, the count only includes valid rows, excluding any skipped rows.
I haven't modified this code yet. Should we instead count all rows
(valid and erroneous) and report that total?I noticed the code for reporting the number of skipped rows due to
on_error = 'ignore' appears in three places. I’m considering creating
a common function for this reporting to eliminate redundancy but haven’t
implemented it yet.- I’ve updated the commit message and run pgindent.
Sorry for being late in joining the review of this patch. Both 0001
and 0003 look good to me. I have two comments on the 0002 patch:
- found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values,
slot->tts_isnull);
- if (found)
+
+retry:
+ if (NextCopyFrom(cstate, econtext, slot->tts_values, slot->tts_isnull))
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next
NextCopyFrom. Since we
+ * don't set details_wanted and error_data is
not to be filled,
+ * just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ goto retry;
+ }
+
ExecStoreVirtualTuple(slot);
+ }
I think that while scanning a file_fdw foreign table with
log_verbosity='silent' the query is not interruptible.
Also, we don't switch to the per-tuple memory context when retrying
due to a soft error. I'm not sure it's okay as in CopyFrom(), a
similar function for COPY command, we switch to the per-tuple memory
context every time before parsing an input line. Would it be
problematic if we switch to another memory context while parsing an
input line? In CopyFrom() we also call ResetPerTupleExprContext() and
ExecClearTuple() for every input, so we might want to consider calling
them for every input.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 2024/10/02 9:27, Masahiko Sawada wrote:
Sorry for being late in joining the review of this patch. Both 0001
and 0003 look good to me. I have two comments on the 0002 patch:
Thanks for the review!
I think that while scanning a file_fdw foreign table with
log_verbosity='silent' the query is not interruptible.
You're right. I added CHECK_FOR_INTERRUPTS() in the retry loop.
Also, we don't switch to the per-tuple memory context when retrying
due to a soft error. I'm not sure it's okay as in CopyFrom(), a
similar function for COPY command, we switch to the per-tuple memory
context every time before parsing an input line. Would it be
problematic if we switch to another memory context while parsing an
input line? In CopyFrom() we also call ResetPerTupleExprContext() and
ExecClearTuple() for every input, so we might want to consider calling
them for every input.
Yes, I've updated the patch based on your comment.
Could you please review the latest version?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
v7-0001-Add-log_verbosity-silent-support-to-COPY-command.patchtext/plain; charset=UTF-8; name=v7-0001-Add-log_verbosity-silent-support-to-COPY-command.patchDownload
From 2c455d62aad84267e987b07a1287fd979abc0995 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:28:15 +0900
Subject: [PATCH v7 1/3] Add log_verbosity = 'silent' support to COPY command.
Previously, when the on_error option was set to ignore, the COPY command
would always log NOTICE messages for input rows discarded due to
data type incompatibility. Users had no way to suppress these messages.
This commit introduces a new log_verbosity setting, 'silent',
which prevents the COPY command from emitting NOTICE messages
when on_error = 'ignore' is used, even if rows are discarded.
This feature is particularly useful when processing malformed files
frequently, where a flood of NOTICE messages can be undesirable.
For example, when frequently loading malformed files via the COPY command
or querying foreign tables using file_fdw (with an upcoming patch to
add on_error support for file_fdw), users may prefer to suppress
these messages to reduce log noise and improve clarity.
Author: Atsushi Torikoshi
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24cfd@oss.nttdata.com
---
doc/src/sgml/ref/copy.sgml | 10 +++++++---
src/backend/commands/copy.c | 4 +++-
src/backend/commands/copyfrom.c | 3 ++-
src/bin/psql/tab-complete.c | 2 +-
src/include/commands/copy.h | 4 +++-
src/test/regress/expected/copy2.out | 4 +++-
src/test/regress/sql/copy2.sql | 4 ++++
7 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index fdbd20bc50..58a14bc427 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -407,6 +407,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<literal>verbose</literal>, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
conversion has failed is emitted for each discarded row.
+ When it is set to <literal>silent</literal>, no message is emitted
+ regarding ignored rows.
</para>
</listitem>
</varlistentry>
@@ -428,9 +430,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<listitem>
<para>
Specify the amount of messages emitted by a <command>COPY</command>
- command: <literal>default</literal> or <literal>verbose</literal>. If
- <literal>verbose</literal> is specified, additional messages are emitted
- during processing.
+ command: <literal>default</literal>, <literal>verbose</literal>, or
+ <literal>silent</literal>.
+ If <literal>verbose</literal> is specified, additional messages are
+ emitted during processing.
+ <literal>silent</literal> suppresses both verbose and default messages.
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..03eb7a4eba 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,9 +427,11 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
char *sval;
/*
- * Allow "default", or "verbose" values.
+ * Allow "silent", "default", or "verbose" values.
*/
sval = defGetString(def);
+ if (pg_strcasecmp(sval, "silent") == 0)
+ return COPY_LOG_VERBOSITY_SILENT;
if (pg_strcasecmp(sval, "default") == 0)
return COPY_LOG_VERBOSITY_DEFAULT;
if (pg_strcasecmp(sval, "verbose") == 0)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..47879994f7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1320,7 +1320,8 @@ CopyFrom(CopyFromState cstate)
error_context_stack = errcallback.previous;
if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0)
+ cstate->num_errors > 0 &&
+ cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
"%llu rows were skipped due to data type incompatibility",
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a7ccde6d7d..6530b0f1ce 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2916,7 +2916,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
- COMPLETE_WITH("default", "verbose");
+ COMPLETE_WITH("silent", "default", "verbose");
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..6f64d97fdd 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -45,7 +45,9 @@ typedef enum CopyOnErrorChoice
*/
typedef enum CopyLogVerbosityChoice
{
- COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+ COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
+ COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is
+ * the default, assign 0 */
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 61a19cdc4c..4e752977b5 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -760,6 +760,7 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null input
CONTEXT: COPY check_ign_err2
NOTICE: 1 row was skipped due to data type incompatibility
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
-- reset context choice
\set SHOW_CONTEXT errors
SELECT * FROM check_ign_err;
@@ -774,7 +775,8 @@ SELECT * FROM check_ign_err2;
n | m | k | l
---+-----+---+-------
1 | {1} | 1 | 'foo'
-(1 row)
+ 3 | {3} | 3 | 'bar'
+(2 rows)
-- test datatype error that can't be handled as soft: should fail
CREATE TABLE hard_err(foo widget);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..fa6aa17344 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -533,6 +533,10 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
1 {1} 1 'foo'
2 {2} 2 \N
\.
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
+3 {3} 3 'bar'
+4 {4} 4 \N
+\.
-- reset context choice
\set SHOW_CONTEXT errors
--
2.45.2
v7-0002-file_fdw-Add-on_error-and-log_verbosity-options-t.patchtext/plain; charset=UTF-8; name=v7-0002-file_fdw-Add-on_error-and-log_verbosity-options-t.patchDownload
From f4458a36698997da23f813f858e6680c3e1daefa Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Mon, 30 Sep 2024 23:05:26 +0900
Subject: [PATCH v7 2/3] file_fdw: Add on_error and log_verbosity options to
file_fdw.
In v17, the on_error and log_verbosity options were introduced for
the COPY command. This commit extends support for these options
to file_fdw.
Setting on_error = 'ignore' for a file_fdw foreign table allows users
to query it without errors, even when the input file contains
malformed rows, by skipping the problematic rows.
Both on_error and log_verbosity options apply to SELECT and ANALYZE
operations on file_fdw foreign tables.
Author: Atsushi Torikoshi
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24cfd@oss.nttdata.com
---
contrib/file_fdw/expected/file_fdw.out | 19 +++++
contrib/file_fdw/file_fdw.c | 107 +++++++++++++++++++++----
contrib/file_fdw/sql/file_fdw.sql | 7 ++
doc/src/sgml/file-fdw.sgml | 23 ++++++
4 files changed, 140 insertions(+), 16 deletions(-)
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..593fdc782e 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,25 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
SELECT * FROM agg_bad; -- ERROR
ERROR: invalid input syntax for type real: "aaa"
CONTEXT: COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE: 1 row was skipped due to data type incompatibility
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ a | b
+-----+--------
+ 100 | 99.097
+ 42 | 324.78
+(2 rows)
+
+ANALYZE agg_bad;
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d16821f8e1..043204c3e7 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,6 +22,7 @@
#include "catalog/pg_authid.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
#include "commands/defrem.h"
#include "commands/explain.h"
#include "commands/vacuum.h"
@@ -74,6 +75,8 @@ static const struct FileFdwOption valid_options[] = {
{"null", ForeignTableRelationId},
{"default", ForeignTableRelationId},
{"encoding", ForeignTableRelationId},
+ {"on_error", ForeignTableRelationId},
+ {"log_verbosity", ForeignTableRelationId},
{"force_not_null", AttributeRelationId},
{"force_null", AttributeRelationId},
@@ -723,38 +726,74 @@ fileIterateForeignScan(ForeignScanState *node)
FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
EState *estate = CreateExecutorState();
ExprContext *econtext;
- MemoryContext oldcontext;
+ MemoryContext oldcontext = CurrentMemoryContext;
TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
- bool found;
+ CopyFromState cstate = festate->cstate;
ErrorContextCallback errcallback;
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
- errcallback.arg = (void *) festate->cstate;
+ errcallback.arg = (void *) cstate;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
/*
- * The protocol for loading a virtual tuple into a slot is first
- * ExecClearTuple, then fill the values/isnull arrays, then
- * ExecStoreVirtualTuple. If we don't find another row in the file, we
- * just skip the last step, leaving the slot empty as required.
- *
* We pass ExprContext because there might be a use of the DEFAULT option
* in COPY FROM, so we may need to evaluate default expressions.
*/
- ExecClearTuple(slot);
econtext = GetPerTupleExprContext(estate);
+retry:
+
/*
* DEFAULT expressions need to be evaluated in a per-tuple context, so
* switch in case we are doing that.
*/
- oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
- if (found)
+ MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+ /*
+ * The protocol for loading a virtual tuple into a slot is first
+ * ExecClearTuple, then fill the values/isnull arrays, then
+ * ExecStoreVirtualTuple. If we don't find another row in the file, we
+ * just skip the last step, leaving the slot empty as required.
+ *
+ */
+ ExecClearTuple(slot);
+
+ if (NextCopyFrom(cstate, econtext, slot->tts_values, slot->tts_isnull))
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we
+ * don't set details_wanted and error_data is not to be filled,
+ * just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Switch back to original memory context */
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Make sure we are interruptible while repeatedly calling
+ * NextCopyFrom() until no soft error occurs.
+ */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Reset the per-tuple exprcontext, to clean-up after expression
+ * evaluations etc.
+ */
+ ResetPerTupleExprContext(estate);
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ goto retry;
+ }
+
ExecStoreVirtualTuple(slot);
+ }
/* Switch back to original memory context */
MemoryContextSwitchTo(oldcontext);
@@ -796,8 +835,19 @@ fileEndForeignScan(ForeignScanState *node)
FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
/* if festate is NULL, we are in EXPLAIN; nothing to do */
- if (festate)
- EndCopyFrom(festate->cstate);
+ if (!festate)
+ return;
+
+ if (festate->cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ festate->cstate->num_errors > 0 &&
+ festate->cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) festate->cstate->num_errors,
+ (unsigned long long) festate->cstate->num_errors));
+
+ EndCopyFrom(festate->cstate);
}
/*
@@ -1113,7 +1163,8 @@ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
* which must have at least targrows entries.
* The actual number of rows selected is returned as the function result.
* We also count the total number of rows in the file and return it into
- * *totalrows. Note that *totaldeadrows is always set to 0.
+ * *totalrows. Rows skipped due to on_error = 'ignore' are not included
+ * in this count. Note that *totaldeadrows is always set to 0.
*
* Note that the returned list of rows is not always in order by physical
* position in the file. Therefore, correlation estimates derived later
@@ -1191,6 +1242,21 @@ file_acquire_sample_rows(Relation onerel, int elevel,
if (!found)
break;
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we
+ * don't set details_wanted and error_data is not to be filled,
+ * just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ continue;
+ }
+
/*
* The first targrows sample rows are simply copied into the
* reservoir. Then we start replacing tuples in the sample until we
@@ -1236,6 +1302,15 @@ file_acquire_sample_rows(Relation onerel, int elevel,
/* Clean up. */
MemoryContextDelete(tupcontext);
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ cstate->num_errors > 0 &&
+ cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+
EndCopyFrom(cstate);
pfree(values);
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..edd77c5cd2 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,13 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
-- error context report tests
SELECT * FROM agg_bad; -- ERROR
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ANALYZE agg_bad;
+
-- misc query tests
\t on
SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index f2f2af9a59..bb3579b077 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -126,6 +126,29 @@
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>on_error</literal></term>
+
+ <listitem>
+ <para>
+ Specifies how to behave when encountering an error converting a column's
+ input value into its data type,
+ the same as <command>COPY</command>'s <literal>ON_ERROR</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>log_verbosity</literal></term>
+
+ <listitem>
+ <para>
+ Specifies the amount of messages emitted by <literal>file_fdw</literal>,
+ the same as <command>COPY</command>'s <literal>LOG_VERBOSITY</literal> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<para>
--
2.45.2
v7-0003-Refactor-CopyFrom-in-copyfrom.c.patchtext/plain; charset=UTF-8; name=v7-0003-Refactor-CopyFrom-in-copyfrom.c.patchDownload
From 6c132dc601d979ed8ffa4fadc0af9a8c73f88fce Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:30:26 +0900
Subject: [PATCH v7 3/3] Refactor CopyFrom() in copyfrom.c.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This commit simplifies CopyFrom() by removing the unnecessary local variable
'skipped', which tracked the number of rows skipped due to on_error = 'ignore'.
That count is already handled by cstate->num_errors, so the 'skipped' variable
was redundant.
Additionally, the condition on_error != COPY_ON_ERROR_STOP is removed.
Since on_error == COPY_ON_ERROR_IGNORE is already checked, and on_error
only has two values (ignore and stop), the additional check was redundant
and made the logic harder to read. Seemingly this was introduced
in preparation for a future patch, but the current checks don’t offer
clear value and have been removed to improve readability.
Author: Atsushi Torikoshi
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24cfd@oss.nttdata.com
---
src/backend/commands/copyfrom.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 47879994f7..9139a40785 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -657,7 +657,6 @@ CopyFrom(CopyFromState cstate)
CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
int64 processed = 0;
int64 excluded = 0;
- int64 skipped = 0;
bool has_before_insert_row_trig;
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
@@ -1004,26 +1003,22 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->escontext->error_occurred)
{
/*
- * Soft error occurred, skip this tuple and deal with error
- * information according to ON_ERROR.
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next NextCopyFrom. Since we
+ * don't set details_wanted and error_data is not to be filled,
+ * just resetting error_occurred is enough.
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
-
- /*
- * Just make ErrorSaveContext ready for the next NextCopyFrom.
- * Since we don't set details_wanted and error_data is not to
- * be filled, just resetting error_occurred is enough.
- */
- cstate->escontext->error_occurred = false;
+ cstate->escontext->error_occurred = false;
/* Report that this tuple was skipped by the ON_ERROR clause */
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
- ++skipped);
+ cstate->num_errors);
+ /* Repeat NextCopyFrom() until no soft error occurs */
continue;
}
--
2.45.2
On Tue, Oct 1, 2024 at 11:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2024/10/02 9:27, Masahiko Sawada wrote:
Sorry for being late in joining the review of this patch. Both 0001
and 0003 look good to me. I have two comments on the 0002 patch:Thanks for the review!
I think that while scanning a file_fdw foreign table with
log_verbosity='silent' the query is not interruptible.You're right. I added CHECK_FOR_INTERRUPTS() in the retry loop.
Also, we don't switch to the per-tuple memory context when retrying
due to a soft error. I'm not sure it's okay as in CopyFrom(), a
similar function for COPY command, we switch to the per-tuple memory
context every time before parsing an input line. Would it be
problematic if we switch to another memory context while parsing an
input line? In CopyFrom() we also call ResetPerTupleExprContext() and
ExecClearTuple() for every input, so we might want to consider calling
them for every input.Yes, I've updated the patch based on your comment.
Could you please review the latest version?
Thank you for updating the patch! All patches look good to me.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 2024/10/03 13:23, Masahiko Sawada wrote:
On Tue, Oct 1, 2024 at 11:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2024/10/02 9:27, Masahiko Sawada wrote:
Sorry for being late in joining the review of this patch. Both 0001
and 0003 look good to me. I have two comments on the 0002 patch:Thanks for the review!
I think that while scanning a file_fdw foreign table with
log_verbosity='silent' the query is not interruptible.You're right. I added CHECK_FOR_INTERRUPTS() in the retry loop.
Also, we don't switch to the per-tuple memory context when retrying
due to a soft error. I'm not sure it's okay as in CopyFrom(), a
similar function for COPY command, we switch to the per-tuple memory
context every time before parsing an input line. Would it be
problematic if we switch to another memory context while parsing an
input line? In CopyFrom() we also call ResetPerTupleExprContext() and
ExecClearTuple() for every input, so we might want to consider calling
them for every input.Yes, I've updated the patch based on your comment.
Could you please review the latest version?Thank you for updating the patch! All patches look good to me.
Thanks for the review! Pushed.
BTW, regarding the issue I mentioned earlier about file_fdw not reporting
the number of tuples processed and skipped in the pg_stat_progress_copy view,
I'll start a new thread to discuss this further.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2024-10-03 18:03, Fujii Masao wrote:
On 2024/10/03 13:23, Masahiko Sawada wrote:
On Tue, Oct 1, 2024 at 11:34 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:On 2024/10/02 9:27, Masahiko Sawada wrote:
Sorry for being late in joining the review of this patch. Both 0001
and 0003 look good to me. I have two comments on the 0002 patch:Thanks for the review!
I think that while scanning a file_fdw foreign table with
log_verbosity='silent' the query is not interruptible.You're right. I added CHECK_FOR_INTERRUPTS() in the retry loop.
Also, we don't switch to the per-tuple memory context when retrying
due to a soft error. I'm not sure it's okay as in CopyFrom(), a
similar function for COPY command, we switch to the per-tuple memory
context every time before parsing an input line. Would it be
problematic if we switch to another memory context while parsing an
input line? In CopyFrom() we also call ResetPerTupleExprContext()
and
ExecClearTuple() for every input, so we might want to consider
calling
them for every input.Yes, I've updated the patch based on your comment.
Could you please review the latest version?Thank you for updating the patch! All patches look good to me.
Thanks for the review! Pushed.
Fujii-san, Sawada-san,
Thanks for your review and modifications!
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation