Add reject_limit option to file_fdw

Started by torikoshiaabout 1 year ago16 messages
#1torikoshia
torikoshia@oss.nttdata.com
1 attachment(s)

Hi,

4ac2a9bec introduced reject_limit option to the COPY command, and I was
wondering if it might be beneficial to add the same option to file_fdw.

Although there may be fewer practical use cases compared to COPY, it
could still be useful in situations where the file being read via
file_fdw is subject to modifications and there is a need to tolerate a
limited number of errors.

What do you think?

I've attached a patch.
Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the
value for the foreign table's option must be single-quoted. I’m not
entirely sure if this is the correct approach, but in order to
accommodate this, the patch modifies the value of reject_limit option to
accept not only numeric values but also strings.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachments:

v1-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patchtext/x-diff; name=v1-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patchDownload
From 8d969465257ebc511a0d1b1520f29095103ae4af Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Thu, 17 Oct 2024 22:19:06 +0900
Subject: [PATCH v1] file_fdw: Add reject_limit option to file_fdw

---
 contrib/file_fdw/expected/file_fdw.out | 10 +++++++++-
 contrib/file_fdw/file_fdw.c            |  8 ++++++++
 contrib/file_fdw/sql/file_fdw.sql      |  4 +++-
 doc/src/sgml/file-fdw.sgml             | 12 ++++++++++++
 src/backend/commands/copy.c            | 12 +++++++++++-
 5 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 593fdc782e..1accc36d68 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,7 +206,7 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 SELECT * FROM agg_bad;               -- ERROR
 ERROR:  invalid input syntax for type real: "aaa"
 CONTEXT:  COPY agg_bad, line 3, column b: "aaa"
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
 NOTICE:  1 row was skipped due to data type incompatibility
@@ -224,6 +224,14 @@ SELECT * FROM agg_bad;
   42 | 324.78
 (2 rows)
 
+ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1');
+SELECT * FROM agg_bad;
+  a  |   b    
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
 ANALYZE agg_bad;
 -- misc query tests
 \t on
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 043204c3e7..2ae83ba09b 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -77,6 +77,7 @@ static const struct FileFdwOption valid_options[] = {
 	{"encoding", ForeignTableRelationId},
 	{"on_error", ForeignTableRelationId},
 	{"log_verbosity", ForeignTableRelationId},
+	{"reject_limit", ForeignTableRelationId},
 	{"force_not_null", AttributeRelationId},
 	{"force_null", AttributeRelationId},
 
@@ -788,6 +789,13 @@ retry:
 			 */
 			ResetPerTupleExprContext(estate);
 
+			if (cstate->opts.reject_limit > 0 && \
+				cstate->num_errors > cstate->opts.reject_limit)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
+								(long long) cstate->opts.reject_limit)));
+
 			/* Repeat NextCopyFrom() until no soft error occurs */
 			goto retry;
 		}
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index edd77c5cd2..ee66d7d831 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,11 +150,13 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 -- error context report tests
 SELECT * FROM agg_bad;               -- ERROR
 
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
 SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1');
+SELECT * FROM agg_bad;
 ANALYZE agg_bad;
 
 -- misc query tests
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index bb3579b077..b05085fd7d 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -149,6 +149,18 @@
    </listitem>
   </varlistentry>
 
+  <varlistentry>
+   <term><literal>reject_limit</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the maximum number of errors tolerated while converting a column's
+     input value to its data type, the same as <command>COPY</command>'s
+    <literal>REJECT_LIMIT</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
  </variablelist>
 
  <para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..c6c8bc10b1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -424,7 +424,17 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 static int64
 defGetCopyRejectLimitOption(DefElem *def)
 {
-	int64		reject_limit = defGetInt64(def);
+	int64		reject_limit;
+
+	if (def->arg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a numeric value",
+						def->defname)));
+	else if (nodeTag(def->arg) == T_String)
+		reject_limit = pg_strtoint64(strVal(def->arg));
+	else
+		reject_limit = defGetInt64(def);
 
 	if (reject_limit <= 0)
 		ereport(ERROR,
-- 
2.39.2

#2Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: torikoshia (#1)
Re: Add reject_limit option to file_fdw

On 2024/10/17 22:45, torikoshia wrote:

Hi,

4ac2a9bec introduced reject_limit option to the COPY command, and I was wondering if it might be beneficial to add the same option to file_fdw.

Although there may be fewer practical use cases compared to COPY, it could still be useful in situations where the file being read via file_fdw is subject to modifications and there is a need to tolerate a limited number of errors.

Agreed.

What do you think?

I've attached a patch.

Thanks for the patch! Could you add it to the next CommitFest?

+ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1');
+SELECT * FROM agg_bad;
+  a  |   b
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)

Wouldn't it be better to include a test where a SELECT query fails, even with
on_error set to "ignore," because the number of errors exceeds reject_limit?

+ if (cstate->opts.reject_limit > 0 && \

The trailing backslash isn't needed here.

+  <varlistentry>
+   <term><literal>reject_limit</literal></term>

This entry should be placed right after the on_error option,
following the same order as in the COPY command documentation.

Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the value for the foreign table's option must be single-quoted. I’m not entirely sure if this is the correct approach, but in order to accommodate this, the patch modifies the value of reject_limit option to accept not only numeric values but also strings.

I don't have a better approach for this, so I'm okay with your solution.
Just one note: it would be helpful to explain and comment
why defGetCopyRejectLimitOption() accepts and parses both int64 and
string values.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#3torikoshia
torikoshia@oss.nttdata.com
In reply to: Fujii Masao (#2)
1 attachment(s)
Re: Add reject_limit option to file_fdw

On 2024-10-18 01:51, Fujii Masao wrote:
Thanks for your review and sorry for my late reply.

On 2024/10/17 22:45, torikoshia wrote:

Hi,

4ac2a9bec introduced reject_limit option to the COPY command, and I
was wondering if it might be beneficial to add the same option to
file_fdw.

Although there may be fewer practical use cases compared to COPY, it
could still be useful in situations where the file being read via
file_fdw is subject to modifications and there is a need to tolerate a
limited number of errors.

Agreed.

What do you think?

I've attached a patch.

Thanks for the patch! Could you add it to the next CommitFest?

Added an entry for this patch:
https://commitfest.postgresql.org/50/5331/

+ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1');
+SELECT * FROM agg_bad;
+  a  |   b
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)

Wouldn't it be better to include a test where a SELECT query fails,
even with
on_error set to "ignore," because the number of errors exceeds
reject_limit?

Agreed.

+ if (cstate->opts.reject_limit > 0 && \

The trailing backslash isn't needed here.

Agreed.

+  <varlistentry>
+   <term><literal>reject_limit</literal></term>

This entry should be placed right after the on_error option,
following the same order as in the COPY command documentation.

Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the
value for the foreign table's option must be single-quoted. I’m not
entirely sure if this is the correct approach, but in order to
accommodate this, the patch modifies the value of reject_limit option
to accept not only numeric values but also strings.

I don't have a better approach for this, so I'm okay with your
solution.
Just one note: it would be helpful to explain and comment
why defGetCopyRejectLimitOption() accepts and parses both int64 and
string values.

Added a comment for this.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachments:

v2-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patchtext/x-diff; name=v2-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patchDownload
From 02b81f376d9b316a06d68c7bf714c6729100162c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 5 Nov 2024 21:57:16 +0900
Subject: [PATCH v2] file_fdw: Add reject_limit option to file_fdw

Commit 4ac2a9bece introduced REJECT_LIMIT option for COPY. This patch
extends support for this option to file_fdw.

As well as REJECT_LIMIT option for COPY, this option limits the
maximum number of erroneous rows that can be skipped.
If more rows encounter data type conversion errors than allowed by
reject_limit, the access to the file_fdw foreign table will fail
with an error, even when on_error = 'ignore'.

Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the
value for the foreign table's option must be single-quoted.
To accommodate this, the patch modifies defGetCopyRejectLimitOption
to accept not only int64 but also strings.

---
 contrib/file_fdw/data/agg.bad2         |  5 +++++
 contrib/file_fdw/expected/file_fdw.out | 15 ++++++++++++++-
 contrib/file_fdw/file_fdw.c            |  8 ++++++++
 contrib/file_fdw/sql/file_fdw.sql      |  7 ++++++-
 doc/src/sgml/file-fdw.sgml             | 12 ++++++++++++
 src/backend/commands/copy.c            | 15 ++++++++++++++-
 6 files changed, 59 insertions(+), 3 deletions(-)
 create mode 100644 contrib/file_fdw/data/agg.bad2

diff --git a/contrib/file_fdw/data/agg.bad2 b/contrib/file_fdw/data/agg.bad2
new file mode 100644
index 0000000000..04279ce55b
--- /dev/null
+++ b/contrib/file_fdw/data/agg.bad2
@@ -0,0 +1,5 @@
+56;@7.8@
+100;@99.097@
+0;@aaa@
+42;@324.78@
+1;@bbb@
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 593fdc782e..dbfae52baf 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,7 +206,7 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 SELECT * FROM agg_bad;               -- ERROR
 ERROR:  invalid input syntax for type real: "aaa"
 CONTEXT:  COPY agg_bad, line 3, column b: "aaa"
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
 NOTICE:  1 row was skipped due to data type incompatibility
@@ -224,6 +224,19 @@ SELECT * FROM agg_bad;
   42 | 324.78
 (2 rows)
 
+\set filename :abs_srcdir '/data/agg.bad2'
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET filename :'filename', ADD reject_limit '1');
+SELECT * FROM agg_bad;
+ERROR:  skipped more than REJECT_LIMIT (1) rows due to data type incompatibility
+CONTEXT:  COPY agg_bad, line 5, column b: "bbb"
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
+SELECT * FROM agg_bad;
+  a  |   b    
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
 ANALYZE agg_bad;
 -- misc query tests
 \t on
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 043204c3e7..9e2896f32a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -77,6 +77,7 @@ static const struct FileFdwOption valid_options[] = {
 	{"encoding", ForeignTableRelationId},
 	{"on_error", ForeignTableRelationId},
 	{"log_verbosity", ForeignTableRelationId},
+	{"reject_limit", ForeignTableRelationId},
 	{"force_not_null", AttributeRelationId},
 	{"force_null", AttributeRelationId},
 
@@ -788,6 +789,13 @@ retry:
 			 */
 			ResetPerTupleExprContext(estate);
 
+			if (cstate->opts.reject_limit > 0 &&
+				cstate->num_errors > cstate->opts.reject_limit)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
+								(long long) cstate->opts.reject_limit)));
+
 			/* Repeat NextCopyFrom() until no soft error occurs */
 			goto retry;
 		}
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index edd77c5cd2..0208e095ae 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,11 +150,16 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 -- error context report tests
 SELECT * FROM agg_bad;               -- ERROR
 
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
 SELECT * FROM agg_bad;
+\set filename :abs_srcdir '/data/agg.bad2'
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET filename :'filename', ADD reject_limit '1');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
+SELECT * FROM agg_bad;
 ANALYZE agg_bad;
 
 -- misc query tests
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index bb3579b077..882d9a76d2 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -138,6 +138,18 @@
    </listitem>
   </varlistentry>
 
+  <varlistentry>
+   <term><literal>reject_limit</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the maximum number of errors tolerated while converting a column's
+     input value to its data type, the same as <command>COPY</command>'s
+    <literal>REJECT_LIMIT</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
   <varlistentry>
    <term><literal>log_verbosity</literal></term>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..c88f58742d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -420,11 +420,24 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 
 /*
  * Extract REJECT_LIMIT value from a DefElem.
+ *
+ * Since foreign table options must be single-quoted values, this function
+ * accepts both int64 and string value.
  */
 static int64
 defGetCopyRejectLimitOption(DefElem *def)
 {
-	int64		reject_limit = defGetInt64(def);
+	int64		reject_limit;
+
+	if (def->arg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a numeric value",
+						def->defname)));
+	else if (nodeTag(def->arg) == T_String)
+		reject_limit = pg_strtoint64(strVal(def->arg));
+	else
+		reject_limit = defGetInt64(def);
 
 	if (reject_limit <= 0)
 		ereport(ERROR,
-- 
2.39.2

#4Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: torikoshia (#3)
Re: Add reject_limit option to file_fdw

On 2024/11/05 22:30, torikoshia wrote:

Thanks for the patch! Could you add it to the next CommitFest?

Added an entry for this patch:
https://commitfest.postgresql.org/50/5331/

Thanks!

+ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1');
+SELECT * FROM agg_bad;
+  a  |   b
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)

Wouldn't it be better to include a test where a SELECT query fails, even with
on_error set to "ignore," because the number of errors exceeds reject_limit?

Agreed.

As for the agg.bad2 file that your latest patch added:

Instead of adding a new bad input file, what do you think about simply appending
"1;@bbb@" to the existing agg.bad file and adjusting sql/file_fdw.sql as follows?
This approach might keep things simpler.

-------------------
-\set filename :abs_srcdir '/data/agg.bad2'
-ALTER FOREIGN TABLE agg_bad OPTIONS (SET filename :'filename', ADD reject_limit '1');
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1');
  SELECT * FROM agg_bad;
  ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
-------------------
+  <varlistentry>
+   <term><literal>reject_limit</literal></term>

This entry should be placed right after the on_error option,
following the same order as in the COPY command documentation.

Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the value for the foreign table's option must be single-quoted. I’m not entirely sure if this is the correct approach, but in order to accommodate this, the patch modifies the value of reject_limit option to accept not only numeric values but also strings.

I don't have a better approach for this, so I'm okay with your solution.
Just one note: it would be helpful to explain and comment
why defGetCopyRejectLimitOption() accepts and parses both int64 and
string values.

Added a comment for this.

Thanks for adding the comment. It clearly states that REJECT_LIMIT can be
a single-quoted string. However, it might also be helpful to mention that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY command
option or as a single-quoted string for the foreign table option using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#5torikoshia
torikoshia@oss.nttdata.com
In reply to: Fujii Masao (#4)
Re: Add reject_limit option to file_fdw

On 2024-11-08 01:44, Fujii Masao wrote:
Thanks for your review!

On 2024/11/05 22:30, torikoshia wrote:

Thanks for the patch! Could you add it to the next CommitFest?

Added an entry for this patch:
https://commitfest.postgresql.org/50/5331/

Thanks!

+ALTER FOREIGN TABLE agg_bad OPTIONS (reject_limit '1');
+SELECT * FROM agg_bad;
+  a  |   b
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)

Wouldn't it be better to include a test where a SELECT query fails,
even with
on_error set to "ignore," because the number of errors exceeds
reject_limit?

Agreed.

As for the agg.bad2 file that your latest patch added:

Instead of adding a new bad input file, what do you think about simply
appending
"1;@bbb@" to the existing agg.bad file and adjusting sql/file_fdw.sql
as follows?
This approach might keep things simpler.

That seems better. Agreed.

-------------------
-\set filename :abs_srcdir '/data/agg.bad2'
-ALTER FOREIGN TABLE agg_bad OPTIONS (SET filename :'filename', ADD
reject_limit '1');
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1');
SELECT * FROM agg_bad;
ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
-------------------
+  <varlistentry>
+   <term><literal>reject_limit</literal></term>

This entry should be placed right after the on_error option,
following the same order as in the COPY command documentation.

Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands,
the value for the foreign table's option must be single-quoted. I’m
not entirely sure if this is the correct approach, but in order to
accommodate this, the patch modifies the value of reject_limit
option to accept not only numeric values but also strings.

I don't have a better approach for this, so I'm okay with your
solution.
Just one note: it would be helpful to explain and comment
why defGetCopyRejectLimitOption() accepts and parses both int64 and
string values.

Added a comment for this.

Thanks for adding the comment. It clearly states that REJECT_LIMIT can
be
a single-quoted string. However, it might also be helpful to mention
that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
command
option or as a single-quoted string for the foreign table option using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Thanks! it seems better.

Attached v3 patch.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

#6Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: torikoshia (#5)
Re: Add reject_limit option to file_fdw

On 2024/11/11 21:45, torikoshia wrote:

Thanks for adding the comment. It clearly states that REJECT_LIMIT can be
a single-quoted string. However, it might also be helpful to mention that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY command
option or as a single-quoted string for the foreign table option using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Thanks! it seems better.

Attached v3 patch.

Thanks for updating the patch! It looks like you forgot to attach it, though.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#7torikoshia
torikoshia@oss.nttdata.com
In reply to: Fujii Masao (#6)
1 attachment(s)
Re: Add reject_limit option to file_fdw

On 2024-11-12 01:49, Fujii Masao wrote:

On 2024/11/11 21:45, torikoshia wrote:

Thanks for adding the comment. It clearly states that REJECT_LIMIT
can be
a single-quoted string. However, it might also be helpful to mention
that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
command
option or as a single-quoted string for the foreign table option
using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Thanks! it seems better.

Attached v3 patch.

Thanks for updating the patch! It looks like you forgot to attach it,
though.

Oops, thanks for pointing it out.
Here it is.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachments:

v3-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patchtext/x-diff; name=v3-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patchDownload
From 6380a1a52e125671cfecb3a9056019ff85a32dd7 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Mon, 11 Nov 2024 21:36:57 +0900
Subject: [PATCH v3] file_fdw: Add reject_limit option to file_fdw

Commit 4ac2a9bece introduced REJECT_LIMIT option for COPY. This patch
extends support for this option to file_fdw.

As well as REJECT_LIMIT option for COPY, this option limits the
maximum number of erroneous rows that can be skipped.
If more rows encounter data type conversion errors than allowed by
reject_limit, the access to the file_fdw foreign table will fail
with an error, even when on_error = 'ignore'.

Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the
value for the foreign table's option must be single-quoted.
To accommodate this, the patch modifies defGetCopyRejectLimitOption
to accept not only int64 but also strings.
---
 contrib/file_fdw/data/agg.bad          |  1 +
 contrib/file_fdw/expected/file_fdw.out | 16 ++++++++++++++--
 contrib/file_fdw/file_fdw.c            |  8 ++++++++
 contrib/file_fdw/sql/file_fdw.sql      |  6 +++++-
 doc/src/sgml/file-fdw.sgml             | 12 ++++++++++++
 src/backend/commands/copy.c            | 16 +++++++++++++++-
 6 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/contrib/file_fdw/data/agg.bad b/contrib/file_fdw/data/agg.bad
index 3415b15007..04279ce55b 100644
--- a/contrib/file_fdw/data/agg.bad
+++ b/contrib/file_fdw/data/agg.bad
@@ -2,3 +2,4 @@
 100;@99.097@
 0;@aaa@
 42;@324.78@
+1;@bbb@
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 593fdc782e..f52e82bfcf 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,10 +206,10 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 SELECT * FROM agg_bad;               -- ERROR
 ERROR:  invalid input syntax for type real: "aaa"
 CONTEXT:  COPY agg_bad, line 3, column b: "aaa"
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
-NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  2 rows were skipped due to data type incompatibility
   a  |   b    
 -----+--------
  100 | 99.097
@@ -224,6 +224,18 @@ SELECT * FROM agg_bad;
   42 | 324.78
 (2 rows)
 
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
+SELECT * FROM agg_bad;
+ERROR:  skipped more than REJECT_LIMIT (1) rows due to data type incompatibility
+CONTEXT:  COPY agg_bad, line 5, column b: "bbb"
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
+SELECT * FROM agg_bad;
+  a  |   b    
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
 ANALYZE agg_bad;
 -- misc query tests
 \t on
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 043204c3e7..9e2896f32a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -77,6 +77,7 @@ static const struct FileFdwOption valid_options[] = {
 	{"encoding", ForeignTableRelationId},
 	{"on_error", ForeignTableRelationId},
 	{"log_verbosity", ForeignTableRelationId},
+	{"reject_limit", ForeignTableRelationId},
 	{"force_not_null", AttributeRelationId},
 	{"force_null", AttributeRelationId},
 
@@ -788,6 +789,13 @@ retry:
 			 */
 			ResetPerTupleExprContext(estate);
 
+			if (cstate->opts.reject_limit > 0 &&
+				cstate->num_errors > cstate->opts.reject_limit)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
+								(long long) cstate->opts.reject_limit)));
+
 			/* Repeat NextCopyFrom() until no soft error occurs */
 			goto retry;
 		}
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index edd77c5cd2..0d6920e1f6 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,11 +150,15 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 -- error context report tests
 SELECT * FROM agg_bad;               -- ERROR
 
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
 SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
+SELECT * FROM agg_bad;
 ANALYZE agg_bad;
 
 -- misc query tests
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index bb3579b077..882d9a76d2 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -138,6 +138,18 @@
    </listitem>
   </varlistentry>
 
+  <varlistentry>
+   <term><literal>reject_limit</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the maximum number of errors tolerated while converting a column's
+     input value to its data type, the same as <command>COPY</command>'s
+    <literal>REJECT_LIMIT</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
   <varlistentry>
    <term><literal>log_verbosity</literal></term>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..2d98ecf3f4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -420,11 +420,25 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 
 /*
  * Extract REJECT_LIMIT value from a DefElem.
+ *
+ * REJECT_LIMIT can be specified in two ways: as an int64 for the COPY command
+ * option or as a single-quoted string for the foreign table option using
+ * file_fdw. Therefore this function needs to handle both formats.
  */
 static int64
 defGetCopyRejectLimitOption(DefElem *def)
 {
-	int64		reject_limit = defGetInt64(def);
+	int64		reject_limit;
+
+	if (def->arg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a numeric value",
+						def->defname)));
+	else if (nodeTag(def->arg) == T_String)
+		reject_limit = pg_strtoint64(strVal(def->arg));
+	else
+		reject_limit = defGetInt64(def);
 
 	if (reject_limit <= 0)
 		ereport(ERROR,
-- 
2.39.2

#8Yugo Nagata
nagata@sraoss.co.jp
In reply to: torikoshia (#7)
Re: Add reject_limit option to file_fdw

On Tue, 12 Nov 2024 10:16:50 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:

On 2024-11-12 01:49, Fujii Masao wrote:

On 2024/11/11 21:45, torikoshia wrote:

Thanks for adding the comment. It clearly states that REJECT_LIMIT
can be
a single-quoted string. However, it might also be helpful to mention
that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
command
option or as a single-quoted string for the foreign table option
using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Thanks! it seems better.

Attached v3 patch.

Thanks for updating the patch! It looks like you forgot to attach it,
though.

Oops, thanks for pointing it out.
Here it is.

I have one minor comment.

+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
+								(long long) cstate->opts.reject_limit)));

Do we have to keep this message consistent with ones of COPY command?
With foreign data wrappers, it seems common that the option name is passed in
lowercase, so is it better to use "skipped more than reject_limit ..." rather
than using uppercase?

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

#9Kirill Reshke
reshkekirill@gmail.com
In reply to: torikoshia (#7)
Re: Add reject_limit option to file_fdw

On Tue, 12 Nov 2024 at 06:17, torikoshia <torikoshia@oss.nttdata.com> wrote:

On 2024-11-12 01:49, Fujii Masao wrote:

On 2024/11/11 21:45, torikoshia wrote:

Thanks for adding the comment. It clearly states that REJECT_LIMIT
can be
a single-quoted string. However, it might also be helpful to mention
that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
command
option or as a single-quoted string for the foreign table option
using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Thanks! it seems better.

Attached v3 patch.

Thanks for updating the patch! It looks like you forgot to attach it,
though.

Oops, thanks for pointing it out.
Here it is.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Hi!

A little question from me.

This is your doc for reject_limit:

+  <varlistentry>
+   <term><literal>reject_limit</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the maximum number of errors tolerated while
converting a column's
+     input value to its data type, the same as <command>COPY</command>'s
+    <literal>REJECT_LIMIT</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+

This is how it looks on the current HEAD for copy.

<varlistentry>
<term><literal>REJECT_LIMIT</literal></term>
<listitem>
<para>
Specifies the maximum number of errors tolerated while converting a
column's input value to its data type, when <literal>ON_ERROR</literal> is
set to <literal>ignore</literal>.
If the input causes more errors than the specified value, the
<command>COPY</command>
command fails, even with <literal>ON_ERROR</literal> set to
<literal>ignore</literal>.
This clause must be used with
<literal>ON_ERROR</literal>=<literal>ignore</literal>
and <replaceable class="parameter">maxerror</replaceable> must
be positive <type>bigint</type>.
If not specified, <literal>ON_ERROR</literal>=<literal>ignore</literal>
allows an unlimited number of errors, meaning <command>COPY</command> will
skip all erroneous data.
</para>
</listitem>
</varlistentry>

There is a difference. Should we add REJECT_LIMIT vs ON_ERROR
clarification for file_fdw too? or maybe we put a reference for COPY
doc.

--
Best regards,
Kirill Reshke

#10torikoshia
torikoshia@oss.nttdata.com
In reply to: Kirill Reshke (#9)
Re: Add reject_limit option to file_fdw

On 2024-11-12 15:23, Kirill Reshke wrote:

Thanks for your review!

On Tue, 12 Nov 2024 at 06:17, torikoshia <torikoshia@oss.nttdata.com>
wrote:

On 2024-11-12 01:49, Fujii Masao wrote:

On 2024/11/11 21:45, torikoshia wrote:

Thanks for adding the comment. It clearly states that REJECT_LIMIT
can be
a single-quoted string. However, it might also be helpful to mention
that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
command
option or as a single-quoted string for the foreign table option
using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Thanks! it seems better.

Attached v3 patch.

Thanks for updating the patch! It looks like you forgot to attach it,
though.

Oops, thanks for pointing it out.
Here it is.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Hi!

A little question from me.

This is your doc for reject_limit:

+  <varlistentry>
+   <term><literal>reject_limit</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the maximum number of errors tolerated while
converting a column's
+     input value to its data type, the same as 
<command>COPY</command>'s
+    <literal>REJECT_LIMIT</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+

This is how it looks on the current HEAD for copy.

<varlistentry>
<term><literal>REJECT_LIMIT</literal></term>
<listitem>
<para>
Specifies the maximum number of errors tolerated while converting
a
column's input value to its data type, when
<literal>ON_ERROR</literal> is
set to <literal>ignore</literal>.
If the input causes more errors than the specified value, the
<command>COPY</command>
command fails, even with <literal>ON_ERROR</literal> set to
<literal>ignore</literal>.
This clause must be used with
<literal>ON_ERROR</literal>=<literal>ignore</literal>
and <replaceable class="parameter">maxerror</replaceable> must
be positive <type>bigint</type>.
If not specified,
<literal>ON_ERROR</literal>=<literal>ignore</literal>
allows an unlimited number of errors, meaning
<command>COPY</command> will
skip all erroneous data.
</para>
</listitem>
</varlistentry>

There is a difference. Should we add REJECT_LIMIT vs ON_ERROR
clarification for file_fdw too? or maybe we put a reference for COPY
doc.

As you may know, some options for file_fdw are the same as those for
COPY. While the manual provides detailed descriptions of these options
in the COPY section, the explanations in file_fdw are shorter and less
detailed.

I intended to follow this approach, but do you think it should be
changed?

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

#11torikoshia
torikoshia@oss.nttdata.com
In reply to: Yugo Nagata (#8)
1 attachment(s)
Re: Add reject_limit option to file_fdw

On 2024-11-12 14:51, Yugo Nagata wrote:

Thanks for your review!

On Tue, 12 Nov 2024 10:16:50 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:

On 2024-11-12 01:49, Fujii Masao wrote:

On 2024/11/11 21:45, torikoshia wrote:

Thanks for adding the comment. It clearly states that REJECT_LIMIT
can be
a single-quoted string. However, it might also be helpful to mention
that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
command
option or as a single-quoted string for the foreign table option
using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Thanks! it seems better.

Attached v3 patch.

Thanks for updating the patch! It looks like you forgot to attach it,
though.

Oops, thanks for pointing it out.
Here it is.

I have one minor comment.

+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data
type incompatibility",
+								(long long) cstate->opts.reject_limit)));

Do we have to keep this message consistent with ones of COPY command?
With foreign data wrappers, it seems common that the option name is
passed in
lowercase, so is it better to use "skipped more than reject_limit ..."
rather
than using uppercase?

Agreed.
Attached v4 patch.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachments:

v4-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patchtext/x-diff; name=v4-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patchDownload
From 114d8dd2c57bf8b9cf712b8fa401c7b8eaa8a958 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Wed, 13 Nov 2024 21:32:25 +0900
Subject: [PATCH v4] file_fdw: Add reject_limit option to file_fdw

Commit 4ac2a9bece introduced REJECT_LIMIT option for COPY. This patch
extends support for this option to file_fdw.

As well as REJECT_LIMIT option for COPY, this option limits the
maximum number of erroneous rows that can be skipped.
If more rows encounter data type conversion errors than allowed by
reject_limit, the access to the file_fdw foreign table will fail
with an error, even when on_error = 'ignore'.

Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the
value for the foreign table's option must be single-quoted.
To accommodate this, the patch modifies defGetCopyRejectLimitOption
to accept not only int64 but also strings.
---
 contrib/file_fdw/data/agg.bad          |  1 +
 contrib/file_fdw/expected/file_fdw.out | 16 ++++++++++++++--
 contrib/file_fdw/file_fdw.c            |  8 ++++++++
 contrib/file_fdw/sql/file_fdw.sql      |  6 +++++-
 doc/src/sgml/file-fdw.sgml             | 12 ++++++++++++
 src/backend/commands/copy.c            | 16 +++++++++++++++-
 6 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/contrib/file_fdw/data/agg.bad b/contrib/file_fdw/data/agg.bad
index 3415b15007..04279ce55b 100644
--- a/contrib/file_fdw/data/agg.bad
+++ b/contrib/file_fdw/data/agg.bad
@@ -2,3 +2,4 @@
 100;@99.097@
 0;@aaa@
 42;@324.78@
+1;@bbb@
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 593fdc782e..81efe63eef 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,10 +206,10 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 SELECT * FROM agg_bad;               -- ERROR
 ERROR:  invalid input syntax for type real: "aaa"
 CONTEXT:  COPY agg_bad, line 3, column b: "aaa"
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
-NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  2 rows were skipped due to data type incompatibility
   a  |   b    
 -----+--------
  100 | 99.097
@@ -224,6 +224,18 @@ SELECT * FROM agg_bad;
   42 | 324.78
 (2 rows)
 
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
+SELECT * FROM agg_bad;
+ERROR:  skipped more than reject_limit (1) rows due to data type incompatibility
+CONTEXT:  COPY agg_bad, line 5, column b: "bbb"
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
+SELECT * FROM agg_bad;
+  a  |   b    
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
 ANALYZE agg_bad;
 -- misc query tests
 \t on
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 043204c3e7..6c64f81098 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -77,6 +77,7 @@ static const struct FileFdwOption valid_options[] = {
 	{"encoding", ForeignTableRelationId},
 	{"on_error", ForeignTableRelationId},
 	{"log_verbosity", ForeignTableRelationId},
+	{"reject_limit", ForeignTableRelationId},
 	{"force_not_null", AttributeRelationId},
 	{"force_null", AttributeRelationId},
 
@@ -788,6 +789,13 @@ retry:
 			 */
 			ResetPerTupleExprContext(estate);
 
+			if (cstate->opts.reject_limit > 0 &&
+				cstate->num_errors > cstate->opts.reject_limit)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("skipped more than reject_limit (%lld) rows due to data type incompatibility",
+								(long long) cstate->opts.reject_limit)));
+
 			/* Repeat NextCopyFrom() until no soft error occurs */
 			goto retry;
 		}
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index edd77c5cd2..0d6920e1f6 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,11 +150,15 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 -- error context report tests
 SELECT * FROM agg_bad;               -- ERROR
 
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
 SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
+SELECT * FROM agg_bad;
 ANALYZE agg_bad;
 
 -- misc query tests
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index bb3579b077..882d9a76d2 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -138,6 +138,18 @@
    </listitem>
   </varlistentry>
 
+  <varlistentry>
+   <term><literal>reject_limit</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the maximum number of errors tolerated while converting a column's
+     input value to its data type, the same as <command>COPY</command>'s
+    <literal>REJECT_LIMIT</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
   <varlistentry>
    <term><literal>log_verbosity</literal></term>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..2d98ecf3f4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -420,11 +420,25 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 
 /*
  * Extract REJECT_LIMIT value from a DefElem.
+ *
+ * REJECT_LIMIT can be specified in two ways: as an int64 for the COPY command
+ * option or as a single-quoted string for the foreign table option using
+ * file_fdw. Therefore this function needs to handle both formats.
  */
 static int64
 defGetCopyRejectLimitOption(DefElem *def)
 {
-	int64		reject_limit = defGetInt64(def);
+	int64		reject_limit;
+
+	if (def->arg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a numeric value",
+						def->defname)));
+	else if (nodeTag(def->arg) == T_String)
+		reject_limit = pg_strtoint64(strVal(def->arg));
+	else
+		reject_limit = defGetInt64(def);
 
 	if (reject_limit <= 0)
 		ereport(ERROR,
-- 
2.39.2

#12Yugo NAGATA
nagata@sraoss.co.jp
In reply to: torikoshia (#11)
Re: Add reject_limit option to file_fdw

On Wed, 13 Nov 2024 21:48:10 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:

On 2024-11-12 14:51, Yugo Nagata wrote:

Thanks for your review!

On Tue, 12 Nov 2024 10:16:50 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:

On 2024-11-12 01:49, Fujii Masao wrote:

On 2024/11/11 21:45, torikoshia wrote:

Thanks for adding the comment. It clearly states that REJECT_LIMIT
can be
a single-quoted string. However, it might also be helpful to mention
that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
command
option or as a single-quoted string for the foreign table option
using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Thanks! it seems better.

Attached v3 patch.

Thanks for updating the patch! It looks like you forgot to attach it,
though.

Oops, thanks for pointing it out.
Here it is.

I have one minor comment.

+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data
type incompatibility",
+								(long long) cstate->opts.reject_limit)));

Do we have to keep this message consistent with ones of COPY command?
With foreign data wrappers, it seems common that the option name is
passed in
lowercase, so is it better to use "skipped more than reject_limit ..."
rather
than using uppercase?

Agreed.
Attached v4 patch.

Thank you for updating the patch.

However, I noticed that file_fdw could output the following error using uppercase
when an invalid value is set to the reject_limit option,

ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '-1');
ERROR: REJECT_LIMIT (-1) must be greater than zero

as well as if reject_limit is set when on_error != 'ignore'.

ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE

Also, in the regression tests, I found similar behaviours on existing options,
for example;

CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'xml'); -- ERROR
ERROR: COPY format "xml" not recognized
CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', quote ':'); -- ERROR
ERROR: COPY QUOTE requires CSV mode
CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'text', escape ':'); -- ERROR
...

These messages may be unexpected for some users because the documentation of
fild_fdw does not explicitly describe that file_fdw uses COPY internally.
(I can find several wordings like "as COPY", though.)
However, since the current file_fdw already has such messages, I came to think
making the messages on "reject_limit" consistent with COPY is reasonable.
I mean, the previous version of the message seems fine.

As another comment, do we need to add validator test for on_error and
reject_limit as similar to other options?

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

#13torikoshia
torikoshia@oss.nttdata.com
In reply to: Yugo NAGATA (#12)
2 attachment(s)
Re: Add reject_limit option to file_fdw

On 2024-11-13 23:17, Yugo NAGATA wrote:

On Wed, 13 Nov 2024 21:48:10 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:

On 2024-11-12 14:51, Yugo Nagata wrote:

Thanks for your review!

On Tue, 12 Nov 2024 10:16:50 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:

On 2024-11-12 01:49, Fujii Masao wrote:

On 2024/11/11 21:45, torikoshia wrote:

Thanks for adding the comment. It clearly states that REJECT_LIMIT
can be
a single-quoted string. However, it might also be helpful to mention
that
it can be provided as an int64 in the COPY command option. How about
updating it like this?

------------------------------------
REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
command
option or as a single-quoted string for the foreign table option
using
file_fdw. Therefore this function needs to handle both formats.
------------------------------------

Thanks! it seems better.

Attached v3 patch.

Thanks for updating the patch! It looks like you forgot to attach it,
though.

Oops, thanks for pointing it out.
Here it is.

I have one minor comment.

+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data
type incompatibility",
+								(long long) cstate->opts.reject_limit)));

Do we have to keep this message consistent with ones of COPY command?
With foreign data wrappers, it seems common that the option name is
passed in
lowercase, so is it better to use "skipped more than reject_limit ..."
rather
than using uppercase?

Agreed.
Attached v4 patch.

Thank you for updating the patch.

However, I noticed that file_fdw could output the following error
using uppercase
when an invalid value is set to the reject_limit option,

ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '-1');
ERROR: REJECT_LIMIT (-1) must be greater than zero

as well as if reject_limit is set when on_error != 'ignore'.

ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE

Also, in the regression tests, I found similar behaviours on existing
options,
for example;

CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format
'xml'); -- ERROR
ERROR: COPY format "xml" not recognized
CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format
'text', quote ':'); -- ERROR
ERROR: COPY QUOTE requires CSV mode
CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format
'text', escape ':'); -- ERROR
...

These messages may be unexpected for some users because the
documentation of
fild_fdw does not explicitly describe that file_fdw uses COPY
internally.
(I can find several wordings like "as COPY", though.)
However, since the current file_fdw already has such messages, I came
to think
making the messages on "reject_limit" consistent with COPY is
reasonable.
I mean, the previous version of the message seems fine.

Agreed. Thanks for your investigation.

As another comment, do we need to add validator test for on_error and
reject_limit as similar to other options?

That might be better.
Added some queries which had wrong options to cause errors.

I was unsure whether to add it to "validator test" section or "on_error,
log_verbosity and reject_limit tests" section, but I chose "validator
test" as I believe it makes things clearer.

Added 0002 patch since some of the tests are not related to reject_limit
but just on_error.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachments:

v5-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patchtext/x-diff; name=v5-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patchDownload
From 661937869d696bc640ff54c97631c27671502e60 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 19 Nov 2024 21:06:37 +0900
Subject: [PATCH v5 1/2] file_fdw: Add reject_limit option to file_fdw

Commit 4ac2a9bece introduced REJECT_LIMIT option for COPY. This patch
extends support for this option to file_fdw.

As well as REJECT_LIMIT option for COPY, this option limits the
maximum number of erroneous rows that can be skipped.
If more rows encounter data type conversion errors than allowed by
reject_limit, the access to the file_fdw foreign table will fail
with an error, even when on_error = 'ignore'.

Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the
value for the foreign table's option must be single-quoted.
To accommodate this, the patch modifies defGetCopyRejectLimitOption
to accept not only int64 but also strings.
---
 contrib/file_fdw/data/agg.bad          |  1 +
 contrib/file_fdw/expected/file_fdw.out | 18 ++++++++++++++++--
 contrib/file_fdw/file_fdw.c            |  8 ++++++++
 contrib/file_fdw/sql/file_fdw.sql      |  7 ++++++-
 doc/src/sgml/file-fdw.sgml             | 12 ++++++++++++
 src/backend/commands/copy.c            | 16 +++++++++++++++-
 6 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/contrib/file_fdw/data/agg.bad b/contrib/file_fdw/data/agg.bad
index 3415b15007..04279ce55b 100644
--- a/contrib/file_fdw/data/agg.bad
+++ b/contrib/file_fdw/data/agg.bad
@@ -2,3 +2,4 @@
 100;@99.097@
 0;@aaa@
 42;@324.78@
+1;@bbb@
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 593fdc782e..4f63c047ec 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -90,6 +90,8 @@ ERROR:  COPY delimiter cannot be newline or carriage return
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
 ERROR:  COPY null representation cannot use newline or carriage return
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1');       -- ERROR
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
 ERROR:  either filename or program is required for file_fdw foreign tables
 \set filename :abs_srcdir '/data/agg.data'
@@ -206,10 +208,10 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 SELECT * FROM agg_bad;               -- ERROR
 ERROR:  invalid input syntax for type real: "aaa"
 CONTEXT:  COPY agg_bad, line 3, column b: "aaa"
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
-NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  2 rows were skipped due to data type incompatibility
   a  |   b    
 -----+--------
  100 | 99.097
@@ -224,6 +226,18 @@ SELECT * FROM agg_bad;
   42 | 324.78
 (2 rows)
 
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
+SELECT * FROM agg_bad;
+ERROR:  skipped more than REJECT_LIMIT (1) rows due to data type incompatibility
+CONTEXT:  COPY agg_bad, line 5, column b: "bbb"
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
+SELECT * FROM agg_bad;
+  a  |   b    
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
 ANALYZE agg_bad;
 -- misc query tests
 \t on
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 043204c3e7..9e2896f32a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -77,6 +77,7 @@ static const struct FileFdwOption valid_options[] = {
 	{"encoding", ForeignTableRelationId},
 	{"on_error", ForeignTableRelationId},
 	{"log_verbosity", ForeignTableRelationId},
+	{"reject_limit", ForeignTableRelationId},
 	{"force_not_null", AttributeRelationId},
 	{"force_null", AttributeRelationId},
 
@@ -788,6 +789,13 @@ retry:
 			 */
 			ResetPerTupleExprContext(estate);
 
+			if (cstate->opts.reject_limit > 0 &&
+				cstate->num_errors > cstate->opts.reject_limit)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
+								(long long) cstate->opts.reject_limit)));
+
 			/* Repeat NextCopyFrom() until no soft error occurs */
 			goto retry;
 		}
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index edd77c5cd2..4805ca8c04 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -77,6 +77,7 @@ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', delimiter
 ');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
 
 \set filename :abs_srcdir '/data/agg.data'
@@ -150,11 +151,15 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 -- error context report tests
 SELECT * FROM agg_bad;               -- ERROR
 
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
 SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
+SELECT * FROM agg_bad;
 ANALYZE agg_bad;
 
 -- misc query tests
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index bb3579b077..882d9a76d2 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -138,6 +138,18 @@
    </listitem>
   </varlistentry>
 
+  <varlistentry>
+   <term><literal>reject_limit</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the maximum number of errors tolerated while converting a column's
+     input value to its data type, the same as <command>COPY</command>'s
+    <literal>REJECT_LIMIT</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
   <varlistentry>
    <term><literal>log_verbosity</literal></term>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..2d98ecf3f4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -420,11 +420,25 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 
 /*
  * Extract REJECT_LIMIT value from a DefElem.
+ *
+ * REJECT_LIMIT can be specified in two ways: as an int64 for the COPY command
+ * option or as a single-quoted string for the foreign table option using
+ * file_fdw. Therefore this function needs to handle both formats.
  */
 static int64
 defGetCopyRejectLimitOption(DefElem *def)
 {
-	int64		reject_limit = defGetInt64(def);
+	int64		reject_limit;
+
+	if (def->arg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a numeric value",
+						def->defname)));
+	else if (nodeTag(def->arg) == T_String)
+		reject_limit = pg_strtoint64(strVal(def->arg));
+	else
+		reject_limit = defGetInt64(def);
 
 	if (reject_limit <= 0)
 		ereport(ERROR,
-- 
2.39.2

v5-0002-Add-validator-tests-for-on_error-options.patchtext/x-diff; name=v5-0002-Add-validator-tests-for-on_error-options.patchDownload
From ab3c38571761507a9e1b3073ffa9c2fdbc50c7de Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 19 Nov 2024 21:12:52 +0900
Subject: [PATCH v5 2/2] Add validator tests for on_error options

---
 contrib/file_fdw/expected/file_fdw.out | 4 ++++
 contrib/file_fdw/sql/file_fdw.sql      | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 4f63c047ec..f383aec6d8 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -90,6 +90,10 @@ ERROR:  COPY delimiter cannot be newline or carriage return
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
 ERROR:  COPY null representation cannot use newline or carriage return
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'aaa');       -- ERROR
+ERROR:  COPY ON_ERROR "aaa" not recognized
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', on_error 'ignore');       -- ERROR
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1');       -- ERROR
 ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index 4805ca8c04..1a76aebde6 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -77,6 +77,8 @@ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', delimiter
 ');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'aaa');       -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', on_error 'ignore');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
 
-- 
2.39.2

#14Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: torikoshia (#13)
1 attachment(s)
Re: Add reject_limit option to file_fdw

On 2024/11/19 21:40, torikoshia wrote:

These messages may be unexpected for some users because the documentation of
fild_fdw does not explicitly describe that file_fdw uses COPY internally.
(I can find several wordings like "as COPY", though.)
However, since the current file_fdw already has such messages, I came to think
making the messages on "reject_limit" consistent with COPY is reasonable.
I mean, the previous version of the message seems fine.

Agreed. Thanks for your investigation.

I've pushed the 0001 patch. Thanks!

As another comment, do we need to add validator test for on_error and
reject_limit as similar to other options?

That might be better.
Added some queries which had wrong options to cause errors.

I was unsure whether to add it to "validator test" section or "on_error, log_verbosity and reject_limit tests" section, but I chose "validator test" as I believe it makes things clearer.

Added 0002 patch since some of the tests are not related to reject_limit but just on_error.

How about adding validator tests for log_verbosity and reject_limit=0,
similar to the tests in the copy2.sql regression test? I've added those
two tests, and the latest version of the patch is attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

v6-0001-file_fdw-Add-regression-tests-for-ON_ERROR-and-ot.patchtext/plain; charset=UTF-8; name=v6-0001-file_fdw-Add-regression-tests-for-ON_ERROR-and-ot.patchDownload
From b73cf7b8aec5d93490f0d303eda3207bd4acbf8c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 19 Nov 2024 21:12:52 +0900
Subject: [PATCH v6] file_fdw: Add regression tests for ON_ERROR and other
 options.

This commit introduces regression tests to validate incorrect settings
for the ON_ERROR, LOG_VERBOSITY, and REJECT_LIMIT options in file_fdw.
---
 contrib/file_fdw/expected/file_fdw.out | 8 ++++++++
 contrib/file_fdw/sql/file_fdw.sql      | 4 ++++
 2 files changed, 12 insertions(+)

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 4f63c047ec..df8d43b374 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -90,8 +90,16 @@ ERROR:  COPY delimiter cannot be newline or carriage return
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
 ERROR:  COPY null representation cannot use newline or carriage return
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'unsupported');       -- ERROR
+ERROR:  COPY ON_ERROR "unsupported" not recognized
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', on_error 'ignore');       -- ERROR
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (log_verbosity 'unsupported');       -- ERROR
+ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1');       -- ERROR
 ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'ignore', reject_limit '0');       -- ERROR
+ERROR:  REJECT_LIMIT (0) must be greater than zero
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
 ERROR:  either filename or program is required for file_fdw foreign tables
 \set filename :abs_srcdir '/data/agg.data'
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index 4805ca8c04..2cdbe7a8a4 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -77,7 +77,11 @@ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', delimiter
 ');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'unsupported');       -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', on_error 'ignore');       -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (log_verbosity 'unsupported');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1');       -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'ignore', reject_limit '0');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
 
 \set filename :abs_srcdir '/data/agg.data'
-- 
2.47.0

#15torikoshia
torikoshia@oss.nttdata.com
In reply to: Fujii Masao (#14)
Re: Add reject_limit option to file_fdw

On 2024-11-21 00:43, Fujii Masao wrote:

On 2024/11/19 21:40, torikoshia wrote:

These messages may be unexpected for some users because the
documentation of
fild_fdw does not explicitly describe that file_fdw uses COPY
internally.
(I can find several wordings like "as COPY", though.)
However, since the current file_fdw already has such messages, I came
to think
making the messages on "reject_limit" consistent with COPY is
reasonable.
I mean, the previous version of the message seems fine.

Agreed. Thanks for your investigation.

I've pushed the 0001 patch. Thanks!

Thanks a lot!

As another comment, do we need to add validator test for on_error and
reject_limit as similar to other options?

That might be better.
Added some queries which had wrong options to cause errors.

I was unsure whether to add it to "validator test" section or
"on_error, log_verbosity and reject_limit tests" section, but I chose
"validator test" as I believe it makes things clearer.

Added 0002 patch since some of the tests are not related to
reject_limit but just on_error.

How about adding validator tests for log_verbosity and reject_limit=0,
similar to the tests in the copy2.sql regression test? I've added those
two tests, and the latest version of the patch is attached.

Thanks for the update! Agreed.
I think the validator tests related to the on_error option you added are
sufficient.

As for the other file_fdw options, I'm a bit concerned whether it would
be better to add validator tests for cases like when 'encoding' is
'unsupported', but that should be discussed in another thread.

Regards,

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

#16Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: torikoshia (#15)
Re: Add reject_limit option to file_fdw

On 2024/11/21 15:43, torikoshia wrote:

Thanks for the update! Agreed.
I think the validator tests related to the on_error option you added are sufficient.

I've pushed the patch. Thanks!

As for the other file_fdw options, I'm a bit concerned whether it would be better to add validator tests for cases like when 'encoding' is 'unsupported', but that should be discussed in another thread.

I'm not sure how valuable it is to add that test, but yes,
it would be better to start a new thread for discussion if needed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION