Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Hi,
The option choice of "ignore" in the COPY ON_ERROR clause seems overly
generic. There would seem to be two relevant ways to ignore bad column
input data - drop the entire row or just set the column value to null. I
can see us wanting to provide the set to null option and in any case having
the option name be explicit that it ignores the row seems like a good idea.
David J.
On Fri, Jan 26, 2024 at 11:09 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
Hi,
The option choice of "ignore" in the COPY ON_ERROR clause seems overly generic. There would seem to be two relevant ways to ignore bad column input data - drop the entire row or just set the column value to null. I can see us wanting to provide the set to null option and in any case having the option name be explicit that it ignores the row seems like a good idea.
two issue I found out while playing around with it;
create table x1(a int not null, b int not null );
you can only do:
COPY x1 from stdin (on_error 'null');
but you cannot do
COPY x1 from stdin (on_error null);
we need to hack the gram.y to escape the "null". I don't know how to
make it work.
related post I found:
https://stackoverflow.com/questions/31786611/how-to-escape-flex-keyword
another issue:
COPY x1 from stdin (on_error null);
when we already have `not null` top level constraint for table x1.
Do we need an error immediately?
"on_error null" seems to conflict with `not null` constraint (assume
refers to the same column).
it may fail while doing bulk inserts while on_error is set to null
because of violating a not null constraint.
On Sun, Jan 28, 2024 at 4:51 PM jian he <jian.universality@gmail.com> wrote:
On Fri, Jan 26, 2024 at 11:09 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:Hi,
The option choice of "ignore" in the COPY ON_ERROR clause seems overly
generic. There would seem to be two relevant ways to ignore bad column
input data - drop the entire row or just set the column value to null. I
can see us wanting to provide the set to null option and in any case having
the option name be explicit that it ignores the row seems like a good idea.two issue I found out while playing around with it;
create table x1(a int not null, b int not null );another issue:
COPY x1 from stdin (on_error null);when we already have `not null` top level constraint for table x1.
Do we need an error immediately?
"on_error null" seems to conflict with `not null` constraint (assume
refers to the same column).
it may fail while doing bulk inserts while on_error is set to null
because of violating a not null constraint.
You should not error immediately since whether or not there is a problem is
table and data dependent. I would not check for the case of all columns
being defined not null and just let the mismatch happen.
That said, maybe with this being a string we can accept something like:
'null, ignore'
And so if attempting to place any one null fails, assuming we can make that
a soft error too, we would then ignore the entire row.
David J.
On Fri, 26 Jan 2024 08:08:29 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:
Hi,
The option choice of "ignore" in the COPY ON_ERROR clause seems overly
generic. There would seem to be two relevant ways to ignore bad column
input data - drop the entire row or just set the column value to null. I
can see us wanting to provide the set to null option and in any case having
the option name be explicit that it ignores the row seems like a good idea.
I am not in favour of renaming the option name "ignore", instead we can
use another style of name for the option to set the column value to NULL,
for example, "set_to_null".
(Maybe, we can make a more generic option "set_to (col, val)" that can set
the value of column specified by "col" value to the specified value "val"
(e.g. 'N/A') on a soft error, although the syntax would be a bit complex...)
IMO, it is more simple to define "ignore" as to skip the entire row rather
than having variety of "ignore". Once defined it so, the option to set the
column value to NULL should not be called "ignore" because values in other
columns will be inserted.
Regards,
Yugo Nagata
David J.
--
Yugo NAGATA <nagata@sraoss.co.jp>
The idea of on_error is to tolerate errors, I think.
if a column has a not null constraint, let it cannot be used with
(on_error 'null')
Based on this, I've made a patch.
based on COPY Synopsis: ON_ERROR 'error_action'
on_error 'null', the keyword NULL should be single quoted.
demo:
COPY check_ign_err FROM STDIN WITH (on_error 'null');
1 {1} a
2 {2} 1
3 {3} 2
4 {4} b
a {5} c
\.
\pset null NULL
SELECT * FROM check_ign_err;
n | m | k
------+-----+------
1 | {1} | NULL
2 | {2} | 1
3 | {3} | 2
4 | {4} | NULL
NULL | {5} | NULL
Attachments:
v1-0001-introduce-copy-on_error-null-option.patchtext/x-patch; charset=US-ASCII; name=v1-0001-introduce-copy-on_error-null-option.patchDownload
From 19afa942af22fd3d2ed2436c6bc7ce02f00bb570 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 3 Feb 2024 14:04:08 +0800
Subject: [PATCH v1 1/1] introduce copy on_error 'null' option
on_error 'null', null needs single quoted.
any data type conversion error will treat that column value to be NULL.
it will not work with column have not null constraint,
we check this at BeginCopyFrom.
discussion: https://www.postgresql.org/message-id/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 1 +
src/backend/commands/copy.c | 2 ++
src/backend/commands/copyfrom.c | 28 ++++++++++++++++++++++++----
src/backend/commands/copyfromparse.c | 27 +++++++++++++++++++++++++--
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 26 ++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 28 ++++++++++++++++++++++++++++
7 files changed, 107 insertions(+), 6 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 55764fc1..d3c4ebdc 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -390,6 +390,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command, while
<literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>null</literal> means the input value will be <literal>null</literal> and continue with the next one.
The default is <literal>stop</literal>.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6..01ce47b0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -422,6 +422,8 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b91..f4c5704e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1005,6 +1005,7 @@ CopyFrom(CopyFromState cstate)
* information according to ON_ERROR.
*/
if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ {
/*
* Just make ErrorSaveContext ready for the next NextCopyFrom.
@@ -1013,11 +1014,18 @@ CopyFrom(CopyFromState cstate)
*/
cstate->escontext->error_occurred = false;
- /* Report that this tuple was skipped by the ON_ERROR clause */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
- ++skipped);
+ /* Report that this tuple was skipped by the ON_ERROR clause */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ ++skipped);
- continue;
+ continue;
+ }
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ *
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ cstate->escontext->error_occurred = false;
}
ExecStoreVirtualTuple(myslot);
@@ -1313,6 +1321,7 @@ CopyFrom(CopyFromState cstate)
error_context_stack = errcallback.previous;
if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ cstate->opts.on_error != COPY_ON_ERROR_NULL &&
cstate->num_errors > 0)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
@@ -1468,6 +1477,8 @@ BeginCopyFrom(ParseState *pstate,
*/
if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
cstate->escontext->details_wanted = false;
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ cstate->escontext->details_wanted = false;
}
else
cstate->escontext = NULL;
@@ -1621,6 +1632,15 @@ BeginCopyFrom(ParseState *pstate,
if (att->attisdropped)
continue;
+ /*
+ * we can specify on_error 'null', but it can only apply to columns
+ * don't have not null constraint.
+ */
+ if (att->attnotnull && cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("copy on_error 'null' cannot be used with not null constraint column")));
+
/* Fetch the input function and typioparam info */
if (cstate->opts.binary)
getTypeBinaryInputInfo(att->atttypid,
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b7..9475a9dc 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -881,6 +881,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool error_happened = false;
/* read raw fields in the next line */
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
@@ -968,14 +969,36 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
(Node *) cstate->escontext,
&values[m]))
{
- cstate->num_errors++;
- return true;
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+
+ values[m] = (Datum) 0;
+ nulls[m] = true;
+ /* here, we need set error_occurred to false, so COPY will be continue */
+ cstate->escontext->error_occurred = false;
+
+ if (!error_happened)
+ error_happened = true;
+ }
+ else
+ {
+ cstate->num_errors++;
+ return true;
+ }
+
}
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ /* accumate num_errors. one row multiple errors only count 1*/
+ if (error_happened)
+ {
+ cstate->num_errors++;
+ cstate->escontext->error_occurred = true;
+ }
+
Assert(fieldno == attr_count);
}
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0..6c3733fa 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* transform error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 25c401ce..0038d6c8 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -751,6 +751,31 @@ CONTEXT: COPY check_ign_err, line 1: "1 {1}"
COPY check_ign_err FROM STDIN WITH (on_error ignore);
ERROR: extra data after last expected column
CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc"
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error 'null');
+\pset null NULL
+SELECT * FROM check_ign_err;
+ n | m | k
+------+-----+------
+ 1 | {1} | NULL
+ 2 | {2} | 1
+ 3 | {3} | 2
+ 4 | {4} | NULL
+ NULL | {5} | NULL
+(5 rows)
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error 'null');
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1,"
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error 'null');
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1,{1},1,1"
+CREATE temp TABLE check_on_err_null (n int, m int not null, k int);
+--should fail. since not null constraint conflict with on_error null
+COPY check_on_err_null FROM STDIN WITH (on_error 'null');
+ERROR: copy on_error 'null' cannot be used with not null constraint column
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -767,6 +792,7 @@ DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
DROP TABLE hard_err;
+DROP TABLE check_on_err_null;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b5e549e8..dacde209 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -534,6 +534,33 @@ COPY check_ign_err FROM STDIN WITH (on_error ignore);
1 {1} 3 abc
\.
+
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error 'null');
+1 {1} a
+2 {2} 1
+3 {3} 2
+4 {4} b
+a {5} c
+\.
+
+\pset null NULL
+SELECT * FROM check_ign_err;
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error 'null');
+1,
+\.
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error 'null');
+1,{1},1,1
+\.
+
+CREATE temp TABLE check_on_err_null (n int, m int not null, k int);
+--should fail. since not null constraint conflict with on_error null
+COPY check_on_err_null FROM STDIN WITH (on_error 'null');
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -550,6 +577,7 @@ DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
DROP TABLE hard_err;
+DROP TABLE check_on_err_null;
--
-- COPY FROM ... DEFAULT
--
2.34.1
Hi,
On 2024-02-03 15:22, jian he wrote:
The idea of on_error is to tolerate errors, I think.
if a column has a not null constraint, let it cannot be used with
(on_error 'null')
+ /* + * we can specify on_error 'null', but it can only apply to columns + * don't have not null constraint. + */ + if (att->attnotnull && cstate->opts.on_error == COPY_ON_ERROR_NULL) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("copy on_error 'null' cannot be used with not null constraint column")));
This means we cannot use ON_ERROR 'null' even when there is one column
which have NOT NULL constraint, i.e. primary key, right?
IMHO this is strong constraint and will decrease the opportunity to use
this feature.
It might be better to allow error_action 'null' for tables which have
NOT NULL constraint columns, and when facing soft errors for those rows,
skip that row or stop COPY.
Based on this, I've made a patch.
based on COPY Synopsis: ON_ERROR 'error_action'
on_error 'null', the keyword NULL should be single quoted.
As you mentioned, single quotation seems a little odd..
I'm not sure what is the best name and syntax for this feature, but
since current error_action are verbs('stop' and 'ignore'), I feel 'null'
might not be appropriate.
demo:
COPY check_ign_err FROM STDIN WITH (on_error 'null');
1 {1} a
2 {2} 1
3 {3} 2
4 {4} b
a {5} c
\.\pset null NULL
SELECT * FROM check_ign_err;
n | m | k
------+-----+------
1 | {1} | NULL
2 | {2} | 1
3 | {3} | 2
4 | {4} | NULL
NULL | {5} | NULL
Since we notice the number of ignored rows when ON_ERROR is 'ignore',
users may want to know the number of rows which was changed to NULL when
using ON_ERROR 'null'.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
On Mon, Feb 5, 2024 at 10:29 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
Hi,
On 2024-02-03 15:22, jian he wrote:
The idea of on_error is to tolerate errors, I think.
if a column has a not null constraint, let it cannot be used with
(on_error 'null')+ /* + * we can specify on_error 'null', but it can only apply to columns + * don't have not null constraint. + */ + if (att->attnotnull && cstate->opts.on_error == COPY_ON_ERROR_NULL) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("copy on_error 'null' cannot be used with not null constraint column")));This means we cannot use ON_ERROR 'null' even when there is one column
which have NOT NULL constraint, i.e. primary key, right?
IMHO this is strong constraint and will decrease the opportunity to use
this feature.
I don't want to fail in the middle of bulk inserts,
so I thought immediately erroring out would be a great idea.
Let's see what other people think.
On Mon, 05 Feb 2024 11:28:59 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:
Based on this, I've made a patch.
based on COPY Synopsis: ON_ERROR 'error_action'
on_error 'null', the keyword NULL should be single quoted.As you mentioned, single quotation seems a little odd..
I'm not sure what is the best name and syntax for this feature, but
since current error_action are verbs('stop' and 'ignore'), I feel 'null'
might not be appropriate.
I am not in favour of using 'null' either, so I suggested to use
"set_to_null" or more generic syntax like "set_to (col, val)" in my
previous post[1]/messages/by-id/20240129172858.ccb6c77c3be95a295e2b2b44@sraoss.co.jp, although I'm not convinced what is the best either.
[1]: /messages/by-id/20240129172858.ccb6c77c3be95a295e2b2b44@sraoss.co.jp
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
At Mon, 5 Feb 2024 17:22:56 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in
On Mon, 05 Feb 2024 11:28:59 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:Based on this, I've made a patch.
based on COPY Synopsis: ON_ERROR 'error_action'
on_error 'null', the keyword NULL should be single quoted.As you mentioned, single quotation seems a little odd..
I'm not sure what is the best name and syntax for this feature, but
since current error_action are verbs('stop' and 'ignore'), I feel 'null'
might not be appropriate.I am not in favour of using 'null' either, so I suggested to use
"set_to_null" or more generic syntax like "set_to (col, val)" in my
previous post[1], although I'm not convinced what is the best either.[1] /messages/by-id/20240129172858.ccb6c77c3be95a295e2b2b44@sraoss.co.jp
Tom sugggested using a separate option, and I agree with the
suggestion. Taking this into consideration, I imagined something like
the following, for example. Although I'm not sure we are actually
going to do whole-tuple replacement, the action name in this example
has the suffix '-column'.
COPY (on_error 'replace-colomn', replacement 'null') ..
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, 06 Feb 2024 09:39:09 +0900 (JST)
Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Mon, 5 Feb 2024 17:22:56 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in
On Mon, 05 Feb 2024 11:28:59 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:Based on this, I've made a patch.
based on COPY Synopsis: ON_ERROR 'error_action'
on_error 'null', the keyword NULL should be single quoted.As you mentioned, single quotation seems a little odd..
I'm not sure what is the best name and syntax for this feature, but
since current error_action are verbs('stop' and 'ignore'), I feel 'null'
might not be appropriate.I am not in favour of using 'null' either, so I suggested to use
"set_to_null" or more generic syntax like "set_to (col, val)" in my
previous post[1], although I'm not convinced what is the best either.[1] /messages/by-id/20240129172858.ccb6c77c3be95a295e2b2b44@sraoss.co.jp
Tom sugggested using a separate option, and I agree with the
suggestion. Taking this into consideration, I imagined something like
the following, for example. Although I'm not sure we are actually
going to do whole-tuple replacement, the action name in this example
has the suffix '-column'.COPY (on_error 'replace-colomn', replacement 'null') ..
Thank you for your information. I've found a post[1]/messages/by-id/2070915.1705527477@sss.pgh.pa.us you mentioned,
where adding a separate option for error log destination was suggested.
Considering consistency with other options, adding a separate option
would be better if we want to specify a value to replace the invalid
value, without introducing a complex syntax that allows options with
more than one parameters. Maybe, if we allow to use values for the
replacement other than NULL, we have to also add a option to specify
a column (or a type) for each replacement value. Or, we may add a
option to specify a list of replacement values as many as the number of
columns, each of whose default is NULL.
Anyway, I prefer 'replace" (or 'set_to') to just 'null' as the option
value.
[1]: /messages/by-id/2070915.1705527477@sss.pgh.pa.us
Regards,
Yugo Nagata
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Tue, Feb 6, 2024 at 3:46 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Tue, 06 Feb 2024 09:39:09 +0900 (JST)
Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:At Mon, 5 Feb 2024 17:22:56 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in
On Mon, 05 Feb 2024 11:28:59 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:Based on this, I've made a patch.
based on COPY Synopsis: ON_ERROR 'error_action'
on_error 'null', the keyword NULL should be single quoted.As you mentioned, single quotation seems a little odd..
I'm not sure what is the best name and syntax for this feature, but
since current error_action are verbs('stop' and 'ignore'), I feel 'null'
might not be appropriate.I am not in favour of using 'null' either, so I suggested to use
"set_to_null" or more generic syntax like "set_to (col, val)" in my
previous post[1], although I'm not convinced what is the best either.[1] /messages/by-id/20240129172858.ccb6c77c3be95a295e2b2b44@sraoss.co.jp
Tom sugggested using a separate option, and I agree with the
suggestion. Taking this into consideration, I imagined something like
the following, for example. Although I'm not sure we are actually
going to do whole-tuple replacement, the action name in this example
has the suffix '-column'.COPY (on_error 'replace-colomn', replacement 'null') ..
Thank you for your information. I've found a post[1] you mentioned,
where adding a separate option for error log destination was suggested.Considering consistency with other options, adding a separate option
would be better if we want to specify a value to replace the invalid
value, without introducing a complex syntax that allows options with
more than one parameters. Maybe, if we allow to use values for the
replacement other than NULL, we have to also add a option to specify
a column (or a type) for each replacement value. Or, we may add a
option to specify a list of replacement values as many as the number of
columns, each of whose default is NULL.Anyway, I prefer 'replace" (or 'set_to') to just 'null' as the option
value.
Let's say tabe t column (a,b,c)
if we support set_to_null(a,b), what should we do if column c has an error.
should we ignore this row or error out immediately?
also I am not sure it's doable to just extract columnList from the
function defGetCopyOnErrorChoice.
to make `COPY x from stdin (on_error set_to_null(a,b);` work,
we may need to refactor to gram.y, in a similar way we do force null
i am ok with
COPY x from stdin (on_error set_to_null);
On Mon, 5 Feb 2024 14:26:46 +0800
jian he <jian.universality@gmail.com> wrote:
On Mon, Feb 5, 2024 at 10:29 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
Hi,
On 2024-02-03 15:22, jian he wrote:
The idea of on_error is to tolerate errors, I think.
if a column has a not null constraint, let it cannot be used with
(on_error 'null')+ /* + * we can specify on_error 'null', but it can only apply to columns + * don't have not null constraint. + */ + if (att->attnotnull && cstate->opts.on_error == COPY_ON_ERROR_NULL) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("copy on_error 'null' cannot be used with not null constraint column")));This means we cannot use ON_ERROR 'null' even when there is one column
which have NOT NULL constraint, i.e. primary key, right?
IMHO this is strong constraint and will decrease the opportunity to use
this feature.I don't want to fail in the middle of bulk inserts,
so I thought immediately erroring out would be a great idea.
Let's see what other people think.
I also think this restriction is too strong because it is very
common that a table has a primary key, unless there is some way
to specify columns that can be set to NULL. Even when ON_ERROR
is specified, any constraint violation errors cannot be generally
ignored, so we cannot elimiate the posibility COPY FROM fails in
the middle due to invalid data, anyway.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
attached v2.
syntax: `on_error set_to_null`
based on upthread discussion, now if you specified `on_error
set_to_null` and your column has `not
null` constraint, we convert the error field to null, so it may error
while bulk inserting for violating NOT NULL constraint.
Attachments:
v2-0001-on_error-set_to_null.patchtext/x-patch; charset=US-ASCII; name=v2-0001-on_error-set_to_null.patchDownload
From c95bb7b7c072f510b9a60695714be21345f21591 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 10 Feb 2024 15:08:41 +0800
Subject: [PATCH v2 1/1] on_error set_to_null
any data type conversion errors while COPY FROM will set that column value to be NULL.
discussion: https://www.postgresql.org/message-id/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 1 +
src/backend/commands/copy.c | 2 ++
src/backend/commands/copyfrom.c | 30 ++++++++++++++++++++++------
src/backend/commands/copyfromparse.c | 28 ++++++++++++++++++++++++--
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 22 ++++++++++++++++++++
src/test/regress/sql/copy2.sql | 23 +++++++++++++++++++++
7 files changed, 99 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 55764fc1..d8b609b6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -390,6 +390,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command, while
<literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>set_to_null</literal> means the input value will set to <literal>null</literal> and continue with the next field.
The default is <literal>stop</literal>.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6..9c7d6ebd 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -422,6 +422,8 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 41f6bc43..2a87bcf3 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1005,6 +1005,7 @@ CopyFrom(CopyFromState cstate)
* information according to ON_ERROR.
*/
if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ {
/*
* Just make ErrorSaveContext ready for the next NextCopyFrom.
@@ -1013,11 +1014,18 @@ CopyFrom(CopyFromState cstate)
*/
cstate->escontext->error_occurred = false;
- /* Report that this tuple was skipped by the ON_ERROR clause */
- pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
- ++skipped);
+ /* Report that this tuple was skipped by the ON_ERROR clause */
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ ++skipped);
- continue;
+ continue;
+ }
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ *
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ cstate->escontext->error_occurred = false;
}
ExecStoreVirtualTuple(myslot);
@@ -1312,7 +1320,7 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->num_errors > 0)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
@@ -1320,6 +1328,14 @@ CopyFrom(CopyFromState cstate)
(unsigned long long) cstate->num_errors,
(unsigned long long) cstate->num_errors));
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL &&
+ cstate->num_errors > 0)
+ ereport(NOTICE,
+ errmsg_plural("some columns of %llu rows, value was converted to NULL due to data type incompatibility",
+ "some columns of %llu rows, value were converted to NULL due to data type incompatibility",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1463,11 +1479,13 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL. We'll add other
* options later
*/
if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
cstate->escontext->details_wanted = false;
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ cstate->escontext->details_wanted = false;
}
else
cstate->escontext = NULL;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 90675636..9d77c3d1 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -873,6 +873,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool error_happened = false;
/* read raw fields in the next line */
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
@@ -960,14 +961,37 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
(Node *) cstate->escontext,
&values[m]))
{
- cstate->num_errors++;
- return true;
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+
+ values[m] = (Datum) 0;
+ nulls[m] = true;
+ /* here, we need set error_occurred to false, so COPY will be continue */
+ cstate->escontext->error_occurred = false;
+
+ /* does any conversion error ever happened in all the fields */
+ if (!error_happened)
+ error_happened = true;
+ }
+ else
+ {
+ cstate->num_errors++;
+ return true;
+ }
+
}
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ /* update num_errors. one row with multiple errors field only count 1*/
+ if (error_happened)
+ {
+ cstate->num_errors++;
+ cstate->escontext->error_occurred = true;
+ }
+
Assert(fieldno == attr_count);
}
else
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0..931fe09b 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 25c401ce..879da283 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -751,6 +751,28 @@ CONTEXT: COPY check_ign_err, line 1: "1 {1}"
COPY check_ign_err FROM STDIN WITH (on_error ignore);
ERROR: extra data after last expected column
CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc"
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+NOTICE: some columns of 3 rows, value were converted to NULL due to data type incompatibility
+\pset null NULL
+SELECT * FROM check_ign_err;
+ n | m | k
+------+-----+------
+ 1 | {1} | NULL
+ 2 | {2} | 1
+ 3 | {3} | 2
+ 4 | {4} | NULL
+ NULL | {5} | NULL
+(5 rows)
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1,"
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1,{1},1,1"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b5e549e8..67bf45a7 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -534,6 +534,29 @@ COPY check_ign_err FROM STDIN WITH (on_error ignore);
1 {1} 3 abc
\.
+
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+1 {1} a
+2 {2} 1
+3 {3} 2
+4 {4} b
+a {5} c
+\.
+
+\pset null NULL
+SELECT * FROM check_ign_err;
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,
+\.
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,{1},1,1
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
--
2.34.1
Hi!
On 12.02.24 01:00, jian he wrote:
attached v2.
syntax: `on_error set_to_null`
based on upthread discussion, now if you specified `on_error
set_to_null` and your column has `not
null` constraint, we convert the error field to null, so it may error
while bulk inserting for violating NOT NULL constraint.
That's a very nice feature. Thanks for implementing it!
v2 applies cleanly and works as described.
\pset null '(NULL)'
CREATE TEMPORARY TABLE t1 (a int, b int);
COPY t1 (a,b) FROM STDIN;
1 a
2 1
3 2
4 b
a c
\.
SELECT * FROM t1;
CONTEXT: COPY t1, line 1, column b: "a"
a | b
---+---
(0 rows)
CREATE TEMPORARY TABLE t2 (a int, b int);
COPY t2 (a,b) FROM STDIN WITH (on_error set_to_null);
1 a
2 1
3 2
4 b
a c
\.
SELECT * FROM t2;
psql:test-copy-on_error-2.sql:12: NOTICE: some columns of 3 rows, value
were converted to NULL due to data type incompatibility
COPY 5
a | b
--------+--------
1 | (NULL)
2 | 1
3 | 2
4 | (NULL)
(NULL) | (NULL)
(5 rows)
I have one question though:
In case all columns of a record have been set to null due to data type
incompatibility, should we insert it at all? See t2 example above.
I'm not sure if these records would be of any use in the table. What do
you think?
Since the parameter is already called "set_to_null", maybe it is not
necessary to mention in the NOTICE message that the values have been set
to null.
Perhaps something like "XX records were only partially copied due to
data type incompatibility"
--
Jim
On Fri, Feb 16, 2024 at 1:16 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
In case all columns of a record have been set to null due to data type
incompatibility, should we insert it at all?
Yes. In particular not all columns in the table need be specified in the
copy command so while the parsed input data is all nulls the record itself
may not be.
The system should allow the user to exclude rows with incomplete data by
ignoring a not null constraint violation.
In short we shouldn't judge non-usefulness and instead give tools to the
user to decide for themselves.
David J.
On 16.02.24 21:31, David G. Johnston wrote:
Yes. In particular not all columns in the table need be specified in
the copy command so while the parsed input data is all nulls the
record itself may not be.
Yeah, you have a point there.
I guess if users want to avoid it to happen they can rely on NOT NULL
constraints.
Thanks
--
Jim
hi all.
patch updated.
simplified the code a lot.
idea is same:
COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
If the STDIN number of columns is the same as the target table, then
InputFunctionCallSafe
call failure will make that column values be null.
If the STDIN number of columns is not the same as the target table, then error
ERROR: missing data for column \"%s\"
ERROR: extra data after last expected column
which is status quo.
Attachments:
v3-0001-on_error-set_to_null.patchtext/x-patch; charset=US-ASCII; name=v3-0001-on_error-set_to_null.patchDownload
From d697684b4dc1356172d93179b1e5e157893c3e54 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 23 Aug 2024 22:26:44 +0800
Subject: [PATCH v3 1/1] on_error set_to_null
any data type conversion errors while COPY FROM will set that column value to be NULL.
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 1 +
src/backend/commands/copy.c | 2 ++
src/backend/commands/copyfrom.c | 8 ++++--
src/backend/commands/copyfromparse.c | 11 ++++++++
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 39 ++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 41 ++++++++++++++++++++++++++++
7 files changed, 100 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..b6bdf45e7e 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,6 +394,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command, while
<literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>set_to_null</literal> means the input value will set to <literal>null</literal> and continue with the next one.
The default is <literal>stop</literal>.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..e4bd310ae5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -409,6 +409,8 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..c1e58e49bc 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1319,7 +1319,7 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->num_errors > 0)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
@@ -1471,11 +1471,13 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
cstate->escontext->details_wanted = false;
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ cstate->escontext->details_wanted = false;
}
else
cstate->escontext = NULL;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7efcb89159..6fbe975b51 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -969,6 +969,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ values[m] = (Datum) 0;
+ nulls[m] = true;
+ /*
+ * set error_occurred to false, so next
+ * InputFunctionCallSafe call behave sane.
+ */
+ cstate->escontext->error_occurred = false;
+ continue;
+ }
cstate->num_errors++;
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..fa87232ed7 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e913f683a6..4d23527106 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -753,6 +753,24 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column k: "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+create domain d_int_not_null as int not null check(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, c int not null, b int);
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: null value in column "c" of relation "t_on_error_null" violates not-null constraint
+DETAIL: Failing row contains (11, null, 12).
+CONTEXT: COPY t_on_error_null, line 1: "11 a 12"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+select * from t_on_error_null;
+ a | c | b
+---+----+----
+ | 11 | 13
+ | 11 | 14
+(2 rows)
+
+drop table t_on_error_null;
+drop domain d_int_not_null;
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -789,6 +807,27 @@ CONTEXT: COPY check_ign_err, line 1: "1 {1}"
COPY check_ign_err FROM STDIN WITH (on_error ignore);
ERROR: extra data after last expected column
CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc"
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+\pset null NULL
+SELECT * FROM check_ign_err;
+ n | m | k
+------+-----+------
+ 1 | {1} | NULL
+ 2 | {2} | 1
+ 3 | {3} | 2
+ 4 | {4} | NULL
+ NULL | {5} | NULL
+(5 rows)
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1, column m: ""
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1,{1},1,1"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..4abc18a6db 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -526,6 +526,24 @@ a {2} 2
8 {8} 8
\.
+create domain d_int_not_null as int not null check(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, c int not null, b int);
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+11 a 12
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+a 11 14
+\.
+
+select * from t_on_error_null;
+drop table t_on_error_null;
+drop domain d_int_not_null;
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -557,6 +575,29 @@ COPY check_ign_err FROM STDIN WITH (on_error ignore);
1 {1} 3 abc
\.
+
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+1 {1} a
+2 {2} 1
+3 {3} 2
+4 {4} b
+a {5} c
+\.
+
+\pset null NULL
+SELECT * FROM check_ign_err;
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,
+\.
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,{1},1,1
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
--
2.34.1
Hi there
On 26.08.24 02:00, jian he wrote:
hi all.
patch updated.
simplified the code a lot.idea is same:
COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);If the STDIN number of columns is the same as the target table, then
InputFunctionCallSafe
call failure will make that column values be null.If the STDIN number of columns is not the same as the target table, then error
ERROR: missing data for column \"%s\"
ERROR: extra data after last expected column
which is status quo.
I wanted to give it another try, but the patch does not apply ...
$ git apply ~/patches/copy_on_error/v3-0001-on_error-set_to_null.patch -v
Checking patch doc/src/sgml/ref/copy.sgml...
Checking patch src/backend/commands/copy.c...
Checking patch src/backend/commands/copyfrom.c...
Checking patch src/backend/commands/copyfromparse.c...
Checking patch src/include/commands/copy.h...
Checking patch src/test/regress/expected/copy2.out...
error: while searching for:
NOTICE: skipping row due to data type incompatibility at line 8 for
column k: "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
-- tests for on_error option with log_verbosity and null constraint via
domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
error: patch failed: src/test/regress/expected/copy2.out:753
error: src/test/regress/expected/copy2.out: patch does not apply
Checking patch src/test/regress/sql/copy2.sql...
--
Jim
On Mon, Sep 9, 2024 at 10:34 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
Hi there
On 26.08.24 02:00, jian he wrote:
hi all.
patch updated.
simplified the code a lot.idea is same:
COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);If the STDIN number of columns is the same as the target table, then
InputFunctionCallSafe
call failure will make that column values be null.If the STDIN number of columns is not the same as the target table, then error
ERROR: missing data for column \"%s\"
ERROR: extra data after last expected column
which is status quo.I wanted to give it another try, but the patch does not apply ...
here we are.
please check the attached file.
Attachments:
v4-0001-on_error-set_to_null.patchtext/x-patch; charset=US-ASCII; name=v4-0001-on_error-set_to_null.patchDownload
From 3d6b3d8b0393b5bc4950e85c40c69c0da46c8035 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 12 Sep 2024 17:07:02 +0800
Subject: [PATCH v4 1/1] on_error set_to_null
extent "on_error action", introduce new option: on_error set_to_null.
Due to current grammar, we cannot use "on_error null",
so I choose on_error set_to_null.
any data type conversion errors while the COPY FROM process will set that column value to be NULL.
this will only work with COPY FROM and non-binary format.
However this will respect the not-null constraint, meaning, if you actually converted error to null,
but the column has not-null constraint, not-null constraint violation ERROR will be raised.
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 1 +
src/backend/commands/copy.c | 4 ++-
src/backend/commands/copyfrom.c | 9 ++---
src/backend/commands/copyfromparse.c | 11 +++++++
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 49 ++++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 44 +++++++++++++++++++++++++
7 files changed, 114 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..b6bdf45e7e 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,6 +394,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command, while
<literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>set_to_null</literal> means the input value will set to <literal>null</literal> and continue with the next one.
The default is <literal>stop</literal>.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..05b152a090 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", or "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..1669fac444 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1319,7 +1319,7 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->num_errors > 0)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
@@ -1471,10 +1471,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 97a4c387a3..3fe32b76ac 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -969,6 +969,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ values[m] = (Datum) 0;
+ nulls[m] = true;
+ /*
+ * set error_occurred to false, so next
+ * InputFunctionCallSafe call behave sane.
+ */
+ cstate->escontext->error_occurred = false;
+ continue;
+ }
cstate->num_errors++;
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..fa87232ed7 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 61a19cdc4c..b92a5771ff 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error set_to_nul...
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,8 @@ COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -112,6 +118,10 @@ COPY x to stdin (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdin (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x to stdout (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x to stdout (log_verbosity unsupported);
@@ -753,6 +763,24 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+create domain d_int_not_null as int not null check(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, c int not null, b int);
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: null value in column "c" of relation "t_on_error_null" violates not-null constraint
+DETAIL: Failing row contains (11, null, 12).
+CONTEXT: COPY t_on_error_null, line 1: "11 a 12"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+select * from t_on_error_null;
+ a | c | b
+---+----+----
+ | 11 | 13
+ | 11 | 14
+(2 rows)
+
+drop table t_on_error_null;
+drop domain d_int_not_null;
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -789,6 +817,27 @@ CONTEXT: COPY check_ign_err, line 1: "1 {1}"
COPY check_ign_err FROM STDIN WITH (on_error ignore);
ERROR: extra data after last expected column
CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc"
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+\pset null NULL
+SELECT * FROM check_ign_err;
+ n | m | k
+------+-----+------
+ 1 | {1} | NULL
+ 2 | {2} | 1
+ 3 | {3} | 2
+ 4 | {4} | NULL
+ NULL | {5} | NULL
+(5 rows)
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1, column m: ""
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1,{1},1,1"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..1144822768 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,14 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
COPY x from stdin (on_error unsupported);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
@@ -81,6 +83,7 @@ COPY x to stdin (format CSV, force_not_null(a));
COPY x to stdout (format TEXT, force_null(a));
COPY x to stdin (format CSV, force_null(a));
COPY x to stdin (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x to stdout (log_verbosity unsupported);
-- too many columns in column list: should fail
@@ -526,6 +529,24 @@ a {2} 2
8 {8} 8
\.
+create domain d_int_not_null as int not null check(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, c int not null, b int);
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+11 a 12
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+a 11 14
+\.
+
+select * from t_on_error_null;
+drop table t_on_error_null;
+drop domain d_int_not_null;
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -557,6 +578,29 @@ COPY check_ign_err FROM STDIN WITH (on_error ignore);
1 {1} 3 abc
\.
+
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+1 {1} a
+2 {2} 1
+3 {3} 2
+4 {4} b
+a {5} c
+\.
+
+\pset null NULL
+SELECT * FROM check_ign_err;
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,
+\.
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,{1},1,1
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
base-commit: 00c76cf21c42c17e60e73a87dea0d1b4e234d9da
--
2.34.1
On 12.09.24 12:13, jian he wrote:
please check the attached file.
v4 applies cleanly, it works as expected, and all tests pass.
postgres=# \pset null '(NULL)'
Null display is "(NULL)".
postgres=# CREATE TEMPORARY TABLE t2 (a int, b int);
CREATE TABLE
postgres=# COPY t2 (a,b) FROM STDIN WITH (on_error set_to_null, format csv);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
1,a
2,1
3,2
4,b
a,c
\.
COPY 5
postgres=# SELECT * FROM t2;
a | b
--------+--------
1 | (NULL)
2 | 1
3 | 2
4 | (NULL)
(NULL) | (NULL)
(5 rows)
Perhaps small changes in the docs:
<literal>set_to_null</literal> means the input value will set to
<literal>null</literal> and continue with the next one.
"will set" -> "will be set"
"and continue with" -> "and will continue with"
Other than that, LGTM.
Thanks!
--
Jim
Hi!
On Thu, 12 Sept 2024 at 15:13, jian he <jian.universality@gmail.com> wrote:
On Mon, Sep 9, 2024 at 10:34 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
Hi there
On 26.08.24 02:00, jian he wrote:
hi all.
patch updated.
simplified the code a lot.idea is same:
COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);If the STDIN number of columns is the same as the target table, then
InputFunctionCallSafe
call failure will make that column values be null.If the STDIN number of columns is not the same as the target table, then error
ERROR: missing data for column \"%s\"
ERROR: extra data after last expected column
which is status quo.I wanted to give it another try, but the patch does not apply ...
here we are.
please check the attached file.
Hi!
v4 no longer applies. It now conflicts with
e7834a1a251d4a28245377f383ff20a657ba8262.
Also, there were review comments.
So, I decided to rebase.
Review comments from [1]/messages/by-id/b26e9c6c-75bf-45ea-8aea-346dda3bd445@uni-muenster.de -- Best regards, Kirill Reshke applied partially. I didn't do "and continue
with" -> "and will continue with" substitution as suggested, because
the first options are used for `ignore` doc one lines above. So, I
just don't know how to change this correctly. We definitely don't want
two separate forms of saying the same in 2 consecutive lines.
I did small changes:
1) added
`-- tests for set_to_null`
option in the test script akin to 4ac2a9beceb10d44806d2cf157d5a931bdade39e
2) I rephrased
Allow "stop", or "ignore", "set_to_null" values
to
Allow "stop", "ignore", "set_to_null" values
PFA.
[1]: /messages/by-id/b26e9c6c-75bf-45ea-8aea-346dda3bd445@uni-muenster.de -- Best regards, Kirill Reshke
--
Best regards,
Kirill Reshke
Attachments:
v5-0001-on_error-set_to_null.patchapplication/octet-stream; name=v5-0001-on_error-set_to_null.patchDownload
From 44406d426f421508729b08b8014eccbd6ec3e6c9 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 12 Sep 2024 17:07:02 +0800
Subject: [PATCH v5] on_error set_to_null
extent "on_error action", introduce new option: on_error set_to_null.
Due to current grammar, we cannot use "on_error null",
so I choose on_error set_to_null.
any data type conversion errors while the COPY FROM process will set that column value to be NULL.
this will only work with COPY FROM and non-binary format.
However this will respect the not-null constraint, meaning, if you actually converted error to null,
but the column has not-null constraint, not-null constraint violation ERROR will be raised.
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 1 +
src/backend/commands/copy.c | 4 ++-
src/backend/commands/copyfrom.c | 9 ++---
src/backend/commands/copyfromparse.c | 11 ++++++
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 50 ++++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 44 ++++++++++++++++++++++++
7 files changed, 115 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f096..7f2b0e40662 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -396,6 +396,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command, while
<literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>set_to_null</literal> means the input value will be set to <literal>null</literal> and continue with the next one.
The default is <literal>stop</literal>.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663f..605c7c60f19 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b8..dde30989009 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1321,7 +1321,7 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
@@ -1474,10 +1474,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b14..8ab42dbceee 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -961,6 +961,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ values[m] = (Datum) 0;
+ nulls[m] = true;
+ /*
+ * set error_occurred to false, so next
+ * InputFunctionCallSafe call behave sane.
+ */
+ cstate->escontext->error_occurred = false;
+ continue;
+ }
cstate->num_errors++;
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..051ca12d107 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..55ece18f9bf 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error set_to_nul...
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,8 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +130,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +779,24 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+create domain d_int_not_null as int not null check(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, c int not null, b int);
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: null value in column "c" of relation "t_on_error_null" violates not-null constraint
+DETAIL: Failing row contains (11, null, 12).
+CONTEXT: COPY t_on_error_null, line 1: "11 a 12"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+select * from t_on_error_null;
+ a | c | b
+---+----+----
+ | 11 | 13
+ | 11 | 14
+(2 rows)
+
+drop table t_on_error_null;
+drop domain d_int_not_null;
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -813,6 +841,28 @@ ERROR: skipped more than REJECT_LIMIT (3) rows due to data type incompatibility
CONTEXT: COPY check_ign_err, line 5, column n: ""
COPY check_ign_err FROM STDIN WITH (on_error ignore, reject_limit 4);
NOTICE: 4 rows were skipped due to data type incompatibility
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+\pset null NULL
+SELECT * FROM check_ign_err;
+ n | m | k
+------+-----+------
+ 1 | {1} | NULL
+ 2 | {2} | 1
+ 3 | {3} | 2
+ 4 | {4} | NULL
+ NULL | {5} | NULL
+(5 rows)
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1, column m: ""
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1,{1},1,1"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..8d974f4d9a9 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,14 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +89,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +537,24 @@ a {2} 2
8 {8} 8
\.
+create domain d_int_not_null as int not null check(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, c int not null, b int);
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+11 a 12
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+a 11 14
+\.
+
+select * from t_on_error_null;
+drop table t_on_error_null;
+drop domain d_int_not_null;
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -588,6 +609,29 @@ a {7} 7
10 {10} 10
\.
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+1 {1} a
+2 {2} 1
+3 {3} 2
+4 {4} b
+a {5} c
+\.
+
+\pset null NULL
+SELECT * FROM check_ign_err;
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,
+\.
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,{1},1,1
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
--
2.34.1
On 2024/10/21 18:30, Kirill Reshke wrote:
v4 no longer applies. It now conflicts with
e7834a1a251d4a28245377f383ff20a657ba8262.
Also, there were review comments.So, I decided to rebase.
Thanks for the patch! Here are my review comments:
I noticed that on_error=set_to_null does not trigger NOTICE messages for rows
and columns with errors. It's "unexpected" thing for columns to be silently
replaced with NULL due to on_error=set_to_null. So, similar to on_error=ignore,
there should be NOTICE messages indicating which input records had columns
set to NULL because of data type incompatibility. Without these messages,
users might not realize that some columns were set to NULL.
How should on_error=set_to_null behave when reject_limit is set? It seems
intuitive to trigger an error if the number of rows with columns' data type
issues exceeds reject_limit, similar to on_error=ignore. This is open to discussion.
psql's tab-completion should be updated to include SET_TO_NULL.
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command, while
<literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>set_to_null</literal> means the input value will be set to <literal>null</literal> and continue with the next one.
How about merging these two descriptions to one and updating it to the following?
-------------------
An error_action value of stop means fail the command, ignore means discard the input
row and continue with the next one, and set_to_null means replace columns with invalid
input values with NULL and move to the next row.
-------------------
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
This should be "... ignore and set_to_null options are ..."?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi!
On Mon, 21 Oct 2024 at 17:39, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2024/10/21 18:30, Kirill Reshke wrote:
v4 no longer applies. It now conflicts with
e7834a1a251d4a28245377f383ff20a657ba8262.
Also, there were review comments.So, I decided to rebase.
Thanks for the patch! Here are my review comments:
I noticed that on_error=set_to_null does not trigger NOTICE messages for rows
and columns with errors. It's "unexpected" thing for columns to be silently
replaced with NULL due to on_error=set_to_null. So, similar to on_error=ignore,
there should be NOTICE messages indicating which input records had columns
set to NULL because of data type incompatibility. Without these messages,
users might not realize that some columns were set to NULL.
Nice catch. That is a feature introduced by
f5a227895e178bf528b18f82bbe554435fb3e64f.
How should on_error=set_to_null behave when reject_limit is set? It seems
intuitive to trigger an error if the number of rows with columns' data type
issues exceeds reject_limit, similar to on_error=ignore. This is open to discussion.
Ok, let's discuss. My first suggestion was:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf. But our REJECT LIMIT is zero
(not set).
So, we ignore zero REJECT LIMIT if set_to_null is set.
But while I was trying to implement that, I realized that I don't
understand v4 of this patch. My misunderstanding is about
`t_on_error_null` tests. We are allowed to insert a NULL value for the
first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
do we do that? My thought is we should try to execute
InputFunctionCallSafe with NULL value (i mean, here [1]https://github.com/postgresql-cfbot/postgresql/compare/cf/4810~1...cf/4810#diff-98d8bfd706468f77f8b0d5d0797e3dba3ffaaa88438143ef4cf7fedecaa56827R964 -- Best regards, Kirill Reshke) for the
column after we failed to insert the input value. And, if this second
call is successful, we do replacement, otherwise we count the row as
erroneous.
psql's tab-completion should be updated to include SET_TO_NULL.
Agreed.
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command, while
<literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>set_to_null</literal> means the input value will be set to <literal>null</literal> and continue with the next one.How about merging these two descriptions to one and updating it to the following?
-------------------
An error_action value of stop means fail the command, ignore means discard the input
row and continue with the next one, and set_to_null means replace columns with invalid
input values with NULL and move to the next row.
-------------------
Hm, good catch. Applied almost as you suggested. I did tweak this
"replace columns with invalid input values with " into "replace
columns containing erroneous input values with". Is that OK?
The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
This should be "... ignore and set_to_null options are ..."?
Sure!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Here I sent v6 where review comments, except set_to_null vs reject
limit, were addressed. No new tests added yet, as some details are
unclear...
[1]: https://github.com/postgresql-cfbot/postgresql/compare/cf/4810~1...cf/4810#diff-98d8bfd706468f77f8b0d5d0797e3dba3ffaaa88438143ef4cf7fedecaa56827R964 -- Best regards, Kirill Reshke
--
Best regards,
Kirill Reshke
Attachments:
v6-0001-on_error-set_to_null.patchapplication/octet-stream; name=v6-0001-on_error-set_to_null.patchDownload
From 85a62beb248cf053dd0e05b01df7572dff7766db Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 12 Sep 2024 17:07:02 +0800
Subject: [PATCH v6] on_error set_to_null
extent "on_error action", introduce new option: on_error set_to_null.
Due to current grammar, we cannot use "on_error null",
so I choose on_error set_to_null.
any data type conversion errors while the COPY FROM process will set that column value to be NULL.
this will only work with COPY FROM and non-binary format.
However this will respect the not-null constraint, meaning, if you actually converted error to null,
but the column has not-null constraint, not-null constraint violation ERROR will be raised.
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 7 ++--
src/backend/commands/copy.c | 12 ++++---
src/backend/commands/copyfrom.c | 9 ++---
src/backend/commands/copyfromparse.c | 11 ++++++
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 50 ++++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 44 ++++++++++++++++++++++++
8 files changed, 123 insertions(+), 13 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f096..dcbfa17a3ce 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,12 +394,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing erroneous input values with <literal>null</literal> and move to the next row.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal> options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663f..304022cd867 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -904,13 +906,13 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && !(opts_out->on_error == COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE or SET_TO_NULL */
+ errmsg("COPY %s requires %s to be set to %s or %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE", "SET_TO_NULL")));
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b8..dde30989009 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1321,7 +1321,7 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
@@ -1474,10 +1474,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b14..8ab42dbceee 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -961,6 +961,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ values[m] = (Datum) 0;
+ nulls[m] = true;
+ /*
+ * set error_occurred to false, so next
+ * InputFunctionCallSafe call behave sane.
+ */
+ cstate->escontext->error_occurred = false;
+ continue;
+ }
cstate->num_errors++;
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1be0056af73..33652b79f62 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..051ca12d107 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..55ece18f9bf 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error set_to_nul...
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,8 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +130,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +779,24 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+create domain d_int_not_null as int not null check(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, c int not null, b int);
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: null value in column "c" of relation "t_on_error_null" violates not-null constraint
+DETAIL: Failing row contains (11, null, 12).
+CONTEXT: COPY t_on_error_null, line 1: "11 a 12"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+select * from t_on_error_null;
+ a | c | b
+---+----+----
+ | 11 | 13
+ | 11 | 14
+(2 rows)
+
+drop table t_on_error_null;
+drop domain d_int_not_null;
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -813,6 +841,28 @@ ERROR: skipped more than REJECT_LIMIT (3) rows due to data type incompatibility
CONTEXT: COPY check_ign_err, line 5, column n: ""
COPY check_ign_err FROM STDIN WITH (on_error ignore, reject_limit 4);
NOTICE: 4 rows were skipped due to data type incompatibility
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+\pset null NULL
+SELECT * FROM check_ign_err;
+ n | m | k
+------+-----+------
+ 1 | {1} | NULL
+ 2 | {2} | 1
+ 3 | {3} | 2
+ 4 | {4} | NULL
+ NULL | {5} | NULL
+(5 rows)
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1, column m: ""
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1,{1},1,1"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..8d974f4d9a9 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,14 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +89,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +537,24 @@ a {2} 2
8 {8} 8
\.
+create domain d_int_not_null as int not null check(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, c int not null, b int);
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+11 a 12
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+a 11 14
+\.
+
+select * from t_on_error_null;
+drop table t_on_error_null;
+drop domain d_int_not_null;
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -588,6 +609,29 @@ a {7} 7
10 {10} 10
\.
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+1 {1} a
+2 {2} 1
+3 {3} 2
+4 {4} b
+a {5} c
+\.
+
+\pset null NULL
+SELECT * FROM check_ign_err;
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,
+\.
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,{1},1,1
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
--
2.34.1
On Mon, Oct 21, 2024 at 8:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2024/10/21 18:30, Kirill Reshke wrote:
v4 no longer applies. It now conflicts with
e7834a1a251d4a28245377f383ff20a657ba8262.
Also, there were review comments.So, I decided to rebase.
Thanks for the patch! Here are my review comments:
I noticed that on_error=set_to_null does not trigger NOTICE messages for rows
and columns with errors. It's "unexpected" thing for columns to be silently
replaced with NULL due to on_error=set_to_null. So, similar to on_error=ignore,
there should be NOTICE messages indicating which input records had columns
set to NULL because of data type incompatibility. Without these messages,
users might not realize that some columns were set to NULL.
on_error=set_to_null,
we have two options for CopyFromStateData->num_errors.
A. Counting the number of rows that on_error set_to_null happened.
B. Counting number of times that on_error set_to_null happened
let's say optionA:
ereport(NOTICE,
errmsg_plural("%llu row was converted to NULL due to
data type incompatibility",
"%llu rows were converted to NULL due to
data type incompatibility",
(unsigned long long) cstate->num_errors,
(unsigned long long) cstate->num_errors));
I doubt the above message is accurate.
"%llu row was converted to NULL"
can mean "%llu row, for each row, all columns was converted to NULL"
but here we are
"%llu row, for each row, some column (can be all columns) was converted to NULL"
optionB: the message can be:
errmsg_plural("converted to NULL due to data type incompatibility
happened %llu time")
but I aslo feel the wording is not perfect also.
So overall I am not sure how to construct the NOTICE messages.
On 2024/10/26 6:03, Kirill Reshke wrote:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.
+1
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf.
+1
But our REJECT LIMIT is zero
(not set).
So, we ignore zero REJECT LIMIT if set_to_null is set.
REJECT_LIMIT currently has to be greater than zero, so it won’t ever be zero.
But while I was trying to implement that, I realized that I don't
understand v4 of this patch. My misunderstanding is about
`t_on_error_null` tests. We are allowed to insert a NULL value for the
first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
do we do that? My thought is we should try to execute
InputFunctionCallSafe with NULL value (i mean, here [1]) for the
column after we failed to insert the input value. And, if this second
call is successful, we do replacement, otherwise we count the row as
erroneous.
Your concern is valid. Allowing NULL to be stored in a column with a NOT NULL
constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you suggested,
NULL values set by SET_TO_NULL should probably be re-evaluated.
Hm, good catch. Applied almost as you suggested. I did tweak this
"replace columns with invalid input values with " into "replace
columns containing erroneous input values with". Is that OK?
Yes, sounds good.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2024/10/26 6:03, Kirill Reshke wrote:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.+1
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf.+1
After thinking for a while, I'm now more opposed to this approach. I
think we should count rows with erroneous data as errors only if
null substitution for these rows failed, not the total number of rows
which were modified.
Then, to respect the REJECT LIMIT option, we compare this number with
the limit. This is actually simpler approach IMHO. What do You think?
But while I was trying to implement that, I realized that I don't
understand v4 of this patch. My misunderstanding is about
`t_on_error_null` tests. We are allowed to insert a NULL value for the
first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
do we do that? My thought is we should try to execute
InputFunctionCallSafe with NULL value (i mean, here [1]) for the
column after we failed to insert the input value. And, if this second
call is successful, we do replacement, otherwise we count the row as
erroneous.Your concern is valid. Allowing NULL to be stored in a column with a NOT NULL
constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you suggested,
NULL values set by SET_TO_NULL should probably be re-evaluated.
Thank you. I updated the patch with a NULL re-evaluation.
PFA v7. I did not yet update the doc for this patch version, waiting
for feedback about REJECT LIMIT + SET_TO_NULL behaviour.
Best regards,
Kirill Reshke
Attachments:
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patchapplication/octet-stream; name=v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patchDownload
From 0e04606ed5c76f0fe079bcb157194ab06f7272aa Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 12 Sep 2024 17:07:02 +0800
Subject: [PATCH v7] Incrtoduce COPY option to replace columns containing
erroneous data with null
extent "on_error action", introduce new option: on_error set_to_null.
Due to current grammar, we cannot use "on_error null",
so I choose on_error set_to_null.
any data type conversion errors while the COPY FROM process will set that column value to be NULL.
this will only work with COPY FROM and non-binary format.
However this will respect the not-null constraint, meaning, if you actually converted error to null,
but the column has not-null constraint, not-null constraint violation ERROR will be raised.
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 7 ++-
src/backend/commands/copy.c | 12 ++--
src/backend/commands/copyfrom.c | 40 ++++++++----
src/backend/commands/copyfromparse.c | 46 ++++++++++++++
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 91 +++++++++++++++++++++++++++-
src/test/regress/sql/copy2.sql | 78 ++++++++++++++++++++++++
8 files changed, 255 insertions(+), 22 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f096..dcbfa17a3ce 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,12 +394,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing erroneous input values with <literal>null</literal> and move to the next row.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal> options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663f..304022cd867 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -904,13 +906,13 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && !(opts_out->on_error == COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE or SET_TO_NULL */
+ errmsg("COPY %s requires %s to be set to %s or %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE", "SET_TO_NULL")));
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b8..5fc9d83270b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1003,7 +1003,7 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ if ((cstate->opts.on_error == COPY_ON_ERROR_IGNORE || cstate->opts.on_error == COPY_ON_ERROR_NULL) &&
cstate->escontext->error_occurred)
{
/*
@@ -1018,12 +1018,29 @@ CopyFrom(CopyFromState cstate)
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
cstate->num_errors);
- if (cstate->opts.reject_limit > 0 && \
- cstate->num_errors > cstate->opts.reject_limit)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
- (long long) cstate->opts.reject_limit)));
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ {
+ if (cstate->opts.reject_limit > 0 && cstate->num_errors > cstate->opts.reject_limit)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
+ (long long) cstate->opts.reject_limit)));
+ }
+ else
+ {
+ /* Provide different error msg if reject_limit is zero */
+ if (cstate->opts.reject_limit == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("failed to replace column containing erroneous data with null",
+ (long long) cstate->opts.reject_limit),
+ errhint("Consider specifying the REJECT LIMIT option to skip erroneous rows.")));
+ else if (cstate->num_errors > cstate->opts.reject_limit)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("encountered more than REJECT_LIMIT (%lld) rows with data type incompatibility",
+ (long long) cstate->opts.reject_limit)));
+ }
/* Repeat NextCopyFrom() until no soft error occurs */
continue;
@@ -1321,7 +1338,7 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if ((cstate->opts.on_error == COPY_ON_ERROR_IGNORE || cstate->opts.on_error == COPY_ON_ERROR_NULL) &&
cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
@@ -1474,10 +1491,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d83..1511b44f4ba 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -960,6 +960,52 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ /*
+ * We encountered an error while parsing one of attributes.
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL && string != NULL)
+ {
+ /*
+ * Temporary unset error_occurred.
+ * If null substitution for this attribute will
+ * succeed, we do not count this row as erroneous
+ */
+ cstate->escontext->error_occurred = false;
+
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ /* If datatype if okay with NULL, replace
+ * with null
+ */
+ nulls[m] = true;
+
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ {
+
+ ereport(NOTICE,
+ errmsg("replaced row attribute \"%s\" with NULL due to data type incompatibility at line %llu.",
+ cstate->cur_attname, (unsigned long long) cstate->cur_lineno));
+ }
+ continue;
+ }
+
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ {
+ ereport(NOTICE,
+ errmsg("failed to replace row attribute \"%s\" with NULL at line %llu.",
+ cstate->cur_attname, (unsigned long long)cstate->cur_lineno));
+ }
+ }
+
+ /*
+ * Update copy state counter for number of erroneous rows
+ * as we are going to return from function.
+ */
cstate->num_errors++;
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index fad2277991d..c2902ffc339 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..051ca12d107 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..22605a0ae45 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error set_to_nul...
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,8 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,12 +130,16 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
^
COPY x from stdin with (reject_limit 1);
-ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE or SET_TO_NULL
COPY x from stdin with (on_error ignore, reject_limit 0);
ERROR: REJECT_LIMIT (0) must be greater than zero
-- too many columns in column list: should fail
@@ -769,6 +779,47 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-- check inserted content
+TABLE t_on_error_null;
+ a | b | c
+----+----+----
+ 11 | | 12
+ 1 | 11 |
+(2 rows)
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: failed to replace column containing erroneous data with null
+HINT: Consider specifying the REJECT LIMIT option to skip erroneous rows.
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: failed to replace column containing erroneous data with null
+HINT: Consider specifying the REJECT LIMIT option to skip erroneous rows.
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--ok. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+NOTICE: 2 rows were skipped due to data type incompatibility
+-- check inserted content
+TABLE t_on_error_null;
+ a | b | c
+----+----+----
+ 11 | | 12
+ 1 | 11 |
+ 1 | 11 | 14
+(3 rows)
+
+--fail. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+ERROR: encountered more than REJECT_LIMIT (2) rows with data type incompatibility
+CONTEXT: COPY t_on_error_null, line 3, column a: null input
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -776,6 +827,17 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null input
CONTEXT: COPY check_ign_err2
NOTICE: 1 row was skipped due to data type incompatibility
+-- check null substitution massages.
+COPY check_ign_err2 FROM STDIN WITH (on_error set_to_null, reject_limit 2, log_verbosity verbose);
+NOTICE: replaced row attribute "k" with NULL due to data type incompatibility at line 1.
+CONTEXT: COPY check_ign_err2, line 1, column k: "foo"
+NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null input
+CONTEXT: COPY check_ign_err2
+NOTICE: failed to replace row attribute "l" with NULL at line 3.
+CONTEXT: COPY check_ign_err2, line 3, column l: "'foooooooooooooooo'"
+NOTICE: skipping row due to data type incompatibility at line 3 for column "l": "'foooooooooooooooo'"
+CONTEXT: COPY check_ign_err2
+NOTICE: 2 rows were skipped due to data type incompatibility
COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
-- reset context choice
\set SHOW_CONTEXT errors
@@ -791,8 +853,9 @@ SELECT * FROM check_ign_err2;
n | m | k | l
---+-----+---+-------
1 | {1} | 1 | 'foo'
+ 1 | {1} | | 'foo'
3 | {3} | 3 | 'bar'
-(2 rows)
+(3 rows)
-- test datatype error that can't be handled as soft: should fail
CREATE TABLE hard_err(foo widget);
@@ -813,6 +876,28 @@ ERROR: skipped more than REJECT_LIMIT (3) rows due to data type incompatibility
CONTEXT: COPY check_ign_err, line 5, column n: ""
COPY check_ign_err FROM STDIN WITH (on_error ignore, reject_limit 4);
NOTICE: 4 rows were skipped due to data type incompatibility
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+\pset null NULL
+SELECT * FROM check_ign_err;
+ n | m | k
+------+-----+------
+ 1 | {1} | NULL
+ 2 | {2} | 1
+ 3 | {3} | 2
+ 4 | {4} | NULL
+ NULL | {5} | NULL
+(5 rows)
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1, column m: ""
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1,{1},1,1"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -828,6 +913,8 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..f2527d3aeaa 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,14 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +89,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +537,50 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+11 a 12
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+1 11 d
+\.
+
+-- check inserted content
+TABLE t_on_error_null;
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a 11 14
+\.
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+\.
+
+--ok. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+-1 11 13
+a 11 14
+1 11 14
+\.
+
+-- check inserted content
+TABLE t_on_error_null;
+
+--fail. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+-1 11 13
+a 11 14
+\N 11 14
+\.
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -541,6 +588,12 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
1 {1} 1 'foo'
2 {2} 2 \N
\.
+-- check null substitution massages.
+COPY check_ign_err2 FROM STDIN WITH (on_error set_to_null, reject_limit 2, log_verbosity verbose);
+1 {1} foo 'foo'
+2 {2} 2 \N
+2 {2} 2 'foooooooooooooooo'
+\.
COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
3 {3} 3 'bar'
4 {4} 4 \N
@@ -588,6 +641,29 @@ a {7} 7
10 {10} 10
\.
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+1 {1} a
+2 {2} 1
+3 {3} 2
+4 {4} b
+a {5} c
+\.
+
+\pset null NULL
+SELECT * FROM check_ign_err;
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,
+\.
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,{1},1,1
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -603,6 +679,8 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
On 2024-11-09 21:55, Kirill Reshke wrote:
Thanks for working on this!
On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:On 2024/10/26 6:03, Kirill Reshke wrote:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.+1
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf.+1
After thinking for a while, I'm now more opposed to this approach. I
think we should count rows with erroneous data as errors only if
null substitution for these rows failed, not the total number of rows
which were modified.
Then, to respect the REJECT LIMIT option, we compare this number with
the limit. This is actually simpler approach IMHO. What do You think?
IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT
is used to specify how many malformed rows are acceptable in the
"original" data source.
But while I was trying to implement that, I realized that I don't
understand v4 of this patch. My misunderstanding is about
`t_on_error_null` tests. We are allowed to insert a NULL value for the
first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
do we do that? My thought is we should try to execute
InputFunctionCallSafe with NULL value (i mean, here [1]) for the
column after we failed to insert the input value. And, if this second
call is successful, we do replacement, otherwise we count the row as
erroneous.Your concern is valid. Allowing NULL to be stored in a column with a
NOT NULL
constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you
suggested,
NULL values set by SET_TO_NULL should probably be re-evaluated.Thank you. I updated the patch with a NULL re-evaluation.
PFA v7. I did not yet update the doc for this patch version, waiting
for feedback about REJECT LIMIT + SET_TO_NULL behaviour.
There were warnings when I applied the patch:
$ git apply
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:170:
trailing whitespace.
/*
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:181:
trailing whitespace.
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:189:
trailing whitespace.
/* If datatype if okay with NULL,
replace
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:196:
trailing whitespace.
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:212:
trailing whitespace.
/*
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState
*pstate, bool is_from)
parser_errposition(pstate, def->location)));
...
- if (opts_out->reject_limit && !opts_out->on_error) + if (opts_out->reject_limit && !(opts_out->on_error == COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))
Hmm, is this change necessary?
Personally, I feel the previous code is easier to read.
"REJECT LIMIT" should be "REJECT_LIMIT"?
1037 errhint("Consider specifying the
REJECT LIMIT option to skip erroneous rows.")));
SET_TO_NULL is one of the value for ON_ERROR, but the patch treats
SET_TO_NULL as option for COPY:
221 --- a/src/bin/psql/tab-complete.in.c
222 +++ b/src/bin/psql/tab-complete.in.c
223 @@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
224 COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
225 "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
226 "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
"DEFAULT",
227 - "ON_ERROR", "LOG_VERBOSITY");
228 + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
Best regards,
Kirill Reshke
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
On Mon, 11 Nov 2024 at 16:11, torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-11-09 21:55, Kirill Reshke wrote:
Thanks for working on this!
Thanks for reviewing the v7 patch series!
On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:On 2024/10/26 6:03, Kirill Reshke wrote:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.+1
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf.+1
After thinking for a while, I'm now more opposed to this approach. I
think we should count rows with erroneous data as errors only if
null substitution for these rows failed, not the total number of rows
which were modified.
Then, to respect the REJECT LIMIT option, we compare this number with
the limit. This is actually simpler approach IMHO. What do You think?IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT
is used to specify how many malformed rows are acceptable in the
"original" data source.
I do like the first version of interpretation, but I have a struggle
with it. According to this interpretation, we will fail COPY command
if the number
of malformed data rows exceeds the limit, not the number of rejected
rows (some percentage of malformed rows are accepted with null
substitution)
So, a proper name for the limit will be MALFORMED_LIMIT, or something.
However, we are unable to change the name since the REJECT_LIMIT
option has already been committed.
I guess I'll just have to put up with this contradiction. I will send
an updated patch shortly...
But while I was trying to implement that, I realized that I don't
understand v4 of this patch. My misunderstanding is about
`t_on_error_null` tests. We are allowed to insert a NULL value for the
first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
do we do that? My thought is we should try to execute
InputFunctionCallSafe with NULL value (i mean, here [1]) for the
column after we failed to insert the input value. And, if this second
call is successful, we do replacement, otherwise we count the row as
erroneous.Your concern is valid. Allowing NULL to be stored in a column with a
NOT NULL
constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you
suggested,
NULL values set by SET_TO_NULL should probably be re-evaluated.Thank you. I updated the patch with a NULL re-evaluation.
PFA v7. I did not yet update the doc for this patch version, waiting
for feedback about REJECT LIMIT + SET_TO_NULL behaviour.There were warnings when I applied the patch:
$ git apply
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:170:
trailing whitespace.
/*
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:181:
trailing whitespace.v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:189:
trailing whitespace.
/* If datatype if okay with NULL,
replace
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:196:
trailing whitespace.v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:212:
trailing whitespace.
/*@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState
*pstate, bool is_from)
parser_errposition(pstate, def->location)));...
- if (opts_out->reject_limit && !opts_out->on_error) + if (opts_out->reject_limit && !(opts_out->on_error == COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))Hmm, is this change necessary?
Personally, I feel the previous code is easier to read."REJECT LIMIT" should be "REJECT_LIMIT"?
1037 errhint("Consider specifying the
REJECT LIMIT option to skip erroneous rows.")));SET_TO_NULL is one of the value for ON_ERROR, but the patch treats
SET_TO_NULL as option for COPY:221 --- a/src/bin/psql/tab-complete.in.c
222 +++ b/src/bin/psql/tab-complete.in.c
223 @@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
224 COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
225 "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
226 "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
"DEFAULT",
227 - "ON_ERROR", "LOG_VERBOSITY");
228 + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");Best regards,
Kirill Reshke--
Regards,--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
--
Best regards,
Kirill Reshke
On Tue, 12 Nov 2024 01:27:53 +0500
Kirill Reshke <reshkekirill@gmail.com> wrote:
On Mon, 11 Nov 2024 at 16:11, torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-11-09 21:55, Kirill Reshke wrote:
Thanks for working on this!
Thanks for reviewing the v7 patch series!
On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:On 2024/10/26 6:03, Kirill Reshke wrote:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.+1
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf.+1
After thinking for a while, I'm now more opposed to this approach. I
think we should count rows with erroneous data as errors only if
null substitution for these rows failed, not the total number of rows
which were modified.
Then, to respect the REJECT LIMIT option, we compare this number with
the limit. This is actually simpler approach IMHO. What do You think?IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT
is used to specify how many malformed rows are acceptable in the
"original" data source.
I also prefer the previous version.
I do like the first version of interpretation, but I have a struggle
with it. According to this interpretation, we will fail COPY command
if the number
of malformed data rows exceeds the limit, not the number of rejected
rows (some percentage of malformed rows are accepted with null
substitution)
So, a proper name for the limit will be MALFORMED_LIMIT, or something.
However, we are unable to change the name since the REJECT_LIMIT
option has already been committed.
I guess I'll just have to put up with this contradiction. I will send
an updated patch shortly...
I think we can rename the REJECT_LIMIT option because it is not yet released.
The documentation says that REJECT_LIMIT "Specifies the maximum number of errors",
and there are no wording "reject" in the description, so I wonder it is unclear
what means in "REJECT" in REJECT_LIMIT. It may be proper to use ERROR_LIMIT
since it is supposed to be used with ON_ERROR.
Alternatively, if we emphasize that errors are handled other than terminating
the command,perhaps MALFORMED_LIMIT as proposed above or TOLERANCE_LIMIT may be
good, for example.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Tue, 12 Nov 2024 14:03:50 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Tue, 12 Nov 2024 01:27:53 +0500
Kirill Reshke <reshkekirill@gmail.com> wrote:On Mon, 11 Nov 2024 at 16:11, torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-11-09 21:55, Kirill Reshke wrote:
Thanks for working on this!
Thanks for reviewing the v7 patch series!
On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:On 2024/10/26 6:03, Kirill Reshke wrote:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.+1
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf.+1
After thinking for a while, I'm now more opposed to this approach. I
think we should count rows with erroneous data as errors only if
null substitution for these rows failed, not the total number of rows
which were modified.
Then, to respect the REJECT LIMIT option, we compare this number with
the limit. This is actually simpler approach IMHO. What do You think?IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT
is used to specify how many malformed rows are acceptable in the
"original" data source.I also prefer the previous version.
I do like the first version of interpretation, but I have a struggle
with it. According to this interpretation, we will fail COPY command
if the number
of malformed data rows exceeds the limit, not the number of rejected
rows (some percentage of malformed rows are accepted with null
substitution)
So, a proper name for the limit will be MALFORMED_LIMIT, or something.
However, we are unable to change the name since the REJECT_LIMIT
option has already been committed.
I guess I'll just have to put up with this contradiction. I will send
an updated patch shortly...I think we can rename the REJECT_LIMIT option because it is not yet released.
The documentation says that REJECT_LIMIT "Specifies the maximum number of errors",
and there are no wording "reject" in the description, so I wonder it is unclear
what means in "REJECT" in REJECT_LIMIT. It may be proper to use ERROR_LIMIT
since it is supposed to be used with ON_ERROR.Alternatively, if we emphasize that errors are handled other than terminating
the command,perhaps MALFORMED_LIMIT as proposed above or TOLERANCE_LIMIT may be
good, for example.
I might misunderstand the meaning of the name. If REJECT_LIMIT means "a limit on
the number of rows with any malformed value allowed before the COPY command is
rejected", we would not have to rename it.
--
Yugo Nagata <nagata@sraoss.co.jp>
On 2024-11-12 14:17, Yugo Nagata wrote:
On Tue, 12 Nov 2024 14:03:50 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:On Tue, 12 Nov 2024 01:27:53 +0500
Kirill Reshke <reshkekirill@gmail.com> wrote:On Mon, 11 Nov 2024 at 16:11, torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-11-09 21:55, Kirill Reshke wrote:
Thanks for working on this!
Thanks for reviewing the v7 patch series!
On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:On 2024/10/26 6:03, Kirill Reshke wrote:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.+1
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf.+1
After thinking for a while, I'm now more opposed to this approach. I
think we should count rows with erroneous data as errors only if
null substitution for these rows failed, not the total number of rows
which were modified.
Then, to respect the REJECT LIMIT option, we compare this number with
the limit. This is actually simpler approach IMHO. What do You think?IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT
is used to specify how many malformed rows are acceptable in the
"original" data source.I also prefer the previous version.
I do like the first version of interpretation, but I have a struggle
with it. According to this interpretation, we will fail COPY command
if the number
of malformed data rows exceeds the limit, not the number of rejected
rows (some percentage of malformed rows are accepted with null
substitution)
I feel your concern is valid.
Currently 'reject' can occur only when converting a column's input value
to its data type, but if we introduce set_to_null option 'reject' also
occurs when inserting null, i.e. not null constraint.
So, a proper name for the limit will be MALFORMED_LIMIT, or something.
However, we are unable to change the name since the REJECT_LIMIT
option has already been committed.
I guess I'll just have to put up with this contradiction. I will send
an updated patch shortly...I think we can rename the REJECT_LIMIT option because it is not yet
released.
+1
The documentation says that REJECT_LIMIT "Specifies the maximum number
of errors",
and there are no wording "reject" in the description, so I wonder it
is unclear
what means in "REJECT" in REJECT_LIMIT. It may be proper to use
ERROR_LIMIT
since it is supposed to be used with ON_ERROR.Alternatively, if we emphasize that errors are handled other than
terminating
the command,perhaps MALFORMED_LIMIT as proposed above or
TOLERANCE_LIMIT may be
good, for example.I might misunderstand the meaning of the name. If REJECT_LIMIT means "a
limit on
the number of rows with any malformed value allowed before the COPY
command is
rejected", we would not have to rename it.
The meaning of REJECT_LIMIT is what you described, and I think Kirill
worries about cases when malformed rows are accepted(=not REJECTed) with
null substitution. REJECT_LIMIT counts this case as REJECTed.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
On Tue, 12 Nov 2024 17:38:25 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-11-12 14:17, Yugo Nagata wrote:
On Tue, 12 Nov 2024 14:03:50 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:On Tue, 12 Nov 2024 01:27:53 +0500
Kirill Reshke <reshkekirill@gmail.com> wrote:On Mon, 11 Nov 2024 at 16:11, torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-11-09 21:55, Kirill Reshke wrote:
Thanks for working on this!
Thanks for reviewing the v7 patch series!
On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:On 2024/10/26 6:03, Kirill Reshke wrote:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.+1
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf.+1
After thinking for a while, I'm now more opposed to this approach. I
think we should count rows with erroneous data as errors only if
null substitution for these rows failed, not the total number of rows
which were modified.
Then, to respect the REJECT LIMIT option, we compare this number with
the limit. This is actually simpler approach IMHO. What do You think?IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT
is used to specify how many malformed rows are acceptable in the
"original" data source.I also prefer the previous version.
I do like the first version of interpretation, but I have a struggle
with it. According to this interpretation, we will fail COPY command
if the number
of malformed data rows exceeds the limit, not the number of rejected
rows (some percentage of malformed rows are accepted with nulnot-null constraintl
substitution)I feel your concern is valid.
Currently 'reject' can occur only when converting a column's input value
to its data type, but if we introduce set_to_null option 'reject' also
occurs when inserting null, i.e. not null constraint.
I can suppose "reject" means failure of COPY command here, that is, a reject
of executing the command, not an error of data input. If so, we can interpret
REJECT_LIMIT as "the number of malformed rows allowed before the COPY command
is REJECTed" (not the number of rejected rows). In this case, I think we don't
have to rename the option name.
Of course, if there is more proper name that makes it easy for users to
understand the behaviour of the option, renaming should be nice.
The documentation says that REJECT_LIMIT "Specifies the maximum number
of errors",
and there are no wording "reject" in the description, so I wonder it
is unclear
what means in "REJECT" in REJECT_LIMIT. It may be proper to use
ERROR_LIMIT
since it is supposed to be used with ON_ERROR.Alternatively, if we emphasize that errors are handled other than
terminating
the command,perhaps MALFORMED_LIMIT as proposed above or
TOLERANCE_LIMIT may be
good, for example.I might misunderstand the meaning of the name. If REJECT_LIMIT means "a
limit on
the number of rows with any malformed value allowed before the COPY
command is
rejected", we would not have to rename it.The meaning of REJECT_LIMIT is what you described, and I think Kirill
worries about cases when malformed rows are accepted(=not REJECTed) with
null substitution. REJECT_LIMIT counts this case as REJECTed.
I am a bit confused. You mean "REJECT" is raising a soft error of data
input here instead of terminating COPY?
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On 2024-11-13 22:02, Yugo NAGATA wrote:
On Tue, 12 Nov 2024 17:38:25 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:On 2024-11-12 14:17, Yugo Nagata wrote:
On Tue, 12 Nov 2024 14:03:50 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:On Tue, 12 Nov 2024 01:27:53 +0500
Kirill Reshke <reshkekirill@gmail.com> wrote:On Mon, 11 Nov 2024 at 16:11, torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-11-09 21:55, Kirill Reshke wrote:
Thanks for working on this!
Thanks for reviewing the v7 patch series!
On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:On 2024/10/26 6:03, Kirill Reshke wrote:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.+1
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf.+1
After thinking for a while, I'm now more opposed to this approach. I
think we should count rows with erroneous data as errors only if
null substitution for these rows failed, not the total number of rows
which were modified.
Then, to respect the REJECT LIMIT option, we compare this number with
the limit. This is actually simpler approach IMHO. What do You think?IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT
is used to specify how many malformed rows are acceptable in the
"original" data source.I also prefer the previous version.
I do like the first version of interpretation, but I have a struggle
with it. According to this interpretation, we will fail COPY command
if the number
of malformed data rows exceeds the limit, not the number of rejected
rows (some percentage of malformed rows are accepted with nulnot-null constraintl
substitution)I feel your concern is valid.
Currently 'reject' can occur only when converting a column's input
value
to its data type, but if we introduce set_to_null option 'reject' also
occurs when inserting null, i.e. not null constraint.I can suppose "reject" means failure of COPY command here, that is, a
reject
of executing the command, not an error of data input. If so, we can
interpret
REJECT_LIMIT as "the number of malformed rows allowed before the COPY
command
is REJECTed" (not the number of rejected rows). In this case, I think
we don't
have to rename the option name.Of course, if there is more proper name that makes it easy for users to
understand the behaviour of the option, renaming should be nice.The documentation says that REJECT_LIMIT "Specifies the maximum number
of errors",
and there are no wording "reject" in the description, so I wonder it
is unclear
what means in "REJECT" in REJECT_LIMIT. It may be proper to use
ERROR_LIMIT
since it is supposed to be used with ON_ERROR.Alternatively, if we emphasize that errors are handled other than
terminating
the command,perhaps MALFORMED_LIMIT as proposed above or
TOLERANCE_LIMIT may be
good, for example.I might misunderstand the meaning of the name. If REJECT_LIMIT means "a
limit on
the number of rows with any malformed value allowed before the COPY
command is
rejected", we would not have to rename it.The meaning of REJECT_LIMIT is what you described, and I think Kirill
worries about cases when malformed rows are accepted(=not REJECTed)
with
null substitution. REJECT_LIMIT counts this case as REJECTed.I am a bit confused.
Me too:)
Let me explain my understanding.
I believe there are now two candidates that count as REJECT_LIMIT
number:
(1) error converting a column's input value into its data type(soft
error)
(2) NULL substitution failure(this comes from the proposed patch)
And I understood Kirill's idea to be the following:
1st idea: REJECT_LIMIT counts (1)
2nd idea: REJECT_LIMIT counts (2)
And I've agreed with the 1st one.
You mean "REJECT" is raising a soft error of data
input here instead of terminating COPY?
Yes.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
On Sat, Nov 9, 2024 at 8:55 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
But while I was trying to implement that, I realized that I don't
understand v4 of this patch. My misunderstanding is about
`t_on_error_null` tests. We are allowed to insert a NULL value for the
first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
do we do that? My thought is we should try to execute
InputFunctionCallSafe with NULL value (i mean, here [1]) for the
column after we failed to insert the input value. And, if this second
call is successful, we do replacement, otherwise we count the row as
erroneous.Your concern is valid. Allowing NULL to be stored in a column with a NOT NULL
constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you suggested,
NULL values set by SET_TO_NULL should probably be re-evaluated.Thank you. I updated the patch with a NULL re-evaluation.
take me sometime to understand your change with InputFunctionCallSafe.
it actually works fine with domain,
i think mainly because domain_in proisstrict is false and all other
type input function proisstrict is true!
--case1
create table t1(a dnn);
copy t1 from stdin(on_error set_to_null);
A
\.
--case2
create table t2(a int not null);
copy t2 from stdin(on_error set_to_null);
A
\.
I think it should be to either let domains with not-null behave the
same as column level not-null
or just insert NULL to a column with domain not-null constraint.
in doc[1], we already mentioned that a column with a not-null domain
is possible to have null value .
https://www.postgresql.org/docs/current/sql-createdomain.html
attached v8, based on your v7, main change it NextCopyFrom,
InputFunctionCallSafe.
The idea is when InputFunctionCallSafe fails, on_error set_to_null
needs to check if this is a type as not-null domain.
pass NULL string to InputFunctionCallSafe again to check if this type
allows null or not.
If not allow null then error out (ereport(ERROR)).
i think this will align with column level not-null constraint, what do
you guys think?
i am mainly change copyfromparse.c's for now.
other places no change, same as v7.
Attachments:
v8-0001-COPY-option-on_error-set_to_null.patchtext/x-patch; charset=US-ASCII; name=v8-0001-COPY-option-on_error-set_to_null.patchDownload
From d0553ec7446f0f352a281c591db228029c0946c4 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 16 Nov 2024 15:57:59 +0800
Subject: [PATCH v8 1/1] COPY option on_error set_to_null
extent "on_error action", introduce new option: on_error set_to_null.
Due to current grammar, we cannot use "on_error null", so I choose on_error
set_to_null.
any data type conversion errors while the COPY FROM process will set that column
value to be NULL. this will only work with COPY FROM and non-binary format.
However this will respect the not-null constraint, meaning, if you actually
converted error to null, but the column has not-null constraint, not-null
constraint violation ERROR will be raised.
this also respect not-null constraint on domain, meaning on_error set_to_null
may raise ERROR for failed domain_in function call
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 7 +--
src/backend/commands/copy.c | 12 +++--
src/backend/commands/copyfrom.c | 39 ++++++++++----
src/backend/commands/copyfromparse.c | 40 ++++++++++++++
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 80 +++++++++++++++++++++++++++-
src/test/regress/sql/copy2.sql | 79 +++++++++++++++++++++++++++
8 files changed, 239 insertions(+), 21 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..dcbfa17a3c 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,12 +394,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing erroneous input values with <literal>null</literal> and move to the next row.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal> options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..304022cd86 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -904,13 +906,13 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && !(opts_out->on_error == COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE or SET_TO_NULL */
+ errmsg("COPY %s requires %s to be set to %s or %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE", "SET_TO_NULL")));
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..f00f383baa 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1003,7 +1003,7 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ if ((cstate->opts.on_error == COPY_ON_ERROR_IGNORE || cstate->opts.on_error == COPY_ON_ERROR_NULL) &&
cstate->escontext->error_occurred)
{
/*
@@ -1018,12 +1018,28 @@ CopyFrom(CopyFromState cstate)
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
cstate->num_errors);
- if (cstate->opts.reject_limit > 0 && \
- cstate->num_errors > cstate->opts.reject_limit)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
- (long long) cstate->opts.reject_limit)));
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ {
+ if (cstate->opts.reject_limit > 0 && cstate->num_errors > cstate->opts.reject_limit)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
+ (long long) cstate->opts.reject_limit)));
+ }
+ else
+ {
+ /* Provide different error msg if reject_limit is zero */
+ if (cstate->opts.reject_limit == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("failed to replace column containing erroneous data with null"),
+ errhint("Consider specifying the REJECT LIMIT option to skip erroneous rows.")));
+ else if (cstate->num_errors > cstate->opts.reject_limit)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("encountered more than REJECT_LIMIT (%lld) rows with data type incompatibility",
+ (long long) cstate->opts.reject_limit)));
+ }
/* Repeat NextCopyFrom() until no soft error occurs */
continue;
@@ -1321,7 +1337,7 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if ((cstate->opts.on_error == COPY_ON_ERROR_IGNORE || cstate->opts.on_error == COPY_ON_ERROR_NULL) &&
cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
@@ -1474,10 +1490,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d8..142fdf3fbb 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -960,6 +960,46 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ /*
+ * We encountered an error while parsing one of attributes.
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ /*
+ * Temporary unset error_occurred. for basetype, null value
+ * is allowed, for domain type, we may have not-null
+ * constraint. we pass NULL to InputFunctionCallSafe to check
+ * if this type have not-null constraint or not. If it's
+ * domain with not-null constraint, then we have to error
+ * out, that would behave consistent with column level
+ * not-null constraint
+ */
+ cstate->escontext->error_occurred = false;
+
+ if (!InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ ereport(ERROR,
+ (errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values",
+ format_type_be(typioparams[m])),
+ errdatatype(typioparams[m])));
+
+ /* If datatype if okay with NULL, replace
+ * with null
+ */
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ continue;
+ }
+
+ /*
+ * Update copy state counter for number of erroneous rows
+ * as we are going to return from function.
+ */
cstate->num_errors++;
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index fad2277991..c2902ffc33 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..051ca12d10 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..62cbe0c2b3 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error set_to_nul...
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,8 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,12 +130,16 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
^
COPY x from stdin with (reject_limit 1);
-ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE or SET_TO_NULL
COPY x from stdin with (on_error ignore, reject_limit 0);
ERROR: REJECT_LIMIT (0) must be greater than zero
-- too many columns in column list: should fail
@@ -769,6 +779,46 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-- check inserted content
+TABLE t_on_error_null;
+ a | b | c
+----+----+----
+ 11 | | 12
+ 1 | 11 |
+(2 rows)
+
+--fail. we do check domain not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail. first check constraint fails, then we convert column a value to null
+-- by on_error set_to_nul but column a domain type not allow null value.
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--ok. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+-- check inserted content
+TABLE t_on_error_null;
+ a | b | c
+----+----+----
+ 11 | | 12
+ 1 | 11 |
+(2 rows)
+
+--fail. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -776,6 +826,10 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null input
CONTEXT: COPY check_ign_err2
NOTICE: 1 row was skipped due to data type incompatibility
+-- check null substitution massages.
+COPY check_ign_err2 FROM STDIN WITH (on_error set_to_null, reject_limit 2, log_verbosity verbose);
+ERROR: domain dcheck_ign_err2 does not allow null values
+CONTEXT: COPY check_ign_err2, line 2, column l: null input
COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
-- reset context choice
\set SHOW_CONTEXT errors
@@ -813,6 +867,28 @@ ERROR: skipped more than REJECT_LIMIT (3) rows due to data type incompatibility
CONTEXT: COPY check_ign_err, line 5, column n: ""
COPY check_ign_err FROM STDIN WITH (on_error ignore, reject_limit 4);
NOTICE: 4 rows were skipped due to data type incompatibility
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+\pset null NULL
+SELECT * FROM check_ign_err;
+ n | m | k
+------+-----+------
+ 1 | {1} | NULL
+ 2 | {2} | 1
+ 3 | {3} | 2
+ 4 | {4} | NULL
+ NULL | {5} | NULL
+(5 rows)
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1, column m: ""
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1,{1},1,1"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -828,6 +904,8 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..c9884cc169 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,14 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +89,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +537,51 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+11 a 12
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+1 11 d
+\.
+
+-- check inserted content
+TABLE t_on_error_null;
+
+--fail. we do check domain not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a 11 14
+\.
+
+--fail. first check constraint fails, then we convert column a value to null
+-- by on_error set_to_nul but column a domain type not allow null value.
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+\.
+
+--ok. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+-1 11 13
+a 11 14
+1 11 14
+\.
+
+-- check inserted content
+TABLE t_on_error_null;
+
+--fail. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+-1 11 13
+a 11 14
+\N 11 14
+\.
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -541,6 +589,12 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
1 {1} 1 'foo'
2 {2} 2 \N
\.
+-- check null substitution massages.
+COPY check_ign_err2 FROM STDIN WITH (on_error set_to_null, reject_limit 2, log_verbosity verbose);
+1 {1} foo 'foo'
+2 {2} 2 \N
+2 {2} 2 'foooooooooooooooo'
+\.
COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
3 {3} 3 'bar'
4 {4} 4 \N
@@ -588,6 +642,29 @@ a {7} 7
10 {10} 10
\.
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+1 {1} a
+2 {2} 1
+3 {3} 2
+4 {4} b
+a {5} c
+\.
+
+\pset null NULL
+SELECT * FROM check_ign_err;
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,
+\.
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,{1},1,1
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -603,6 +680,8 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
On Sat, 16 Nov 2024 at 13:27, jian he <jian.universality@gmail.com> wrote:
On Sat, Nov 9, 2024 at 8:55 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
But while I was trying to implement that, I realized that I don't
understand v4 of this patch. My misunderstanding is about
`t_on_error_null` tests. We are allowed to insert a NULL value for the
first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
do we do that? My thought is we should try to execute
InputFunctionCallSafe with NULL value (i mean, here [1]) for the
column after we failed to insert the input value. And, if this second
call is successful, we do replacement, otherwise we count the row as
erroneous.Your concern is valid. Allowing NULL to be stored in a column with a NOT NULL
constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you suggested,
NULL values set by SET_TO_NULL should probably be re-evaluated.Thank you. I updated the patch with a NULL re-evaluation.
take me sometime to understand your change with InputFunctionCallSafe.
it actually works fine with domain,
i think mainly because domain_in proisstrict is false and all other
type input function proisstrict is true!--case1
create table t1(a dnn);
copy t1 from stdin(on_error set_to_null);
A
\.--case2
create table t2(a int not null);
copy t2 from stdin(on_error set_to_null);
A
\.I think it should be to either let domains with not-null behave the
same as column level not-null
or just insert NULL to a column with domain not-null constraint.in doc[1], we already mentioned that a column with a not-null domain
is possible to have null value .
https://www.postgresql.org/docs/current/sql-createdomain.htmlattached v8, based on your v7, main change it NextCopyFrom,
InputFunctionCallSafe.
The idea is when InputFunctionCallSafe fails, on_error set_to_null
needs to check if this is a type as not-null domain.
pass NULL string to InputFunctionCallSafe again to check if this type
allows null or not.
If not allow null then error out (ereport(ERROR)).
i think this will align with column level not-null constraint, what do
you guys think?i am mainly change copyfromparse.c's for now.
other places no change, same as v7.
Hello. I received your email just as I was ready to send my version
eight of this thread.
Your patch does not apply due to 9a70f67.
```
reshke@ygp-jammy:~/pg$ git apply v8-0001-COPY-option-on_error-set_to_null.patch
error: patch failed: src/backend/commands/copyfrom.c:1018
error: src/backend/commands/copyfrom.c: patch does not apply
```
1) Your v8 does not fix tab-complete issue mentioned by Atsushi
Torikoshi in [1]/messages/by-id/501dd655ddb04693c15baeb6485bc601@oss.nttdata.com.
2) Your version does not address discussion about SET_TO_NULL vs
REJECT_LIMIT (see [2]/messages/by-id/07587c36-18b3-4ccb-b5fb-579bcb04ed37@oss.nttdata.com & [3]/messages/by-id/1462d79784b2475f1c714c65a6f25652@oss.nttdata.com -- Best regards, Kirill Reshke). I am leaning towards option 1 from [3]/messages/by-id/1462d79784b2475f1c714c65a6f25652@oss.nttdata.com -- Best regards, Kirill Reshke,
and my v8 implements that.
```
reshke=# create domain dd as int not null;
CREATE DOMAIN
reshke=# create table tt(i dd);
CREATE TABLE
reshke=# copy tt from stdin with (on_error set_to_null, log_verbosity
verbose, reject_limit 2);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
s
1
\.
ERROR: domain dd does not allow null values
CONTEXT: COPY tt, line 1, column i: "s"
reshke=#
```
I expect no error here, as reject_limit is specified.
Regression test that checks for this in v7 were changed, in my
opinion, incorrectly.
I am attaching my v8 for reference.
[1]: /messages/by-id/501dd655ddb04693c15baeb6485bc601@oss.nttdata.com
[2]: /messages/by-id/07587c36-18b3-4ccb-b5fb-579bcb04ed37@oss.nttdata.com
[3]: /messages/by-id/1462d79784b2475f1c714c65a6f25652@oss.nttdata.com -- Best regards, Kirill Reshke
--
Best regards,
Kirill Reshke
Attachments:
v8-0001-Introduce-COPY-option-to-replace-columns-containi.patchapplication/octet-stream; name=v8-0001-Introduce-COPY-option-to-replace-columns-containi.patchDownload
From df39b38589fc8384e0590d76f8b0e2e4295f6f50 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 12 Sep 2024 17:07:02 +0800
Subject: [PATCH v8] Introduce COPY option to replace columns containing
erroneous data with null
extent "on_error action", introduce new option: on_error set_to_null.
Due to current grammar, we cannot use "on_error null",
so I choose on_error set_to_null.
any data type conversion errors while the COPY FROM process will set that column value to be NULL.
this will only work with COPY FROM and non-binary format.
However this will respect the not-null constraint, meaning, if you actually converted error to null,
but the column has not-null constraint, not-null constraint violation ERROR will be raised.
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 9 +--
src/backend/commands/copy.c | 12 ++--
src/backend/commands/copyfrom.c | 53 +++++++++++---
src/backend/commands/copyfromparse.c | 69 +++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 100 ++++++++++++++++++++++++++-
src/test/regress/sql/copy2.sql | 82 ++++++++++++++++++++++
8 files changed, 303 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f096..bce10ea62e5 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,12 +394,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing erroneous input values with <literal>null</literal> and move to the next row.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal> options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
@@ -422,7 +423,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies the maximum number of errors tolerated while converting a
column's input value to its data type, when <literal>ON_ERROR</literal> is
set to <literal>ignore</literal>.
- If the input causes more errors than the specified value, the <command>COPY</command>
+ If the input contains more erroneous rows than the specified value, the <command>COPY</command>
command fails, even with <literal>ON_ERROR</literal> set to <literal>ignore</literal>.
This clause must be used with <literal>ON_ERROR</literal>=<literal>ignore</literal>
and <replaceable class="parameter">maxerror</replaceable> must be positive <type>bigint</type>.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663f..304022cd867 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -904,13 +906,13 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && !(opts_out->on_error == COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE or SET_TO_NULL */
+ errmsg("COPY %s requires %s to be set to %s or %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE", "SET_TO_NULL")));
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 754cb496169..557b00266ff 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1003,7 +1003,7 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ if ((cstate->opts.on_error == COPY_ON_ERROR_IGNORE || cstate->opts.on_error == COPY_ON_ERROR_NULL) &&
cstate->escontext->error_occurred)
{
/*
@@ -1018,12 +1018,30 @@ CopyFrom(CopyFromState cstate)
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
cstate->num_errors);
- if (cstate->opts.reject_limit > 0 &&
- cstate->num_errors > cstate->opts.reject_limit)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
- (long long) cstate->opts.reject_limit)));
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ {
+ if (cstate->opts.reject_limit > 0 &&
+ cstate->num_errors > cstate->opts.reject_limit)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
+ (long long) cstate->opts.reject_limit)));
+ }
+ else
+ {
+ /* Provide different error msg if reject_limit is zero */
+ if (cstate->opts.reject_limit == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("failed to replace column containing erroneous data with null",
+ (long long) cstate->opts.reject_limit),
+ errhint("Consider specifying the REJECT_LIMIT option to skip erroneous rows.")));
+ else if (cstate->num_errors > cstate->opts.reject_limit)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("encountered more than REJECT_LIMIT (%lld) rows with data type incompatibility",
+ (long long) cstate->opts.reject_limit)));
+ }
/* Repeat NextCopyFrom() until no soft error occurs */
continue;
@@ -1321,7 +1339,7 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
@@ -1330,6 +1348,18 @@ CopyFrom(CopyFromState cstate)
(unsigned long long) cstate->num_errors,
(unsigned long long) cstate->num_errors));
+ /* In case on_error SET_TO_NULL, if COPY succeed, it means that
+ * all erroneous rows attributes filled with NULL
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL &&
+ cstate->num_errors > 0 &&
+ cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+ ereport(NOTICE,
+ errmsg_plural("Erroneous values in %llu row were replaced with NULL",
+ "Erroneous values in %llu rows were replaced with NULL",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1474,10 +1504,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d83..943926ad1d9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -871,6 +871,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
/* read raw fields in the next line */
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
@@ -949,7 +950,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
/*
* If ON_ERROR is specified with IGNORE, skip rows with soft
- * errors
+ * errors. If ON_ERROR is specified with SET_TO_NULL, try
+ * to replace attribute value with NULL.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -960,9 +962,63 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
- cstate->num_errors++;
+ /*
+ * Regardless of NULL substrition success, we count
+ * current row as erroneous
+ */
+ current_row_erroneous = true;
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ /*
+ * We encountered an error while parsing one of attributes.
+ *
+ */
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL && string != NULL)
+ {
+ /*
+ * Temporary unset error_occurred, for next InputFunctionCallSafe
+ * sanity. If null substitution for this attribute will
+ * succeed, we will accept this row.
+ */
+ cstate->escontext->error_occurred = false;
+
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ /* If datatype if okay with NULL, replace
+ * with null
+ */
+ nulls[m] = true;
+
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ ereport(NOTICE,
+ errmsg("replaced row attribute \"%s\" with NULL due to data type incompatibility at line %llu.",
+ cstate->cur_attname, (unsigned long long) cstate->cur_lineno));
+ continue;
+ }
+
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ ereport(NOTICE,
+ errmsg("failed to replace row attribute \"%s\" with NULL at line %llu.",
+ cstate->cur_attname, (unsigned long long)cstate->cur_lineno));
+ }
+
+ /*
+ * Here we end processing of current COPY row.
+ * Update copy state counter for number of erroneous rows.
+ */
+ cstate->num_errors++;
+ cstate->escontext->error_occurred = true;
+
+ /* Only print this NOTICE message, if it will not be followed by ERROR */
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ (
+ (cstate->opts.on_error == COPY_ON_ERROR_NULL && cstate->opts.reject_limit > 0 && cstate->num_errors <= cstate->opts.reject_limit) ||
+ (cstate->opts.on_error == COPY_ON_ERROR_IGNORE && (cstate->opts.reject_limit == 0 || cstate->num_errors <= cstate->opts.reject_limit))
+ ))
{
/*
* Since we emit line number and column info in the below
@@ -1001,6 +1057,13 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attval = NULL;
}
+ /*
+ * Update copy state counter for number of erroneous rows.
+ * But do not set error_occurred, since row was actually accepted
+ */
+ if (current_row_erroneous)
+ cstate->num_errors++;
+
Assert(fieldno == attr_count);
}
else
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index fad2277991d..c2902ffc339 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..051ca12d107 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..d9d64082478 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error set_to_nul...
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,8 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,12 +130,16 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
^
COPY x from stdin with (reject_limit 1);
-ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE or SET_TO_NULL
COPY x from stdin with (on_error ignore, reject_limit 0);
ERROR: REJECT_LIMIT (0) must be greater than zero
-- too many columns in column list: should fail
@@ -769,6 +779,50 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+NOTICE: Erroneous values in 1 row were replaced with NULL
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+NOTICE: Erroneous values in 1 row were replaced with NULL
+-- check inserted content
+TABLE t_on_error_null;
+ a | b | c
+----+------+------
+ 11 | NULL | 12
+ 1 | 11 | NULL
+(2 rows)
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: failed to replace column containing erroneous data with null
+HINT: Consider specifying the REJECT LIMIT option to skip erroneous rows.
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: failed to replace column containing erroneous data with null
+HINT: Consider specifying the REJECT LIMIT option to skip erroneous rows.
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--ok. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+NOTICE: Erroneous values in 2 rows were replaced with NULL
+-- check inserted content
+TABLE t_on_error_null;
+ a | b | c
+----+------+------
+ 11 | NULL | 12
+ 1 | 11 | NULL
+ 1 | 11 | 14
+(3 rows)
+
+--fail. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+ERROR: encountered more than REJECT_LIMIT (2) rows with data type incompatibility
+CONTEXT: COPY t_on_error_null, line 3, column a: null input
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -776,6 +830,26 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null input
CONTEXT: COPY check_ign_err2
NOTICE: 1 row was skipped due to data type incompatibility
+-- check null substitution massages.
+COPY check_ign_err2 FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE: failed to replace row attribute "l" with NULL at line 1.
+CONTEXT: COPY check_ign_err2, line 1, column l: "'foooooooooooooooo'"
+ERROR: failed to replace column containing erroneous data with null
+HINT: Consider specifying the REJECT LIMIT option to skip erroneous rows.
+CONTEXT: COPY check_ign_err2, line 1, column l: "'foooooooooooooooo'"
+COPY check_ign_err2 FROM STDIN WITH (on_error set_to_null, reject_limit 2, log_verbosity verbose);
+NOTICE: failed to replace row attribute "l" with NULL at line 1.
+CONTEXT: COPY check_ign_err2, line 1, column l: "'foooooooooooooooo'"
+NOTICE: skipping row due to data type incompatibility at line 1 for column "l": "'foooooooooooooooo'"
+CONTEXT: COPY check_ign_err2
+NOTICE: failed to replace row attribute "l" with NULL at line 2.
+CONTEXT: COPY check_ign_err2, line 2, column l: "'foooooooooooooooo'"
+NOTICE: skipping row due to data type incompatibility at line 2 for column "l": "'foooooooooooooooo'"
+CONTEXT: COPY check_ign_err2
+NOTICE: failed to replace row attribute "l" with NULL at line 3.
+CONTEXT: COPY check_ign_err2, line 3, column l: "'foooooooooooooooo'"
+ERROR: encountered more than REJECT_LIMIT (2) rows with data type incompatibility
+CONTEXT: COPY check_ign_err2, line 3, column l: "'foooooooooooooooo'"
COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
-- reset context choice
\set SHOW_CONTEXT errors
@@ -813,6 +887,28 @@ ERROR: skipped more than REJECT_LIMIT (3) rows due to data type incompatibility
CONTEXT: COPY check_ign_err, line 5, column n: ""
COPY check_ign_err FROM STDIN WITH (on_error ignore, reject_limit 4);
NOTICE: 4 rows were skipped due to data type incompatibility
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+NOTICE: Erroneous values in 3 rows were replaced with NULL
+SELECT * FROM check_ign_err;
+ n | m | k
+------+-----+------
+ 1 | {1} | NULL
+ 2 | {2} | 1
+ 3 | {3} | 2
+ 4 | {4} | NULL
+ NULL | {5} | NULL
+(5 rows)
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1, column m: ""
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1,{1},1,1"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -828,6 +924,8 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..d1dd61b4bf2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,14 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +89,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +537,52 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+11 a 12
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+1 11 d
+\.
+
+-- check inserted content
+TABLE t_on_error_null;
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a 11 14
+\.
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+\.
+
+--ok. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+-1 11 13
+a 11 14
+1 11 14
+\.
+
+-- check inserted content
+TABLE t_on_error_null;
+
+--fail. Check interaction with REJECT LIMIT
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, reject_limit 2);
+-1 11 13
+a 11 14
+\N 11 14
+\.
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -541,6 +590,15 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
1 {1} 1 'foo'
2 {2} 2 \N
\.
+-- check null substitution massages.
+COPY check_ign_err2 FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+2 {2} 2 'foooooooooooooooo'
+\.
+COPY check_ign_err2 FROM STDIN WITH (on_error set_to_null, reject_limit 2, log_verbosity verbose);
+2 {2} 2 'foooooooooooooooo'
+2 {2} 2 'foooooooooooooooo'
+2 {2} 2 'foooooooooooooooo'
+\.
COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
3 {3} 3 'bar'
4 {4} 4 \N
@@ -588,6 +646,28 @@ a {7} 7
10 {10} 10
\.
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+1 {1} a
+2 {2} 1
+3 {3} 2
+4 {4} b
+a {5} c
+\.
+
+SELECT * FROM check_ign_err;
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,
+\.
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,{1},1,1
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -603,6 +683,8 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
On Sat, Nov 16, 2024 at 5:55 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
I am attaching my v8 for reference.
in your v8.
<varlistentry>
<term><literal>REJECT_LIMIT</literal></term>
<listitem>
<para>
Specifies the maximum number of errors tolerated while converting a
column's input value to its data type, when <literal>ON_ERROR</literal> is
set to <literal>ignore</literal>.
If the input contains more erroneous rows than the specified
value, the <command>COPY</command>
command fails, even with <literal>ON_ERROR</literal> set to
<literal>ignore</literal>.
</para>
</listitem>
</varlistentry>
then above description not meet with following example, (i think)
create table t(a int not null);
COPY t FROM STDIN WITH (on_error set_to_null, reject_limit 2);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
a
\.
ERROR: null value in column "a" of relation "t" violates not-null constraint
DETAIL: Failing row contains (null).
CONTEXT: COPY t, line 1, column a: "a"
Overall, I think
making the domain not-null align with column level not-null would be a
good thing.
<para>
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command,
<literal>ignore</literal> means discard the input row and
continue with the next one, and
<literal>set_to_null</literal> means replace columns containing
erroneous input values with <literal>null</literal> and move to the
next row.
"and move to the next row" is wrong?
I think it should be " and move to the next field".
On Tue, 19 Nov 2024, 13:52 jian he, <jian.universality@gmail.com> wrote:
On Sat, Nov 16, 2024 at 5:55 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
I am attaching my v8 for reference.
in your v8.
<varlistentry>
<term><literal>REJECT_LIMIT</literal></term>
<listitem>
<para>
Specifies the maximum number of errors tolerated while converting a
column's input value to its data type, when <literal>ON_ERROR</literal> is
set to <literal>ignore</literal>.
If the input contains more erroneous rows than the specified
value, the <command>COPY</command>
command fails, even with <literal>ON_ERROR</literal> set to
<literal>ignore</literal>.
</para>
</listitem>
</varlistentry>then above description not meet with following example, (i think)
create table t(a int not null);
COPY t FROM STDIN WITH (on_error set_to_null, reject_limit 2);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.a
\.ERROR: null value in column "a" of relation "t" violates not-null constraint
DETAIL: Failing row contains (null).
CONTEXT: COPY t, line 1, column a: "a"
Sure, my v8 does not helps with column level NOT NULL constraint (or
other constraint)
Overall, I think
making the domain not-null align with column level not-null would be a
good thing.
While this looks sane, it's actually a separate topic. Even on current
HEAD we have domain not-null vs column level not-null unalignment.
consider this example
```
reshke=# create table ftt2 (i int not null);
CREATE TABLE
reshke=# copy ftt2 from stdin with (reject_limit 1000, on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
\N
\.
ERROR: null value in column "i" of relation "ftt2" violates not-null constraint
DETAIL: Failing row contains (null).
CONTEXT: COPY ftt2, line 1: "\N"
reshke=# create domain dd as int not null ;
CREATE DOMAIN
reshke=# create table ftt3(i dd);
CREATE TABLE
reshke=# copy ftt3 from stdin with (reject_limit 1000, on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
\N
\.
NOTICE: 1 row was skipped due to data type incompatibility
COPY 0
reshke=#
```
So, if we want this, we need to start another thread and deal with
REJECT_LIMIT + on_error ignore first.
The ExecConstraints function is the source of the error in scenario 1.
Therefore, we require something like "ExecConstraintsSafe" to
accommodate the aforementioned. That is a significant change. Not sure
it will be accepted by the community.
<para>
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command,
<literal>ignore</literal> means discard the input row and
continue with the next one, and
<literal>set_to_null</literal> means replace columns containing
erroneous input values with <literal>null</literal> and move to the
next row."and move to the next row" is wrong?
I think it should be " and move to the next field".
Yes, "and move to the next field" is correct.
On Tue, 19 Nov 2024 at 13:52, jian he <jian.universality@gmail.com> wrote:
On Sat, Nov 16, 2024 at 5:55 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
I am attaching my v8 for reference.
in your v8.
<varlistentry>
<term><literal>REJECT_LIMIT</literal></term>
<listitem>
<para>
Specifies the maximum number of errors tolerated while converting a
column's input value to its data type, when <literal>ON_ERROR</literal> is
set to <literal>ignore</literal>.
If the input contains more erroneous rows than the specified
value, the <command>COPY</command>
command fails, even with <literal>ON_ERROR</literal> set to
<literal>ignore</literal>.
</para>
</listitem>
</varlistentry>then above description not meet with following example, (i think)
create table t(a int not null);
COPY t FROM STDIN WITH (on_error set_to_null, reject_limit 2);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.a
\.ERROR: null value in column "a" of relation "t" violates not-null constraint
DETAIL: Failing row contains (null).
CONTEXT: COPY t, line 1, column a: "a"Overall, I think
making the domain not-null align with column level not-null would be a
good thing.<para>
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command,
<literal>ignore</literal> means discard the input row and
continue with the next one, and
<literal>set_to_null</literal> means replace columns containing
erroneous input values with <literal>null</literal> and move to the
next row."and move to the next row" is wrong?
I think it should be " and move to the next field".
Hi! There is not too much time left in this CF, so I moved to the next one.
If you are going to work on this patch, I'm waiting on your feedback
or a v9 patch that answers the issues brought up in this discussion.
--
Best regards,
Kirill Reshke
+ /*
+ * Here we end processing of current COPY row.
+ * Update copy state counter for number of erroneous rows.
+ */
+ cstate->num_errors++;
+ cstate->escontext->error_occurred = true;
+
+ /* Only print this NOTICE message, if it will not be followed by ERROR */
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ (
+ (cstate->opts.on_error == COPY_ON_ERROR_NULL &&
cstate->opts.reject_limit > 0 && cstate->num_errors <=
cstate->opts.reject_limit) ||
+ (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
(cstate->opts.reject_limit == 0 || cstate->num_errors <=
cstate->opts.reject_limit))
+ ))
{
this is kind of hard to comprehend.
so attached is a simple version of it based on v8.
for copy (on_error set_to_null)
1. not allow specifying reject_limit option
2. ereport ERROR for not-null constraint violation for domain type.
for example:
CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
CREATE TABLE t1 (a d_int_not_null);
COPY t1 FROM STDIN WITH (on_error set_to_null);
these 3 values: \N a -1
will error out, the error message will be:
ERROR: domain d_int_not_null does not allow null values
Attachments:
v9-0001-new-COPY-on_error-option-set_to_null.patchtext/x-patch; charset=US-ASCII; name=v9-0001-new-COPY-on_error-option-set_to_null.patchDownload
From 47bdeda8911596950463e70e33253e773ef13192 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 13 Dec 2024 11:03:16 +0800
Subject: [PATCH v9 1/1] new COPY on_error option: set_to_null
extent "on_error action", introduce new option: on_error set_to_null.
due to current grammar, we cannot use "on_error null", so i choose on_error set_to_null.
any data type conversion errors while the COPY FROM process will set that column value to be NULL.
this will only work with COPY FROM with non-binary format.
However this will respect the not-null constraint, meaning, if you actually
converted error to null, but the column has not-null constraint, not-null
constraint violation ERROR will be reported.
regress test contains corner case for not-null with domain constraint.
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 26 ++++++++----
src/backend/commands/copy.c | 6 ++-
src/backend/commands/copyfrom.c | 31 +++++++++------
src/backend/commands/copyfromparse.c | 47 +++++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 4 +-
src/test/regress/expected/copy2.out | 50 ++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 45 +++++++++++++++++++++
9 files changed, 188 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..4b847c25f3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,21 +394,33 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing erroneous input values with <literal>null</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal> options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
+ For <literal>ignore</literal> option,
+ a <literal>NOTICE</literal> message containing the ignored row count is
emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ row was discarded.
+ For <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message containing the row count that erroneous input values replaced by to null happened is
+ emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
+ <para>
+ When <literal>LOG_VERBOSITY</literal> option is set to
+ <literal>verbose</literal>, for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message
+ containing the line of the input file and the column name whose input value has been replaced by null
+ is emitted for each row where input conversion has failed ;
When it is set to <literal>silent</literal>, no message is emitted
regarding ignored rows.
</para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2d98ecf3f4..f494d2d64c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 4d52c93c30..a9e61f5e26 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1003,7 +1003,7 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ if ((cstate->opts.on_error == COPY_ON_ERROR_IGNORE || cstate->opts.on_error == COPY_ON_ERROR_NULL) &&
cstate->escontext->error_occurred)
{
/*
@@ -1321,14 +1321,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%llu row was skipped due to data type incompatibility",
- "%llu rows were skipped due to data type incompatibility",
- (unsigned long long) cstate->num_errors,
- (unsigned long long) cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ ereport(NOTICE,
+ errmsg_plural("Erroneous values in %llu row was replaced with NULL",
+ "Erroneous values in %llu rows were replaced with NULL",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1474,10 +1482,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d8..7f98f66ebe 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -871,6 +871,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
/* read raw fields in the next line */
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
@@ -949,7 +950,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
/*
* If ON_ERROR is specified with IGNORE, skip rows with soft
- * errors
+ * errors. If ON_ERROR is specified with SET_TO_NULL, try
+ * to replace attribute value with NULL.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -960,9 +962,48 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ /*
+ * we use it to count the number of rows (not fields)
+ * successfully doing on_error set_to_null.
+ */
+ current_row_erroneous = true;
+
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ /*
+ * we need another InputFunctionCallSafe for domain with a
+ * not-null constraint. if domain don't have not-null
+ * then continue to the next field, otherwise error out.
+ */
+ cstate->escontext->error_occurred = false;
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ ereport(NOTICE,
+ errmsg("replaced row attribute \"%s\" with NULL due to data type incompatibility at line %llu",
+ cstate->cur_attname,
+ (unsigned long long) cstate->cur_lineno));
+ continue;
+ }
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ }
+
cstate->num_errors++;
+ cstate->escontext->error_occurred = true;
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ cstate->opts.on_error != COPY_ON_ERROR_NULL)
{
/*
* Since we emit line number and column info in the below
@@ -1001,6 +1042,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
Assert(fieldno == attr_count);
}
else
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index bbd08770c3..fc2260b541 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..051ca12d10 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..5eaacd8c6b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -98,7 +98,9 @@ typedef struct CopyFromStateData
ErrorSaveContext *escontext; /* soft error trapper during in_functions
* execution */
uint64 num_errors; /* total number of rows which contained soft
- * errors */
+ * errors, for ON_ERROR set_to_null, it's the
+ * number of rows successfully converted to null
+ */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..f03dd24d54 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,12 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error set_to_nul...
+ ^
+COPY x FROM stdin WITH (on_error set_to_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +98,8 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,42 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, colum a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, colum a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail, colum a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+NOTICE: Erroneous values in 3 rows were replaced with NULL
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +876,8 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..2bbb1ecce1 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
+COPY x FROM stdin WITH (on_error set_to_null, reject_limit 2);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, colum a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N 11 13
+\.
+
+--fail, colum a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a 11 14
+\.
+
+--fail, colum a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+10 a d
+11 b 12
+13 14 e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,8 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
Thank you for your update.
On Fri, 13 Dec 2024 at 08:15, jian he <jian.universality@gmail.com> wrote:
+ /* + * Here we end processing of current COPY row. + * Update copy state counter for number of erroneous rows. + */ + cstate->num_errors++; + cstate->escontext->error_occurred = true; + + /* Only print this NOTICE message, if it will not be followed by ERROR */ + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE && + ( + (cstate->opts.on_error == COPY_ON_ERROR_NULL && cstate->opts.reject_limit > 0 && cstate->num_errors <= cstate->opts.reject_limit) || + (cstate->opts.on_error == COPY_ON_ERROR_IGNORE && (cstate->opts.reject_limit == 0 || cstate->num_errors <= cstate->opts.reject_limit)) + )) { this is kind of hard to comprehend. so attached is a simple version of it based on v8.
+1
So, in this version you essentially removed support for REJECT_LIMIT +
SET_TO_NULL feature? Looks like a promising change. It is more likely
to see this committed.
So, +1 on that too.
However, v9 lacks tests for REJECT_LIMIT vs erroneous rows tests.
In short, we need this message somewhere in a regression test.
```
ERROR: skipped more than REJECT_LIMIT (xxx) rows due to data type
incompatibility
```
Also, please update commit msg with all authors and reviewers. This
will make committer job a little bit easier
--
Best regards,
Kirill Reshke
On Wed, Jan 8, 2025 at 3:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
So, in this version you essentially removed support for REJECT_LIMIT +
SET_TO_NULL feature? Looks like a promising change. It is more likely
to see this committed.
So, +1 on that too.However, v9 lacks tests for REJECT_LIMIT vs erroneous rows tests.
In short, we need this message somewhere in a regression test.
```
ERROR: skipped more than REJECT_LIMIT (xxx) rows due to data type
incompatibility
```
hi.
you already answered this question.
since we do not support REJECT_LIMIT+SET_TO_NULL,
so these code path would not be reachable.
Also, please update commit msg with all authors and reviewers. This
will make committer job a little bit easier
commit message polished.
here and there cosmetic changes.
I think there are three remaining issues that may need more attention
1.
Table 27.42. pg_stat_progress_copy View
(<structname>pg_stat_progress_copy</structname>)
column pg_stat_progress_copy.tuples_skipped now the description is
""
When the ON_ERROR option is set to ignore, this value shows the number of tuples
skipped due to malformed data. When the ON_ERROR option is set to set_to_null,
this value shows the number of tuples where malformed data was converted to
NULL.
"""
now the column name tuples_skipped would not be that suitable for
(on_error set_to_null).
since now it is not tuple skipped, it is in a tuple some value was set to null.
Or
we can skip progress reports for (on_error set_to_null) case.
2. The doc is not very great, I guess.
3. do we settled (on_error set_to_null) syntax.
Attachments:
v10-0001-extent-on_error-action-introduce-new-option-on_e.patchtext/x-patch; charset=US-ASCII; name=v10-0001-extent-on_error-action-introduce-new-option-on_e.patchDownload
From 2f77abbb058c952715838d31b1e04f678a079e30 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 10 Jan 2025 14:33:55 +0800
Subject: [PATCH v10 1/1] extent "on_error action", introduce new option:
on_error set_to_null.
due to current grammar, we cannot use "on_error null", so i choose on_error set_to_null.
Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applicable when using
the non-binary format for COPY FROM. However, the not-null constraint will still
be enforced. If a conversion error leads to a NULL value in a column that has a
not-null constraint, a not-null constraint violation error will be triggered.
A regression test for a domain with a not-null constraint has been added.
Author: Jian He <jian.universality@gmail.com>,
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>,
Jim Jones <jim.jones@uni-muenster.de>,
"David G. Johnston" <david.g.johnston@gmail.com>,
Yugo NAGATA <nagata@sraoss.co.jp>,
torikoshia <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/monitoring.sgml | 8 ++--
doc/src/sgml/ref/copy.sgml | 27 +++++++----
src/backend/commands/copy.c | 6 ++-
src/backend/commands/copyfrom.c | 33 +++++++++----
src/backend/commands/copyfromparse.c | 46 +++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 4 +-
src/test/regress/expected/copy2.out | 60 ++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 46 ++++++++++++++++++
10 files changed, 205 insertions(+), 28 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d0d176cc54..6639561384 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5975,10 +5975,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
<structfield>tuples_skipped</structfield> <type>bigint</type>
</para>
<para>
- Number of tuples skipped because they contain malformed data.
- This counter only advances when a value other than
- <literal>stop</literal> is specified to the <literal>ON_ERROR</literal>
- option.
+ When the <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>,
+ this value shows the number of tuples skipped due to malformed data.
+ When the <literal>ON_ERROR</literal> option is set to <literal>set_to_null</literal>,
+ this value shows the number of tuples where malformed data was converted to NULL.
</para></entry>
</row>
</tbody>
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..4346fb0756 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing erroneous input values with <literal>null</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal> options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
+ For <literal>ignore</literal> option,
+ a <literal>NOTICE</literal> message containing the ignored row count is
emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ row was discarded.
+ For <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message containing the row count that erroneous input values replaced by to null happened is
+ emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
+ <para>
+ When <literal>LOG_VERBOSITY</literal> option is set to
+ <literal>verbose</literal>, for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message specifies the line number of the input file and column name
+ where the input value was replaced with NULL due to input conversion failure.
When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc2..afe60758d4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f560..33bd67767e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1029,6 +1029,10 @@ CopyFrom(CopyFromState cstate)
continue;
}
+ /* Report that this tuple some value was replaced with NULL by the ON_ERROR clause */
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL && cstate->num_errors > 0)
+ pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+ cstate->num_errors);
ExecStoreVirtualTuple(myslot);
/*
@@ -1321,14 +1325,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%llu row was skipped due to data type incompatibility",
- "%llu rows were skipped due to data type incompatibility",
- (unsigned long long) cstate->num_errors,
- (unsigned long long) cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ ereport(NOTICE,
+ errmsg_plural("Erroneous values in %llu row was replaced with NULL",
+ "Erroneous values in %llu rows were replaced with NULL",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1474,10 +1486,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index caccdc8563..8d5ab08491 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -871,6 +871,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
/* read raw fields in the next line */
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
@@ -949,7 +950,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
/*
* If ON_ERROR is specified with IGNORE, skip rows with soft
- * errors
+ * errors. If ON_ERROR is specified with set_to_null, try
+ * to replace with NULL.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -960,9 +962,47 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ /*
+ * we use this count the number of rows (not fields) that
+ * successfully applied the on_error set_to_null
+ */
+ if (!current_row_erroneous)
+ current_row_erroneous = true;
+
+ /*
+ * we need another InputFunctionCallSafe so we can error out
+ * not-null violation for domain with not-null constraint.
+ */
+ cstate->escontext->error_occurred = false;
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ ereport(NOTICE,
+ errmsg("column \"%s\" was set to NULL due to data type incompatibility at line %llu",
+ cstate->cur_attname,
+ (unsigned long long) cstate->cur_lineno));
+ continue;
+ }
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ }
+
cstate->num_errors++;
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
{
/*
* Since we emit line number and column info in the below
@@ -1001,6 +1041,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
Assert(fieldno == attr_count);
}
else
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 81cbf10aa2..04a155ad5f 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3250,7 +3250,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef72..7ebf4f7893 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1d8ac8f62e..50759eaf1c 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -98,7 +98,9 @@ typedef struct CopyFromStateData
ErrorSaveContext *escontext; /* soft error trapped during in_functions
* execution */
uint64 num_errors; /* total number of rows which contained soft
- * errors */
+ * errors, for ON_ERROR set_to_null, it's the
+ * number of rows successfully converted to null
+ */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..377be5b99d 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE: column "b" was set to NULL due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column b: "a"
+NOTICE: column "c" was set to NULL due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column c: "d"
+NOTICE: column "b" was set to NULL due to data type incompatibility at line 2
+CONTEXT: COPY t_on_error_null, line 2, column b: "b"
+NOTICE: column "c" was set to NULL due to data type incompatibility at line 3
+CONTEXT: COPY t_on_error_null, line 3, column c: "e"
+NOTICE: Erroneous values in 3 rows were replaced with NULL
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..6cd477af14 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a 11 14
+\.
+
+--fail, column a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10 a d
+11 b 12
+13 14 e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
On Fri, 10 Jan 2025 at 11:38, jian he <jian.universality@gmail.com> wrote:
I think there are three remaining issues that may need more attention
1.
Table 27.42. pg_stat_progress_copy View
(<structname>pg_stat_progress_copy</structname>)
column pg_stat_progress_copy.tuples_skipped now the description is
""
When the ON_ERROR option is set to ignore, this value shows the number of tuples
skipped due to malformed data. When the ON_ERROR option is set to set_to_null,
this value shows the number of tuples where malformed data was converted to
NULL.
"""
now the column name tuples_skipped would not be that suitable for
(on_error set_to_null).
since now it is not tuple skipped, it is in a tuple some value was set to null.
Indeed this is something we need to fix.
Or
we can skip progress reports for (on_error set_to_null) case.
Maybe we can add a `malformed_tuples` column to this view?
3. do we settled (on_error set_to_null) syntax.
I think so. I prefer this syntax to others discussed in this thread.
--
Best regards,
Kirill Reshke
On Sat, Jan 11, 2025 at 5:54 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
On Fri, 10 Jan 2025 at 11:38, jian he <jian.universality@gmail.com> wrote:
I think there are three remaining issues that may need more attention
1.
Table 27.42. pg_stat_progress_copy View
(<structname>pg_stat_progress_copy</structname>)
column pg_stat_progress_copy.tuples_skipped now the description is
""
When the ON_ERROR option is set to ignore, this value shows the number of tuples
skipped due to malformed data. When the ON_ERROR option is set to set_to_null,
this value shows the number of tuples where malformed data was converted to
NULL.
"""
now the column name tuples_skipped would not be that suitable for
(on_error set_to_null).
since now it is not tuple skipped, it is in a tuple some value was set to null.Indeed this is something we need to fix.
Or
we can skip progress reports for (on_error set_to_null) case.Maybe we can add a `malformed_tuples` column to this view?
we can do this later.
so for on_error set_to_null, i've removed pgstat_progress_update_param
related code.
the attached patch also did some doc enhancement, error message enhancement.
Attachments:
v11-0001-COPY-on_error-set_to_null.patchtext/x-patch; charset=US-ASCII; name=v11-0001-COPY-on_error-set_to_null.patchDownload
From a95d42bf1e6044c6c9a2afbb15d168d6679eceab Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 14 Jan 2025 13:46:12 +0800
Subject: [PATCH v11 1/1] COPY (on_error set_to_null)
Extent "on_error action", introduce new option: on_error set_to_null.
Current grammar makes us unable to use "on_error null", so we choose "on_error set_to_null".
Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applicable when using the
non-binary format for COPY FROM. However, the not-null constraint will still be
enforced. If a conversion error leads to a NULL value in a column that has a
not-null constraint, a not-null constraint violation error will be triggered.
A regression test for a domain with a not-null constraint has been added.
Author: Jian He <jian.universality@gmail.com>,
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>,
Jim Jones <jim.jones@uni-muenster.de>,
"David G. Johnston" <david.g.johnston@gmail.com>,
Yugo NAGATA <nagata@sraoss.co.jp>,
torikoshia <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 34 +++++++++-----
src/backend/commands/copy.c | 6 ++-
src/backend/commands/copyfrom.c | 29 ++++++++----
src/backend/commands/copyfromparse.c | 46 +++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 4 +-
src/test/regress/expected/copy2.out | 60 ++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 46 ++++++++++++++++++
9 files changed, 201 insertions(+), 27 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..5e1d08ab91 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing erroneous input values with
+ <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ For <literal>ignore</literal> option,
+ a <literal>NOTICE</literal> message containing the ignored row count is
+ emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
+ For <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message containing the row count that erroneous input values replaced by to null
+ happened is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
+ <para>
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
- When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message containing the line of the input file and the column name
+ where value was replaced with <literal>NULL</literal> for each input conversion failure.
+ When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -458,7 +469,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_to_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc2..afe60758d4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f560..c38ff3dc6f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1321,14 +1321,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%llu row was skipped due to data type incompatibility",
- "%llu rows were skipped due to data type incompatibility",
- (unsigned long long) cstate->num_errors,
- (unsigned long long) cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ ereport(NOTICE,
+ errmsg_plural("erroneous values in %llu row was replaced with null",
+ "erroneous values in %llu rows were replaced with null",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1474,10 +1482,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index caccdc8563..c0f6ce5057 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -871,6 +871,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
/* read raw fields in the next line */
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
@@ -949,7 +950,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
/*
* If ON_ERROR is specified with IGNORE, skip rows with soft
- * errors
+ * errors. If ON_ERROR is specified with set_to_null, try
+ * to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -960,9 +962,47 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ /*
+ * we use this count the number of rows (not fields) that
+ * successfully applied the on_error set_to_null
+ */
+ if (!current_row_erroneous)
+ current_row_erroneous = true;
+
+ /*
+ * we need another InputFunctionCallSafe so we can error out
+ * not-null violation for domain with not-null constraint.
+ */
+ cstate->escontext->error_occurred = false;
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ ereport(NOTICE,
+ errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu",
+ cstate->cur_attname,
+ (unsigned long long) cstate->cur_lineno));
+ continue;
+ }
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ }
+
cstate->num_errors++;
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
{
/*
* Since we emit line number and column info in the below
@@ -1001,6 +1041,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
Assert(fieldno == attr_count);
}
else
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 81cbf10aa2..04a155ad5f 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3250,7 +3250,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef72..7ebf4f7893 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1d8ac8f62e..50759eaf1c 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -98,7 +98,9 @@ typedef struct CopyFromStateData
ErrorSaveContext *escontext; /* soft error trapped during in_functions
* execution */
uint64 num_errors; /* total number of rows which contained soft
- * errors */
+ * errors, for ON_ERROR set_to_null, it's the
+ * number of rows successfully converted to null
+ */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..9a5acef8db 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE: column "b" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column b: "a"
+NOTICE: column "c" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column c: "d"
+NOTICE: column "b" was set to null due to data type incompatibility at line 2
+CONTEXT: COPY t_on_error_null, line 2, column b: "b"
+NOTICE: column "c" was set to null due to data type incompatibility at line 3
+CONTEXT: COPY t_on_error_null, line 3, column c: "e"
+NOTICE: erroneous values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..003a91648e 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10 a d
+11 b 12
+13 14 e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
On Tue, 14 Jan 2025 at 10:51, jian he <jian.universality@gmail.com> wrote:
the attached patch also did some doc enhancement, error message enhancement.
LGTM
--
Best regards,
Kirill Reshke
hi.
rebase only.
Attachments:
v12-0001-COPY-on_error-set_to_null.patchtext/x-patch; charset=US-ASCII; name=v12-0001-COPY-on_error-set_to_null.patchDownload
From ce0ce6438094cad553e509db65b7fd27de2b9af6 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 7 Mar 2025 11:43:51 +0800
Subject: [PATCH v12 1/1] COPY (on_error set_to_null)
Extent "on_error action", introduce new option: on_error set_to_null.
Current grammar makes us unable to use "on_error null", so we choose "on_error set_to_null".
Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applicable when using the
non-binary format for COPY FROM. However, the not-null constraint will still be
enforced. If a conversion error leads to a NULL value in a column that has a
not-null constraint, a not-null constraint violation error will be triggered.
A regression test for a domain with a not-null constraint has been added.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 34 +++++++++++-----
src/backend/commands/copy.c | 6 ++-
src/backend/commands/copyfrom.c | 29 +++++++++-----
src/backend/commands/copyfromparse.c | 43 +++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 60 ++++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 46 +++++++++++++++++++++
8 files changed, 196 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..91bc25b9ab3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing erroneous input values with
+ <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ For <literal>ignore</literal> option,
+ a <literal>NOTICE</literal> message containing the ignored row count is
+ emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
+ For <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message containing the row count that erroneous input values replaced by to null
+ happened is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
+ <para>
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
- When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message containing the line of the input file and the column name
+ where value was replaced with <literal>NULL</literal> for each input conversion failure.
+ When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -458,7 +469,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_to_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..afe60758d40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..4502ce2d366 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%llu row was skipped due to data type incompatibility",
- "%llu rows were skipped due to data type incompatibility",
- (unsigned long long) cstate->num_errors,
- (unsigned long long) cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ ereport(NOTICE,
+ errmsg_plural("erroneous values in %llu row was replaced with null",
+ "erroneous values in %llu rows were replaced with null",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..e2b4d1f7ec9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1025,6 +1026,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
/*
* If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with set_to_null, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1035,9 +1037,46 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ /*
+ * we use this count the number of rows (not fields) that
+ * successfully applied the on_error set_to_null
+ */
+ if (!current_row_erroneous)
+ current_row_erroneous = true;
+
+ /*
+ * we need another InputFunctionCallSafe so we can error out
+ * not-null violation for domain with not-null constraint.
+ */
+ cstate->escontext->error_occurred = false;
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ ereport(NOTICE,
+ errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu",
+ cstate->cur_attname,
+ (unsigned long long) cstate->cur_lineno));
+ continue;
+ }
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ }
cstate->num_errors++;
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
{
/*
* Since we emit line number and column info in the below
@@ -1076,6 +1115,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
Assert(fieldno == attr_count);
return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8432be641ac..fcdc61bcb57 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3279,7 +3279,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7ebf4f78933 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..9a5acef8db0 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE: column "b" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column b: "a"
+NOTICE: column "c" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column c: "d"
+NOTICE: column "b" was set to null due to data type incompatibility at line 2
+CONTEXT: COPY t_on_error_null, line 2, column b: "b"
+NOTICE: column "c" was set to null due to data type incompatibility at line 3
+CONTEXT: COPY t_on_error_null, line 3, column c: "e"
+NOTICE: erroneous values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..003a91648e2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10 a d
+11 b 12
+13 14 e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
Hi Jian
On 07.03.25 04:48, jian he wrote:
hi.
rebase only.
I revisited this patch today. It applies and builds cleanly, and it
works as expected.
Some tests and minor comments:
====
1) WARNING might be a better fit than NOTICE here.
postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error set_to_null, format csv);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
1,a
2,1
3,2
4,b
a,c
\.
NOTICE: erroneous values in 3 rows were replaced with null
COPY 5
postgres=# SELECT * FROM t;
x | y | z
------+------+------
1 | NULL | NULL
2 | 1 | NULL
3 | 2 | NULL
4 | NULL | NULL
NULL | NULL | NULL
(5 rows)
postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error set_to_null, format
csv, log_verbosity verbose);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
1,a
2,1
3,2
4,b
a,c
\.
NOTICE: column "y" was set to null due to data type incompatibility at
line 1
NOTICE: column "y" was set to null due to data type incompatibility at
line 4
NOTICE: column "x" was set to null due to data type incompatibility at
line 5
NOTICE: column "y" was set to null due to data type incompatibility at
line 5
NOTICE: erroneous values in 3 rows were replaced with null
COPY 5
postgres=# SELECT * FROM t;
x | y | z
------+------+------
1 | NULL | NULL
2 | 1 | NULL
3 | 2 | NULL
4 | NULL | NULL
NULL | NULL | NULL
(5 rows)
I would still leave the extra messages from "log_verbosity verbose" as
NOTICE though. What do you think?
====
2) Inconsistent terminology. Invalid values in "on_error set_to_null"
mode are names as "erroneous", but as "invalid" in "on_error stop" mode.
I don't want to get into the semantics of erroneous or invalid, but
sticking to one terminology would IMHO look better.
postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error stop, format csv,
log_verbosity verbose);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
1,a
2,1
3,2
4,b
a,c
\.
ERROR: invalid input syntax for type integer: "a"
CONTEXT: COPY t, line 1, column y: "a"
postgres=# SELECT * FROM t;
x | y | z
---+---+---
(0 rows)
====
3) same as in 1)
postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error ignore, format csv,
log_verbosity verbose);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
1,a
2,1
3,2
4,b
a,c
\.
NOTICE: skipping row due to data type incompatibility at line 1 for
column "y": "a"
NOTICE: skipping row due to data type incompatibility at line 4 for
column "y": "b"
NOTICE: skipping row due to data type incompatibility at line 5 for
column "x": "a"
NOTICE: 3 rows were skipped due to data type incompatibility
COPY 2
postgres=# SELECT * FROM t;
x | y | z
---+---+------
2 | 1 | NULL
3 | 2 | NULL
(2 rows)====
====
"on_error ignore" works well with "reject_limit #"
postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error ignore, format csv,
log_verbosity verbose, reject_limit 1);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
1,a
2,1
3,2
4,b
a,c
\.
NOTICE: skipping row due to data type incompatibility at line 1 for
column "y": "a"
NOTICE: skipping row due to data type incompatibility at line 4 for
column "y": "b"
ERROR: skipped more than REJECT_LIMIT (1) rows due to data type
incompatibility
CONTEXT: COPY t, line 4, column y: "b"
postgres=# SELECT * FROM t;
x | y | z
---+---+---
(0 rows)
best regards, Jim
On Tue, Mar 11, 2025 at 6:31 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
I revisited this patch today. It applies and builds cleanly, and it
works as expected.Some tests and minor comments:
hi. Jim Jones.
thanks for testsing it again!
====
1) WARNING might be a better fit than NOTICE here.
but NOTICE, on_errror set_to_null is aligned with on_errror ignore.
I would still leave the extra messages from "log_verbosity verbose" as
NOTICE though. What do you think?====
When LOG_VERBOSITY option is set to verbose,
for ignore option, a NOTICE message containing the line of the input
file and the column name
whose input conversion has failed is emitted for each discarded row;
for set_to_null option, a NOTICE message containing the line of the
input file and the column name
where value was replaced with NULL for each input conversion failure.
see the above desciption,
on_errror set_to_null is aligned with on_errror ignore.
it's just on_errror ignore is per row, on_errror set_to_null is per
column/field.
so NOTICE is aligned with other on_error option.
2) Inconsistent terminology. Invalid values in "on_error set_to_null"
mode are names as "erroneous", but as "invalid" in "on_error stop" mode.
I don't want to get into the semantics of erroneous or invalid, but
sticking to one terminology would IMHO look better.
I am open to changing it.
what do you think "invalid values in %llu row was replaced with null"?
====
"on_error ignore" works well with "reject_limit #"
i remember there was some confusion about on_error set_to_null with
reject_limit option.
I choose to not suport it.
obviously, if there is consenses, we can support it later.
On 12.03.25 09:00, jian he wrote:
1) WARNING might be a better fit than NOTICE here.
but NOTICE, on_errror set_to_null is aligned with on_errror ignore.
I would still leave the extra messages from "log_verbosity verbose" as
NOTICE though. What do you think?====
When LOG_VERBOSITY option is set to verbose,
for ignore option, a NOTICE message containing the line of the input
file and the column name
whose input conversion has failed is emitted for each discarded row;
for set_to_null option, a NOTICE message containing the line of the
input file and the column name
where value was replaced with NULL for each input conversion failure.see the above desciption,
on_errror set_to_null is aligned with on_errror ignore.
it's just on_errror ignore is per row, on_errror set_to_null is per
column/field.
so NOTICE is aligned with other on_error option.
I considered using a WARNING due to the severity of the issue - the
failure to import data - but either NOTICE or WARNING works for me.
2) Inconsistent terminology. Invalid values in "on_error set_to_null"
mode are names as "erroneous", but as "invalid" in "on_error stop" mode.
I don't want to get into the semantics of erroneous or invalid, but
sticking to one terminology would IMHO look better.I am open to changing it.
what do you think "invalid values in %llu row was replaced with null"?
LGTM: "invalid values in %llu rows were replaced with null"
Thanks for the patch!
Best, Jim
On Wed, Mar 12, 2025 at 4:26 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
2) Inconsistent terminology. Invalid values in "on_error set_to_null"
mode are names as "erroneous", but as "invalid" in "on_error stop" mode.
I don't want to get into the semantics of erroneous or invalid, but
sticking to one terminology would IMHO look better.I am open to changing it.
what do you think "invalid values in %llu row was replaced with null"?LGTM: "invalid values in %llu rows were replaced with null"
changed based on this.
also minor documentation tweaks.
Attachments:
v13-0001-COPY-on_error-set_to_null.patchtext/x-patch; charset=US-ASCII; name=v13-0001-COPY-on_error-set_to_null.patchDownload
From 3553eee56c8dd0c3ce334d1f37b511acbbc640af Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 18 Mar 2025 11:51:48 +0800
Subject: [PATCH v13 1/1] COPY (on_error set_to_null)
Extent "on_error action", introduce new option: on_error set_to_null.
Current grammar makes us unable to use "on_error null", so we choose "on_error set_to_null".
Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.
However, the not-null constraint will still be enforced.
If a column have not-null constraint, successful (on_error set_to_null)
action will cause not-null constraint violation.
This also apply to column type is domain with not-null constraint.
A regression test for a domain with a not-null constraint has been added.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 34 +++++++++++-----
src/backend/commands/copy.c | 6 ++-
src/backend/commands/copyfrom.c | 29 +++++++++-----
src/backend/commands/copyfromparse.c | 43 +++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 60 ++++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 46 +++++++++++++++++++++
8 files changed, 196 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..cdb725d7565 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing invalid input values with
+ <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ For <literal>ignore</literal> option,
+ a <literal>NOTICE</literal> message containing the ignored row count is
+ emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
+ For <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of rows where invalid input values were replaced with
+ null is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
+ <para>
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
- When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message containing the line of the input file and the column name
+ where value was replaced with <literal>NULL</literal> for each input conversion failure.
+ When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -458,7 +469,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_to_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..afe60758d40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..43a227eae72 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%llu row was skipped due to data type incompatibility",
- "%llu rows were skipped due to data type incompatibility",
- (unsigned long long) cstate->num_errors,
- (unsigned long long) cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values in %llu row was replaced with null",
+ "invalid values in %llu rows were replaced with null",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..e2b4d1f7ec9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1025,6 +1026,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
/*
* If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with set_to_null, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1035,9 +1037,46 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ /*
+ * we use this count the number of rows (not fields) that
+ * successfully applied the on_error set_to_null
+ */
+ if (!current_row_erroneous)
+ current_row_erroneous = true;
+
+ /*
+ * we need another InputFunctionCallSafe so we can error out
+ * not-null violation for domain with not-null constraint.
+ */
+ cstate->escontext->error_occurred = false;
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ ereport(NOTICE,
+ errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu",
+ cstate->cur_attname,
+ (unsigned long long) cstate->cur_lineno));
+ continue;
+ }
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ }
cstate->num_errors++;
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
{
/*
* Since we emit line number and column info in the below
@@ -1076,6 +1115,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
Assert(fieldno == attr_count);
return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 9a4d993e2bc..7980513a9bd 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3280,7 +3280,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7ebf4f78933 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..caa94bfd526 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE: column "b" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column b: "a"
+NOTICE: column "c" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column c: "d"
+NOTICE: column "b" was set to null due to data type incompatibility at line 2
+CONTEXT: COPY t_on_error_null, line 2, column b: "b"
+NOTICE: column "c" was set to null due to data type incompatibility at line 3
+CONTEXT: COPY t_on_error_null, line 3, column c: "e"
+NOTICE: invalid values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..003a91648e2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10 a d
+11 b 12
+13 14 e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
On Tue, 18 Mar 2025 at 09:26, jian he <jian.universality@gmail.com> wrote:
changed based on this.
also minor documentation tweaks.
Few comments:
1) I felt this is wrong:
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 9a4d993e2bc..7980513a9bd 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3280,7 +3280,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE",
"FORCE_QUOTE",
"FORCE_NOT_NULL",
"FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL",
"LOG_VERBOSITY");
as the following fails:
postgres=# copy t_on_error_null from stdin WITH ( set_to_null );
ERROR: option "set_to_null" not recognized
LINE 1: copy t_on_error_null from stdin WITH ( set_to_null );
2) Can you limit this to 80 chars if possible to improve the readability:
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and
continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing
invalid input values with
+ <literal>NULL</literal> and move to the next field.
3) similarly here too:
+ For <literal>ignore</literal> option,
+ a <literal>NOTICE</literal> message containing the ignored row count is
+ emitted at the end of the <command>COPY FROM</command> if at
least one row was discarded.
+ For <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of
rows where invalid input values were replaced with
+ null is emitted at the end of the <command>COPY FROM</command>
if at least one row was replaced.
4) Could you mention a brief one line in the commit message as to why
"on_error null" cannot be used:
Extent "on_error action", introduce new option: on_error set_to_null.
Current grammar makes us unable to use "on_error null", so we choose
"on_error set_to_null".
Regards,
Vignesh
On Fri, Mar 21, 2025 at 2:34 PM vignesh C <vignesh21@gmail.com> wrote:
Few comments: 1) I felt this is wrong: diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c index 9a4d993e2bc..7980513a9bd 100644 --- a/src/bin/psql/tab-complete.in.c +++ b/src/bin/psql/tab-complete.in.c @@ -3280,7 +3280,7 @@ match_previous_words(int pattern_id, COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL", "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE", "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT", - "ON_ERROR", "LOG_VERBOSITY"); + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");as the following fails:
postgres=# copy t_on_error_null from stdin WITH ( set_to_null );
ERROR: option "set_to_null" not recognized
LINE 1: copy t_on_error_null from stdin WITH ( set_to_null );
- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "set_to_null");
yech. I think I fixed this.
2) Can you limit this to 80 chars if possible to improve the readability: + <literal>stop</literal> means fail the command, + <literal>ignore</literal> means discard the input row and continue with the next one, and + <literal>set_to_null</literal> means replace columns containing invalid input values with + <literal>NULL</literal> and move to the next field.3) similarly here too: + For <literal>ignore</literal> option, + a <literal>NOTICE</literal> message containing the ignored row count is + emitted at the end of the <command>COPY FROM</command> if at least one row was discarded. + For <literal>set_to_null</literal> option, + a <literal>NOTICE</literal> message indicating the number of rows where invalid input values were replaced with + null is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.
sure.
4) Could you mention a brief one line in the commit message as to why
"on_error null" cannot be used:
Extent "on_error action", introduce new option: on_error set_to_null.
Current grammar makes us unable to use "on_error null", so we choose
"on_error set_to_null".
by the following changes, we can change to (on_error null).
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3579,6 +3579,7 @@ copy_generic_opt_elem:
copy_generic_opt_arg:
opt_boolean_or_string { $$ =
(Node *) makeString($1); }
+ | NULL_P
{ $$ = (Node *) makeString("null"); }
| NumericOnly
{ $$ = (Node *) $1; }
| '*'
{ $$ = (Node *) makeNode(A_Star); }
| DEFAULT { $$ = (Node
*) makeString("default"); }
COPY x from stdin (format null);
ERROR: syntax error at or near "null"
LINE 1: COPY x from stdin (format null);
^
will become
COPY x from stdin (format null);
ERROR: COPY format "null" not recognized
LINE 1: COPY x from stdin (format null);
^
it will cause NULL_P from reserved word to
non-reserved word in the COPY related command.
I am not sure this is what we want.
Anyway, I attached both two version
(ON_ERROR SET_TO_NULL) (ON_ERROR NULL).
Attachments:
v14-0001-COPY-on_error-set_to_null.patchtext/x-patch; charset=US-ASCII; name=v14-0001-COPY-on_error-set_to_null.patchDownload
From 7a6b7edf0877bd9c5bb2629888d3d9c29618ca5d Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 24 Mar 2025 15:14:17 +0800
Subject: [PATCH v14 1/1] COPY (on_error set_to_null)
Extent "on_error action", introduce new option: on_error set_to_null.
Current grammar makes us unable to use "on_error null". if we did it, then in
all the COPY command options's value, null will becomen reserved to non-reserved
word. so we choose "on_error set_to_null".
Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.
However, the not-null constraint will still be enforced.
If a column have not-null constraint, successful (on_error set_to_null)
action will cause not-null constraint violation.
This also apply to column type is domain with not-null constraint.
A regression test for a domain with a not-null constraint has been added.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 36 ++++++++++++-----
src/backend/commands/copy.c | 6 ++-
src/backend/commands/copyfrom.c | 29 +++++++++-----
src/backend/commands/copyfromparse.c | 43 +++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 60 ++++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 46 +++++++++++++++++++++
8 files changed, 198 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..1909c11edff 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one,
+ and <literal>set_to_null</literal> means replace columns containing invalid
+ input values with <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
+ <para>
+ For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+ containing the ignored row count is emitted at the end of the <command>COPY
+ FROM</command> if at least one row was discarded.
+ For <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of rows
+ where invalid input values were replaced with null is emitted
+ at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
- When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_to_null</literal> option, a <literal>NOTICE</literal>
+ message containing the line of the input file and the column name where
+ value was replaced with <literal>NULL</literal> for each input conversion
+ failure.
+ When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_to_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..afe60758d40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..43a227eae72 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%llu row was skipped due to data type incompatibility",
- "%llu rows were skipped due to data type incompatibility",
- (unsigned long long) cstate->num_errors,
- (unsigned long long) cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values in %llu row was replaced with null",
+ "invalid values in %llu rows were replaced with null",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..e2b4d1f7ec9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1025,6 +1026,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
/*
* If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with set_to_null, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1035,9 +1037,46 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ /*
+ * we use this count the number of rows (not fields) that
+ * successfully applied the on_error set_to_null
+ */
+ if (!current_row_erroneous)
+ current_row_erroneous = true;
+
+ /*
+ * we need another InputFunctionCallSafe so we can error out
+ * not-null violation for domain with not-null constraint.
+ */
+ cstate->escontext->error_occurred = false;
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ ereport(NOTICE,
+ errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu",
+ cstate->cur_attname,
+ (unsigned long long) cstate->cur_lineno));
+ continue;
+ }
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ }
cstate->num_errors++;
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
{
/*
* Since we emit line number and column info in the below
@@ -1076,6 +1115,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
Assert(fieldno == attr_count);
return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 98951aef82c..c79b3af0495 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "set_to_null");
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7ebf4f78933 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..caa94bfd526 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE: column "b" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column b: "a"
+NOTICE: column "c" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column c: "d"
+NOTICE: column "b" was set to null due to data type incompatibility at line 2
+CONTEXT: COPY t_on_error_null, line 2, column b: "b"
+NOTICE: column "c" was set to null due to data type incompatibility at line 3
+CONTEXT: COPY t_on_error_null, line 3, column c: "e"
+NOTICE: invalid values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..003a91648e2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10 a d
+11 b 12
+13 14 e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
v14-0001-COPY-on_error-null.nocfbotapplication/octet-stream; name=v14-0001-COPY-on_error-null.nocfbotDownload
From b41893c05c652542e02ad9d90d54b3d7c3886e8e Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 24 Mar 2025 14:55:09 +0800
Subject: [PATCH v14 1/1] COPY (on_error null)
Extent "on_error action", introduce new option: on_error null.
Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.
However, the not-null constraint will still be enforced.
If a column have not-null constraint, successful (on_error null)
action will cause not-null constraint violation.
This also apply to column type is domain with not-null constraint.
A regression test for a domain with a not-null constraint has been added.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 36 ++++++++++++-----
src/backend/commands/copy.c | 6 ++-
src/backend/commands/copyfrom.c | 29 +++++++++-----
src/backend/commands/copyfromparse.c | 43 +++++++++++++++++++-
src/backend/parser/gram.y | 1 +
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 60 ++++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 46 +++++++++++++++++++++
9 files changed, 199 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..ab7e0c45b51 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -395,22 +395,35 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
<literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>ignore</literal> means discard the input row and continue with the next one,
+ and <literal>null</literal> means replace columns containing
+ invalid input values with <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
- containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
- When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ For <literal>ignore</literal> option,
+ a <literal>NOTICE</literal> message containing the ignored row count is
+ emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
+ For <literal>null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of rows
+ where invalid input values were replaced with null is emitted at
+ the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
+ <para>
+ When <literal>LOG_VERBOSITY</literal> option is set to
+ <literal>verbose</literal>, for <literal>ignore</literal> option, a
+ <literal>NOTICE</literal> message containing the line of the input file
+ and the column name whose input conversion has failed is emitted for each
+ discarded row;
+ for <literal>null</literal> option, a <literal>NOTICE</literal> message
+ containing the line of the input file and the column name where value
+ was replaced with <literal>NULL</literal> for each input conversion failure.
+ When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..a452ad11888 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..43a227eae72 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%llu row was skipped due to data type incompatibility",
- "%llu rows were skipped due to data type incompatibility",
- (unsigned long long) cstate->num_errors,
- (unsigned long long) cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%llu row was skipped due to data type incompatibility",
+ "%llu rows were skipped due to data type incompatibility",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values in %llu row was replaced with null",
+ "invalid values in %llu rows were replaced with null",
+ (unsigned long long) cstate->num_errors,
+ (unsigned long long) cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..288c60abf9f 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1025,6 +1026,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
/*
* If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with NULL, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1035,9 +1037,46 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ /*
+ * we use this count the number of rows (not fields) that
+ * successfully applied the (on_error null)
+ */
+ if (!current_row_erroneous)
+ current_row_erroneous = true;
+
+ /*
+ * we need another InputFunctionCallSafe so we can error out
+ * not-null violation for domain with not-null constraint.
+ */
+ cstate->escontext->error_occurred = false;
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ ereport(NOTICE,
+ errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu",
+ cstate->cur_attname,
+ (unsigned long long) cstate->cur_lineno));
+ continue;
+ }
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ }
cstate->num_errors++;
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
{
/*
* Since we emit line number and column info in the below
@@ -1076,6 +1115,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
Assert(fieldno == attr_count);
return true;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 271ae26cbaf..e79473c997a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3579,6 +3579,7 @@ copy_generic_opt_elem:
copy_generic_opt_arg:
opt_boolean_or_string { $$ = (Node *) makeString($1); }
+ | NULL_P { $$ = (Node *) makeString("null"); }
| NumericOnly { $$ = (Node *) $1; }
| '*' { $$ = (Node *) makeNode(A_Star); }
| DEFAULT { $$ = (Node *) makeString("default"); }
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 98951aef82c..b0c7ac82ca6 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "null");
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7ebf4f78933 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..18c80c238a1 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error null, log_verbosity verbose);
+NOTICE: column "b" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column b: "a"
+NOTICE: column "c" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null, line 1, column c: "d"
+NOTICE: column "b" was set to null due to data type incompatibility at line 2
+CONTEXT: COPY t_on_error_null, line 2, column b: "b"
+NOTICE: column "c" was set to null due to data type incompatibility at line 3
+CONTEXT: COPY t_on_error_null, line 3, column c: "e"
+NOTICE: invalid values in 3 rows were replaced with null
+-- check inserted content
+SELECT * FROM t_on_error_null ORDER BY a;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..40776d55c96 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error null);
+COPY x FROM stdin (on_error null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error null);
+a 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error null, log_verbosity verbose);
+10 a d
+11 b 12
+13 14 e
+\.
+
+-- check inserted content
+SELECT * FROM t_on_error_null ORDER BY a;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
On Mon, 24 Mar 2025 at 13:21, jian he <jian.universality@gmail.com> wrote:
I am not sure this is what we want.
Anyway, I attached both two version
Few comments
1) I understood the problem, your first approach is ok for me.
2) Here in error we say column c1 violates not-null constraint and in
the context we show column c2, should the context also display c2
column:
postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
CREATE TABLE
postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
a b
\.
ERROR: null value in column "c1" of relation "t3" violates not-null constraint
DETAIL: Failing row contains (null, null).
CONTEXT: COPY t3, line 1, column c2: "b"
3) typo becomen should be become:
null will becomen reserved to non-reserved
4) There is a whitespace error while applying patch
Applying: COPY (on_error set_to_null)
.git/rebase-apply/patch:39: trailing whitespace.
a <literal>NOTICE</literal> message indicating the number of rows
warning: 1 line adds whitespace errors.
Regards,
Vignesh
On Tue, Mar 25, 2025 at 2:31 PM vignesh C <vignesh21@gmail.com> wrote:
2) Here in error we say column c1 violates not-null constraint and in
the context we show column c2, should the context also display c2
column:
postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
CREATE TABLE
postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.a b
\.ERROR: null value in column "c1" of relation "t3" violates not-null constraint
DETAIL: Failing row contains (null, null).
CONTEXT: COPY t3, line 1, column c2: "b"
It took me a while to figure out why.
with the attached, now the error message becomes:
ERROR: null value in column "c1" of relation "t3" violates not-null constraint
DETAIL: Failing row contains (null, null).
CONTEXT: COPY t3, line 1: "a,b"
while at it,
(on_error set_to_null, log_verbosity verbose)
error message CONTEXT will only emit out relation name,
this aligns with (on_error ignore, log_verbosity verbose).
one of the message out example:
+NOTICE: column "b" was set to null due to data type incompatibility at line 2
+CONTEXT: COPY t_on_error_null
3) typo becomen should be become:
null will becomen reserved to non-reserved
fixed.
4) There is a whitespace error while applying patch
Applying: COPY (on_error set_to_null)
.git/rebase-apply/patch:39: trailing whitespace.
a <literal>NOTICE</literal> message indicating the number of rows
warning: 1 line adds whitespace errors.
fixed.
Attachments:
v15-0001-COPY-on_error-set_to_null.patchtext/x-patch; charset=US-ASCII; name=v15-0001-COPY-on_error-set_to_null.patchDownload
From cfd9afbc583aac39f73f224cb70c9196398c3176 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 4 Apr 2025 19:43:52 +0800
Subject: [PATCH v15 1/1] COPY (on_error set_to_null)
Extent "on_error action", introduce new option: on_error set_to_null.
Current grammar makes us unable to use "on_error null". if we did it, then in
all the COPY command options's value, null will become reserved to non-reserved
words. so we choose "on_error set_to_null".
Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.
However, the not-null constraint will still be enforced.
If a column has a not-null constraint, successful (on_error set_to_null)
action will cause not-null constraint violation.
This also applies to column type is domain with not-null constraint.
A regression test for a domain with a not-null constraint has been added.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 36 +++++++++++-----
src/backend/commands/copy.c | 6 ++-
src/backend/commands/copyfrom.c | 29 ++++++++-----
src/backend/commands/copyfromparse.c | 61 +++++++++++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 60 +++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 46 +++++++++++++++++++++
8 files changed, 215 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..ebe2eaa36e2 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one,
+ and <literal>set_to_null</literal> means replace columns containing invalid
+ input values with <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
+ <para>
+ For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+ containing the ignored row count is emitted at the end of the <command>COPY
+ FROM</command> if at least one row was discarded.
+ For <literal>set_to_null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of rows
+ where invalid input values were replaced with null is emitted
+ at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
- When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_to_null</literal> option, a <literal>NOTICE</literal>
+ message containing the line of the input file and the column name where
+ value was replaced with <literal>NULL</literal> for each input conversion
+ failure.
+ When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_to_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..13bbe58855c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..a3143ca4f29 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
- "%" PRIu64 " rows were skipped due to data type incompatibility",
- cstate->num_errors,
- cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+ "%" PRIu64 " rows were skipped due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values in %" PRIu64 " row was replaced with null",
+ "invalid values in %" PRIu64 " rows were replaced with null",
+ cstate->num_errors,
+ cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..63a4400c8a2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
}
/*
- * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+ * If ON_ERROR is specified with set_to_null, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1035,9 +1037,62 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ /*
+ * we use it to count number of rows (not fields!) that
+ * successfully applied on_error set_to_null.
+ */
+ if (!current_row_erroneous)
+ current_row_erroneous = true;
+
+ /*
+ * when column type is domain with not-null constraint, we need
+ * another InputFunctionCallSafe to error out not-null
+ * violation.
+ */
+ cstate->escontext->error_occurred = false;
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ {
+ /*
+ * Since we emit line number and column info in the below
+ * notice message, we suppress error context information other
+ * than the relation name.
+ */
+ Assert(!cstate->relname_only);
+ cstate->relname_only = true;
+ ereport(NOTICE,
+ errmsg("column \"%s\" was set to null due to data type incompatibility at line %" PRIu64 "",
+ cstate->cur_attname,
+ cstate->cur_lineno));
+
+ /* reset relname_only */
+ cstate->relname_only = false;
+ }
+
+ cstate->cur_attname = NULL;
+
+ continue;
+ }
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ }
cstate->num_errors++;
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
{
/*
* Since we emit line number and column info in the below
@@ -1076,6 +1131,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
Assert(fieldno == attr_count);
return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 98951aef82c..c79b3af0495 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "set_to_null");
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7ebf4f78933 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..879a898911a 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_to_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdout (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE: column "b" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null
+NOTICE: column "c" was set to null due to data type incompatibility at line 1
+CONTEXT: COPY t_on_error_null
+NOTICE: column "b" was set to null due to data type incompatibility at line 2
+CONTEXT: COPY t_on_error_null
+NOTICE: column "c" was set to null due to data type incompatibility at line 3
+CONTEXT: COPY t_on_error_null
+NOTICE: invalid values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..fbf80004178 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x from stdin (on_error set_to_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10 a d
+11 b 12
+13 14 e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
On Fri, Apr 4, 2025 at 4:55 AM jian he <jian.universality@gmail.com> wrote:
On Tue, Mar 25, 2025 at 2:31 PM vignesh C <vignesh21@gmail.com> wrote:
2) Here in error we say column c1 violates not-null constraint and in
the context we show column c2, should the context also display c2
column:
postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
CREATE TABLE
postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.a b
\.ERROR: null value in column "c1" of relation "t3" violates not-null constraint
DETAIL: Failing row contains (null, null).
CONTEXT: COPY t3, line 1, column c2: "b"It took me a while to figure out why.
with the attached, now the error message becomes:ERROR: null value in column "c1" of relation "t3" violates not-null constraint
DETAIL: Failing row contains (null, null).
CONTEXT: COPY t3, line 1: "a,b"while at it,
(on_error set_to_null, log_verbosity verbose)
error message CONTEXT will only emit out relation name,
this aligns with (on_error ignore, log_verbosity verbose).one of the message out example: +NOTICE: column "b" was set to null due to data type incompatibility at line 2 +CONTEXT: COPY t_on_error_null3) typo becomen should be become:
null will becomen reserved to non-reservedfixed.
4) There is a whitespace error while applying patch
Applying: COPY (on_error set_to_null)
.git/rebase-apply/patch:39: trailing whitespace.
a <literal>NOTICE</literal> message indicating the number of rows
warning: 1 line adds whitespace errors.fixed.
I've reviewed the v15 patch and here are some comments:
How about renaming the new option value to 'set_null"? The 'to' in the
value name seems redundant to me.
---
+ COPY_ON_ERROR_NULL, /* set error field to null */
I think it's better to rename COPY_ON_ERROR_SET_TO_NULL (or
COPY_ON_ERROR_SET_NULL if we change the option value name) for
consistency with the value name.
---
+ else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values
in %" PRIu64 " row was replaced with null",
+
"invalid values in %" PRIu64 " rows were replaced with null",
+
cstate->num_errors,
+
cstate->num_errors));
How about adding "due to data type incompatibility" at the end of the message?
---
+ ereport(NOTICE,
+ errmsg("column
\"%s\" was set to null due to data type incompatibility at line %"
PRIu64 "",
+
cstate->cur_attname,
+
cstate->cur_lineno));
Similar to the IGNORE case, we can show the data in question in the message.
---
+ else
+ ereport(ERROR,
+
errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does
not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
If domain data type is the sole case where not to accept NULL, can we
check it beforehand to avoid calling the second
InputFunctionCallSafe() for non-domain data types? Also, if we want to
end up with an error when setting NULL to a domain type with NOT NULL,
I think we don't need to try to handle a soft error by passing
econtext to InputFunctionCallSafe().
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Sat, Apr 5, 2025 at 5:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Apr 4, 2025 at 4:55 AM jian he <jian.universality@gmail.com> wrote:
On Tue, Mar 25, 2025 at 2:31 PM vignesh C <vignesh21@gmail.com> wrote:
2) Here in error we say column c1 violates not-null constraint and in
the context we show column c2, should the context also display c2
column:
postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
CREATE TABLE
postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.a b
\.ERROR: null value in column "c1" of relation "t3" violates not-null constraint
DETAIL: Failing row contains (null, null).
CONTEXT: COPY t3, line 1, column c2: "b"It took me a while to figure out why.
with the attached, now the error message becomes:ERROR: null value in column "c1" of relation "t3" violates not-null constraint
DETAIL: Failing row contains (null, null).
CONTEXT: COPY t3, line 1: "a,b"while at it,
(on_error set_to_null, log_verbosity verbose)
error message CONTEXT will only emit out relation name,
this aligns with (on_error ignore, log_verbosity verbose).one of the message out example: +NOTICE: column "b" was set to null due to data type incompatibility at line 2 +CONTEXT: COPY t_on_error_null3) typo becomen should be become:
null will becomen reserved to non-reservedfixed.
4) There is a whitespace error while applying patch
Applying: COPY (on_error set_to_null)
.git/rebase-apply/patch:39: trailing whitespace.
a <literal>NOTICE</literal> message indicating the number of rows
warning: 1 line adds whitespace errors.fixed.
I've reviewed the v15 patch and here are some comments:
How about renaming the new option value to 'set_null"? The 'to' in the
value name seems redundant to me.--- + COPY_ON_ERROR_NULL, /* set error field to null */I think it's better to rename COPY_ON_ERROR_SET_TO_NULL (or
COPY_ON_ERROR_SET_NULL if we change the option value name) for
consistency with the value name.--- + else if (cstate->opts.on_error == COPY_ON_ERROR_NULL) + ereport(NOTICE, + errmsg_plural("invalid values in %" PRIu64 " row was replaced with null", + "invalid values in %" PRIu64 " rows were replaced with null", + cstate->num_errors, + cstate->num_errors));How about adding "due to data type incompatibility" at the end of the message?
--- + ereport(NOTICE, + errmsg("column \"%s\" was set to null due to data type incompatibility at line %" PRIu64 "", + cstate->cur_attname, + cstate->cur_lineno));Similar to the IGNORE case, we can show the data in question in the message.
--- + else + ereport(ERROR, + errcode(ERRCODE_NOT_NULL_VIOLATION), + errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), + errdatatype(typioparams[m]));If domain data type is the sole case where not to accept NULL, can we
check it beforehand to avoid calling the second
InputFunctionCallSafe() for non-domain data types? Also, if we want to
end up with an error when setting NULL to a domain type with NOT NULL,
I think we don't need to try to handle a soft error by passing
econtext to InputFunctionCallSafe().
please check attached, hope i have addressed all the points you've mentioned.
If domain data type is the sole case where not to accept NULL, can we
check it beforehand to avoid calling the second
InputFunctionCallSafe() for non-domain data types?
I doubt it.
we have
InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
Oid typioparam, int32 typmod,
fmNodePtr escontext,
Datum *result)
{
LOCAL_FCINFO(fcinfo, 3);
if (str == NULL && flinfo->fn_strict)
{
*result = (Datum) 0; /* just return null result */
return true;
}
}
Most of the non-domain type input functions are strict.
see query result:
select proname, pt.typname, proisstrict,pt.typtype
from pg_type pt
join pg_proc pp on pp.oid = pt.typinput
where pt.typtype <> 'd'
and pt.typtype <> 'p'
and proisstrict is false;
so the second InputFunctionCallSafe will be faster for non-domain types.
before CopyFromTextLikeOneRow we don't know if this type is
domain_with_constraint or not.
Beforehand, we can conditionally call DomainHasConstraints to find out.
but DomainHasConstraints is expensive, which may carry extra
performance issues for non-domain types.
but the second InputFunctionCallSafe call will not be a big issue for
domain_with_constraint,
because the first time domain_in call already cached related structs.
Attachments:
v16-0001-COPY-on_error-set_null.patchtext/x-patch; charset=US-ASCII; name=v16-0001-COPY-on_error-set_null.patchDownload
From f6cd33623f12d8f105af4e847726867e6ed53a6b Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 5 Apr 2025 16:30:10 +0800
Subject: [PATCH v16 1/1] COPY (on_error set_null)
Extent "on_error action", introduce new option: on_error set_null.
Current grammar makes us unable to use "on_error null". if we did it, then in
all the COPY command options's value, null will become reserved to non-reserved
words. so we choose "on_error set_null".
Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.
However, the not-null constraint will still be enforced.
If a column has a not-null constraint, successful (on_error set_null)
action will cause not-null constraint violation.
This also applies to column type is domain with not-null constraint.
A regression test for a domain with a not-null constraint has been added.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
Masahiko Sawada <sawada.mshk@gmail.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 36 ++++++++++-----
src/backend/commands/copy.c | 6 ++-
src/backend/commands/copyfrom.c | 29 ++++++++-----
src/backend/commands/copyfromparse.c | 65 +++++++++++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 60 +++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 46 ++++++++++++++++++++
8 files changed, 219 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index d6859276bed..db112867fa0 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one,
+ and <literal>set_null</literal> means replace columns containing invalid
+ input values with <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
+ <para>
+ For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+ containing the ignored row count is emitted at the end of the <command>COPY
+ FROM</command> if at least one row was discarded.
+ For <literal>set_null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of rows
+ where invalid input values were replaced with null is emitted
+ at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
- When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+ message containing the line of the input file and the column name where
+ value was replaced with <literal>NULL</literal> for each input conversion
+ failure.
+ When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..f963d0e51ff 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_null") == 0)
+ return COPY_ON_ERROR_SET_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..d4a91b68ac1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
- "%" PRIu64 " rows were skipped due to data type incompatibility",
- cstate->num_errors,
- cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+ "%" PRIu64 " rows were skipped due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+ "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..79e726701ad 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
}
/*
- * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+ * If ON_ERROR is specified with set_null, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1035,9 +1037,65 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ {
+ /*
+ * we use it to count number of rows (not fields!) that
+ * successfully applied on_error set_null.
+ */
+ if (!current_row_erroneous)
+ current_row_erroneous = true;
+
+ /*
+ * when column type is domain with not-null constraint, we need
+ * another InputFunctionCallSafe to error out domain constraint
+ * violation.
+ */
+ cstate->escontext->error_occurred = false;
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ NULL,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ {
+ char *attval;
+
+ /*
+ * Since we emit line number and column info in the below
+ * notice message, we suppress error context information other
+ * than the relation name.
+ */
+ Assert(!cstate->relname_only);
+ Assert(cstate->cur_attval);
+
+ cstate->relname_only = true;
+ attval = CopyLimitPrintoutLength(cstate->cur_attval);
+ ereport(NOTICE,
+ errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
+ pfree(attval);
+
+ /* reset relname_only */
+ cstate->relname_only = false;
+ }
+
+ cstate->cur_attname = NULL;
+
+ continue;
+ }
+ }
cstate->num_errors++;
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+ cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
{
/*
* Since we emit line number and column info in the below
@@ -1076,6 +1134,9 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
+
Assert(fieldno == attr_count);
return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index c916b9299a8..8e6f4930919 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "set_null");
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..935d21ee77a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_SET_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..91fa2087cef 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdout (on_error set_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+NOTICE: setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT: COPY t_on_error_null
+NOTICE: invalid values in 3 rows were replaced with null due to data type incompatibility
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..d27f3495cf7 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+a 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+10 x1 yx
+11 zx 12
+13 14 ea
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
On Sat, Apr 5, 2025 at 1:31 AM jian he <jian.universality@gmail.com> wrote:
On Sat, Apr 5, 2025 at 5:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Apr 4, 2025 at 4:55 AM jian he <jian.universality@gmail.com> wrote:
On Tue, Mar 25, 2025 at 2:31 PM vignesh C <vignesh21@gmail.com> wrote:
2) Here in error we say column c1 violates not-null constraint and in
the context we show column c2, should the context also display c2
column:
postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
CREATE TABLE
postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.a b
\.ERROR: null value in column "c1" of relation "t3" violates not-null constraint
DETAIL: Failing row contains (null, null).
CONTEXT: COPY t3, line 1, column c2: "b"It took me a while to figure out why.
with the attached, now the error message becomes:ERROR: null value in column "c1" of relation "t3" violates not-null constraint
DETAIL: Failing row contains (null, null).
CONTEXT: COPY t3, line 1: "a,b"while at it,
(on_error set_to_null, log_verbosity verbose)
error message CONTEXT will only emit out relation name,
this aligns with (on_error ignore, log_verbosity verbose).one of the message out example: +NOTICE: column "b" was set to null due to data type incompatibility at line 2 +CONTEXT: COPY t_on_error_null3) typo becomen should be become:
null will becomen reserved to non-reservedfixed.
4) There is a whitespace error while applying patch
Applying: COPY (on_error set_to_null)
.git/rebase-apply/patch:39: trailing whitespace.
a <literal>NOTICE</literal> message indicating the number of rows
warning: 1 line adds whitespace errors.fixed.
I've reviewed the v15 patch and here are some comments:
How about renaming the new option value to 'set_null"? The 'to' in the
value name seems redundant to me.--- + COPY_ON_ERROR_NULL, /* set error field to null */I think it's better to rename COPY_ON_ERROR_SET_TO_NULL (or
COPY_ON_ERROR_SET_NULL if we change the option value name) for
consistency with the value name.--- + else if (cstate->opts.on_error == COPY_ON_ERROR_NULL) + ereport(NOTICE, + errmsg_plural("invalid values in %" PRIu64 " row was replaced with null", + "invalid values in %" PRIu64 " rows were replaced with null", + cstate->num_errors, + cstate->num_errors));How about adding "due to data type incompatibility" at the end of the message?
--- + ereport(NOTICE, + errmsg("column \"%s\" was set to null due to data type incompatibility at line %" PRIu64 "", + cstate->cur_attname, + cstate->cur_lineno));Similar to the IGNORE case, we can show the data in question in the message.
--- + else + ereport(ERROR, + errcode(ERRCODE_NOT_NULL_VIOLATION), + errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), + errdatatype(typioparams[m]));If domain data type is the sole case where not to accept NULL, can we
check it beforehand to avoid calling the second
InputFunctionCallSafe() for non-domain data types? Also, if we want to
end up with an error when setting NULL to a domain type with NOT NULL,
I think we don't need to try to handle a soft error by passing
econtext to InputFunctionCallSafe().please check attached, hope i have addressed all the points you've mentioned.
If domain data type is the sole case where not to accept NULL, can we
check it beforehand to avoid calling the second
InputFunctionCallSafe() for non-domain data types?I doubt it.
we have
InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
Oid typioparam, int32 typmod,
fmNodePtr escontext,
Datum *result)
{
LOCAL_FCINFO(fcinfo, 3);
if (str == NULL && flinfo->fn_strict)
{
*result = (Datum) 0; /* just return null result */
return true;
}
}Most of the non-domain type input functions are strict.
see query result:select proname, pt.typname, proisstrict,pt.typtype
from pg_type pt
join pg_proc pp on pp.oid = pt.typinput
where pt.typtype <> 'd'
and pt.typtype <> 'p'
and proisstrict is false;so the second InputFunctionCallSafe will be faster for non-domain types.
Agreed.
BTW have you measured the overheads of calling InputFunctionCallSafe
twice? If it's significant, we might want to find other ways to
achieve it as it would not be good to incur overhead just for
relatively rare cases.
Here are some comments:
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ NULL,
+ &values[m]))
Given that we pass NULL to escontext, does this function return false
in an error case? Or can we use InputFunctionCall instead?
I think we should mention that SET_NULL still could fail if the data
type of the column doesn't accept NULL.
How about restructuring the codes around handling data incompatibility
errors like:
else if (!InputFunctionCallSafe(...))
{
if (cstate->opts.on_error == IGNORE)
{
cstate->num_errors++;
if (cstate->opts.log_verbosity == VERBOSE)
write a NOTICE message;
return true; // ignore whole row.
}
else if (cstate->opts.on_error == SET_NULL)
{
current_row_erroneous = true;
set NULL to the column;
if (cstate->opts.log_verbosity == VERBOSE)
write a NOTICE message;
continue; // go to the next column.
}
That way, we have similar structures for both on_error handling and
don't need to reset cstate->cur_attname at the end of SET_NULL
handling.
---
From the regression tests:
--fail, column a is domain with not-null constraint
COPY t_on_error_null FROM STDIN WITH (on_error set_null);
a 11 14
\.
ERROR: domain d_int_not_null does not allow null values
CONTEXT: COPY t_on_error_null, line 1, column a: "a"
I guess that the log messages could confuse users since while the
actual error was caused by setting NULL to the non-NULL domain type
column, the context message says the data 'a' was erroneous.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Tue, Apr 8, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
BTW have you measured the overheads of calling InputFunctionCallSafe
twice? If it's significant, we might want to find other ways to
achieve it as it would not be good to incur overhead just for
relatively rare cases.
Please check the attached two patches
v17-0001-COPY-on_error-set_null.original,
v17-0001-COPY-on_error-set_null.patch
for non-domain types, (on_error set_null), the performance of these
two are the same.
for domain type with or without constraint,
(on_error set_null): v17.original is slower than v17.patch.
test script:
create unlogged table t2(a text);
insert into t2 select 'a' from generate_Series(1, 10_000_000) g;
copy t2 to '/tmp/2.txt';
CREATE DOMAIN d1 AS INT ;
CREATE DOMAIN d2 AS INT check (value > 0);
create unlogged table t3(a int);
create unlogged table t4(a d1);
create unlogged table t5(a d2);
performance result:
v17-0001-COPY-on_error-set_null.patch
-- 764.903 ms
copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 779.253 ms
copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- Time: 750.390 ms
copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
v17-0001-COPY-on_error-set_null.original
-- 774.943 ms
copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 867.671 ms
copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 927.685 ms
copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
Here are some comments:
+ if (InputFunctionCallSafe(&in_functions[m], + NULL, + typioparams[m], + att->atttypmod, + NULL, + &values[m]))Given that we pass NULL to escontext, does this function return false
in an error case? Or can we use InputFunctionCall instead?I think we should mention that SET_NULL still could fail if the data
type of the column doesn't accept NULL.How about restructuring the codes around handling data incompatibility
errors like:else if (!InputFunctionCallSafe(...))
{
if (cstate->opts.on_error == IGNORE)
{
cstate->num_errors++;
if (cstate->opts.log_verbosity == VERBOSE)
write a NOTICE message;
return true; // ignore whole row.
}
else if (cstate->opts.on_error == SET_NULL)
{
current_row_erroneous = true;
set NULL to the column;
if (cstate->opts.log_verbosity == VERBOSE)
write a NOTICE message;
continue; // go to the next column.
}That way, we have similar structures for both on_error handling and
don't need to reset cstate->cur_attname at the end of SET_NULL
handling.
I think we still need to reset cstate->cur_attname.
the current code structure is
``
foreach(cur, cstate->attnumlist)
{
if (condition x)
continue;
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
``
In some cases (last column , condition x is satisfied), once we reach
the ``continue``, then we cannot reach.
``
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
``
---
From the regression tests:--fail, column a is domain with not-null constraint
COPY t_on_error_null FROM STDIN WITH (on_error set_null);
a 11 14
\.
ERROR: domain d_int_not_null does not allow null values
CONTEXT: COPY t_on_error_null, line 1, column a: "a"I guess that the log messages could confuse users since while the
actual error was caused by setting NULL to the non-NULL domain type
column, the context message says the data 'a' was erroneous.
if the second function is InputFunctionCall, then we cannot customize
the error message.
we can't have both.
I guess we need a second InputFunctionCallSafe with escontext NOT NULL.
now i change it to
if (!cstate->domain_with_constraint[m] ||
InputFunctionCallSafe(&in_functions[m],
NULL,
typioparams[m],
att->atttypmod,
(Node *) cstate->escontext,
&values[m]))
else if (string == NULL)
ereport(ERROR,
errcode(ERRCODE_NOT_NULL_VIOLATION),
errmsg("domain %s does not allow null
values", format_type_be(typioparams[m])),
errdatatype(typioparams[m]));
else
ereport(ERROR,
errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("invalid input value for domain %s: \"%s\"",
format_type_be(typioparams[m]), string));
do these ``ELSE IF``, ``ELSE`` error report messages make sense to you?
Attachments:
v17-0001-COPY-on_error-set_null.originalapplication/octet-stream; name=v17-0001-COPY-on_error-set_null.originalDownload
From 7ca7cba0a89a17444ed2640d577343e4d40f2e5d Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 8 Apr 2025 15:18:06 +0800
Subject: [PATCH v17 1/1] COPY (on_error set_null)
Extent "on_error action", introduce new option: on_error set_null.
Current grammar makes us unable to use "on_error null". if we did it, then in
all the COPY command options's value, null will become reserved to non-reserved
words. so we choose "on_error set_null".
Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.
However, the not-null constraint will still be enforced.
If a column has a not-null constraint, successful (on_error set_null)
action will cause not-null constraint violation.
This also applies to column type is domain with not-null constraint.
A regression test for a domain with a not-null constraint has been added.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
Masahiko Sawada <sawada.mshk@gmail.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 36 +++++---
src/backend/commands/copy.c | 6 +-
src/backend/commands/copyfrom.c | 29 +++---
src/backend/commands/copyfromparse.c | 128 +++++++++++++++++++++------
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 60 +++++++++++++
src/test/regress/sql/copy2.sql | 46 ++++++++++
8 files changed, 255 insertions(+), 53 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index d6859276bed..db112867fa0 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one,
+ and <literal>set_null</literal> means replace columns containing invalid
+ input values with <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
+ <para>
+ For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+ containing the ignored row count is emitted at the end of the <command>COPY
+ FROM</command> if at least one row was discarded.
+ For <literal>set_null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of rows
+ where invalid input values were replaced with null is emitted
+ at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
- When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+ message containing the line of the input file and the column name where
+ value was replaced with <literal>NULL</literal> for each input conversion
+ failure.
+ When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..f963d0e51ff 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_null") == 0)
+ return COPY_ON_ERROR_SET_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..d4a91b68ac1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
- "%" PRIu64 " rows were skipped due to data type incompatibility",
- cstate->num_errors,
- cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+ "%" PRIu64 " rows were skipped due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+ "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..b3f37c95aec 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
}
/*
- * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+ * If ON_ERROR is specified with set_null, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1035,47 +1037,115 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
- cstate->num_errors++;
-
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
{
/*
- * Since we emit line number and column info in the below
- * notice message, we suppress error context information other
- * than the relation name.
- */
- Assert(!cstate->relname_only);
- cstate->relname_only = true;
+ * we use it to count number of rows (not fields!) that
+ * successfully applied on_error set_null.
+ */
+ if (!current_row_erroneous)
+ current_row_erroneous = true;
- if (cstate->cur_attval)
+ /*
+ * when column type is domain with constraints, we may
+ * need another InputFunctionCallSafe to error out domain
+ * constraint violation.
+ */
+ cstate->escontext->error_occurred = false;
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
{
- char *attval;
-
- attval = CopyLimitPrintoutLength(cstate->cur_attval);
- ereport(NOTICE,
- errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
- cstate->cur_lineno,
- cstate->cur_attname,
- attval));
- pfree(attval);
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ {
+ char *attval;
+
+ /*
+ * Since we emit line number and column info in the below
+ * notice message, we suppress error context information other
+ * than the relation name.
+ */
+ Assert(!cstate->relname_only);
+ Assert(cstate->cur_attval);
+
+ cstate->relname_only = true;
+ attval = CopyLimitPrintoutLength(cstate->cur_attval);
+ ereport(NOTICE,
+ errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
+ pfree(attval);
+
+ /* reset relname_only */
+ cstate->relname_only = false;
+ }
+
+ cstate->cur_attname = NULL;
+ continue;
}
+ else if (string == NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
else
- ereport(NOTICE,
- errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
- cstate->cur_lineno,
- cstate->cur_attname));
-
- /* reset relname_only */
- cstate->relname_only = false;
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input value for domain %s: \"%s\"",
+ format_type_be(typioparams[m]), string));
}
+ else if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ {
+ cstate->num_errors++;
+
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ {
+ /*
+ * Since we emit line number and column info in the below
+ * notice message, we suppress error context information other
+ * than the relation name.
+ */
+ Assert(!cstate->relname_only);
+ cstate->relname_only = true;
+
+ if (cstate->cur_attval)
+ {
+ char *attval;
- return true;
+ attval = CopyLimitPrintoutLength(cstate->cur_attval);
+ ereport(NOTICE,
+ errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
+ pfree(attval);
+ }
+ else
+ ereport(NOTICE,
+ errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+ cstate->cur_lineno,
+ cstate->cur_attname));
+
+ /* reset relname_only */
+ cstate->relname_only = false;
+ }
+ return true;
+ }
}
-
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
+
Assert(fieldno == attr_count);
return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index c916b9299a8..8e6f4930919 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "set_null");
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..935d21ee77a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_SET_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..353b283ce9f 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdout (on_error set_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: invalid input value for domain d_int_not_null: "a"
+CONTEXT: COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: invalid input value for domain d_int_not_null: "-1"
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+NOTICE: setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT: COPY t_on_error_null
+NOTICE: invalid values in 3 rows were replaced with null due to data type incompatibility
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..d27f3495cf7 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+a 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+10 x1 yx
+11 zx 12
+13 14 ea
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
v17-0001-COPY-on_error-set_null.patchtext/x-patch; charset=US-ASCII; name=v17-0001-COPY-on_error-set_null.patchDownload
From 660389d38a84275a62e497b676c388c063374909 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 8 Apr 2025 15:07:55 +0800
Subject: [PATCH v17 1/1] COPY (on_error set_null)
Extent "on_error action", introduce new option: on_error set_null.
Current grammar makes us unable to use "on_error null". if we did it, then in
all the COPY command options's value, null will become reserved to non-reserved
words. so we choose "on_error set_null".
Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.
However, the not-null constraint will still be enforced.
If a column has a not-null constraint, successful (on_error set_null)
action will cause not-null constraint violation.
This also applies to column type is domain with not-null constraint.
A regression test for a domain with a not-null constraint has been added.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
Masahiko Sawada <sawada.mshk@gmail.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 36 +++++--
src/backend/commands/copy.c | 6 +-
src/backend/commands/copyfrom.c | 42 ++++++--
src/backend/commands/copyfromparse.c | 130 ++++++++++++++++++-----
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 6 ++
src/test/regress/expected/copy2.out | 60 +++++++++++
src/test/regress/sql/copy2.sql | 46 ++++++++
9 files changed, 277 insertions(+), 52 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index d6859276bed..db112867fa0 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one,
+ and <literal>set_null</literal> means replace columns containing invalid
+ input values with <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
+ <para>
+ For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+ containing the ignored row count is emitted at the end of the <command>COPY
+ FROM</command> if at least one row was discarded.
+ For <literal>set_null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of rows
+ where invalid input values were replaced with null is emitted
+ at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
- When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+ message containing the line of the input file and the column name where
+ value was replaced with <literal>NULL</literal> for each input conversion
+ failure.
+ When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..f963d0e51ff 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_null") == 0)
+ return COPY_ON_ERROR_SET_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..750d597d4d0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
- "%" PRIu64 " rows were skipped due to data type incompatibility",
- cstate->num_errors,
- cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+ "%" PRIu64 " rows were skipped due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+ "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ {
+ int attr_count = list_length(cstate->attnumlist);
+
+ cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool));
+ foreach_int(attno, cstate->attnumlist)
+ {
+ int i = foreach_current_index(attno);
+ Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1);
+ cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid);
+ }
+ }
+
/* Set up soft error handler for ON_ERROR */
if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
{
@@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..e638d32e8f5 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
}
/*
- * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+ * If ON_ERROR is specified with set_null, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1035,47 +1037,119 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
- cstate->num_errors++;
-
- if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
{
/*
- * Since we emit line number and column info in the below
- * notice message, we suppress error context information other
- * than the relation name.
- */
- Assert(!cstate->relname_only);
- cstate->relname_only = true;
+ * we use it to count number of rows (not fields!) that
+ * successfully applied on_error set_null.
+ */
+ if (!current_row_erroneous)
+ current_row_erroneous = true;
- if (cstate->cur_attval)
+ cstate->escontext->error_occurred = false;
+ Assert(cstate->domain_with_constraint != NULL);
+
+ /*
+ * when column type is domain with constraints, we may
+ * need another InputFunctionCallSafe to error out domain
+ * constraint violation.
+ */
+ if (!cstate->domain_with_constraint[m] ||
+ InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
{
- char *attval;
-
- attval = CopyLimitPrintoutLength(cstate->cur_attval);
- ereport(NOTICE,
- errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
- cstate->cur_lineno,
- cstate->cur_attname,
- attval));
- pfree(attval);
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ {
+ char *attval;
+
+ /*
+ * Since we emit line number and column info in the below
+ * notice message, we suppress error context information other
+ * than the relation name.
+ */
+ Assert(!cstate->relname_only);
+ Assert(cstate->cur_attval);
+
+ cstate->relname_only = true;
+ attval = CopyLimitPrintoutLength(cstate->cur_attval);
+ ereport(NOTICE,
+ errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
+ pfree(attval);
+
+ /* reset relname_only */
+ cstate->relname_only = false;
+ }
+
+ cstate->cur_attname = NULL;
+ continue;
}
+ else if (string == NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
else
- ereport(NOTICE,
- errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
- cstate->cur_lineno,
- cstate->cur_attname));
-
- /* reset relname_only */
- cstate->relname_only = false;
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input value for domain %s: \"%s\"",
+ format_type_be(typioparams[m]), string));
}
+ else if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ {
+ cstate->num_errors++;
+
+ if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+ {
+ /*
+ * Since we emit line number and column info in the below
+ * notice message, we suppress error context information other
+ * than the relation name.
+ */
+ Assert(!cstate->relname_only);
+ cstate->relname_only = true;
+
+ if (cstate->cur_attval)
+ {
+ char *attval;
- return true;
+ attval = CopyLimitPrintoutLength(cstate->cur_attval);
+ ereport(NOTICE,
+ errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
+ pfree(attval);
+ }
+ else
+ ereport(NOTICE,
+ errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+ cstate->cur_lineno,
+ cstate->cur_attname));
+
+ /* reset relname_only */
+ cstate->relname_only = false;
+ }
+ return true;
+ }
}
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ if (current_row_erroneous)
+ cstate->num_errors++;
+
Assert(fieldno == attr_count);
return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index c916b9299a8..8e6f4930919 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "set_null");
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..935d21ee77a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_SET_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..b427e71b9b3 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,12 @@ typedef struct CopyFromStateData
* att */
bool *defaults; /* if DEFAULT marker was found for
* corresponding att */
+ /*
+ * Set to true if the corresponding att data type is domain with constraint.
+ * normally this field is NULL, except when on_error is specified as SET_NULL.
+ */
+ bool *domain_with_constraint;
+
bool volatile_defexprs; /* is any of defexprs volatile? */
List *range_table; /* single element list of RangeTblEntry */
List *rteperminfos; /* single element list of RTEPermissionInfo */
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..3f843d1cd5c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdout (on_error set_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: invalid input value for domain d_int_not_null: "ss"
+CONTEXT: COPY t_on_error_null, line 1, column a: "ss"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: invalid input value for domain d_int_not_null: "-1"
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+NOTICE: setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT: COPY t_on_error_null
+NOTICE: invalid values in 3 rows were replaced with null due to data type incompatibility
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..d77a06668e8 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ss 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+10 x1 yx
+11 zx 12
+13 14 ea
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
Hi,
Thanks for updating the patch and I've read
v17-0001-COPY-on_error-set_null.patch and here are some comments.
+COPY x from stdin (on_error set_null, reject_limit 2); +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
I understand that REJECT_LIMIT is out of scope for this patch, but
personally, I feel that supporting REJECT_LIMIT with ON_ERROR SET_NULL
would be a natural extension.
- Both IGNORE and SET_NULL share the common behavior of allowing COPY to
continue despite soft errors.
- Since REJECT_LIMIT defines the threshold for how many soft errors can
be tolerated before COPY fails, it seems consistent to allow it with
SET_NULL as well.
+ if (current_row_erroneous)
+ cstate->num_errors++;
Is there any reason this error counting isn't placed inside the "if
(cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)" block?
As far as I can tell, current_row_erroneous is only modified within that
block, so it might make sense to keep this logic together for clarity.
These may be very minor, but I noticed a few inconsistencies in casing
and wording:
+ * If ON_ERROR is specified with IGNORE, skip rows with
soft errors.
+ * If ON_ERROR is specified with set_null, try to
replace with null.
IGNORE is in uppercase, but set_null is lowercase.
+ * we use it to count number of rows
(not fields!) that
+ * successfully applied on_error
set_null.
The sentence should begin with a capital: "We use it..."
Also, I felt it's unclear what "we use it" means. Does it necessary?
+COPY x to stdout (on_error set_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
COPY is uppercase, but to is lowercase.
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
...
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
It might be better to consider standardizing casing across all COPY
statements (e.g., COPY ... TO, COPY ... FROM STDIN) for consistency.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On Tue, Jul 1, 2025 at 10:54 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
Hi,
Thanks for updating the patch and I've read
v17-0001-COPY-on_error-set_null.patch and here are some comments.+ if (current_row_erroneous)
+ cstate->num_errors++;Is there any reason this error counting isn't placed inside the "if
(cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)" block?
As far as I can tell, current_row_erroneous is only modified within that
block, so it might make sense to keep this logic together for clarity.These may be very minor, but I noticed a few inconsistencies in casing
and wording:+ * If ON_ERROR is specified with IGNORE, skip rows with soft errors. + * If ON_ERROR is specified with set_null, try to replace with null.IGNORE is in uppercase, but set_null is lowercase.
+ * we use it to count number of rows (not fields!) that + * successfully applied on_error set_null.The sentence should begin with a capital: "We use it..."
Also, I felt it's unclear what "we use it" means. Does it necessary?
hi.
I changed this comment, also heavily refactored CopyFromTextLikeOneRow based on
v17-0001-COPY-on_error-set_null.patch.
Now it looks way more intuitive, IMHO.
CopyFromTextLikeOneRow
else if (!InputFunctionCallSafe(&in_functions[m],
string,
typioparams[m],
att->atttypmod,
(Node *) cstate->escontext,
&values[m]))
{
if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
////code for on_errr ignore
else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
////code for on_errr set_null
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
//code for verbose message for on_error ignore or on_error set_null
if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
////code for on_errr ignore loop control
else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
////code for on_errr set_null loop control
}
+COPY x to stdout (on_error set_null); +ERROR: COPY ON_ERROR cannot be used with COPY TO +LINE 1: COPY x to stdout (on_error set_null);COPY is uppercase, but to is lowercase.
+COPY x from stdin (format BINARY, on_error set_null); +ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (on_error set_null, reject_limit 2); +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE ... +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: null inputIt might be better to consider standardizing casing across all COPY
statements (e.g., COPY ... TO, COPY ... FROM STDIN) for consistency.
I followed near code conventions, changing the casing here seems not necessary.
Attachments:
v18-0001-COPY-on_error-set_null.patchtext/x-patch; charset=US-ASCII; name=v18-0001-COPY-on_error-set_null.patchDownload
From feded9f7562f608ec97cbb08399661c6494df021 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 2 Jul 2025 16:32:57 +0800
Subject: [PATCH v18 1/1] COPY (on_error set_null)
Current grammar makes us unable to use "on_error null". if we did it, then in
all the COPY command options's value, null will become reserved to non-reserved
words. so we choose "on_error set_null".
When COPY FROM, if ON_ERROR SET_NULL is specified, any data type conversion
errors will result in the affected column being set to NULL. However, column's
not-null constraints are still enforced, attempting to set a NULL value in such
columns will raise a constraint violation error. This applies to column data
type is a domain with a NOT NULL constraint.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
Masahiko Sawada <sawada.mshk@gmail.com>
Atsushi Torikoshi <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/4810
---
doc/src/sgml/ref/copy.sgml | 35 +++++++---
src/backend/commands/copy.c | 6 +-
src/backend/commands/copyfrom.c | 42 +++++++++---
src/backend/commands/copyfromparse.c | 84 ++++++++++++++++++++----
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 7 ++
src/test/regress/expected/copy2.out | 60 +++++++++++++++++
src/test/regress/sql/copy2.sql | 46 +++++++++++++
9 files changed, 247 insertions(+), 36 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..26fb4be1709 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,37 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one,
+ and <literal>set_null</literal> means replace column containing invalid
+ input value with <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
+ <para>
+ For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+ containing the ignored row count is emitted at the end of the <command>COPY
+ FROM</command> if at least one row was discarded.
+ For <literal>set_null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of rows
+ where invalid input values were replaced with null is emitted
+ at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+ message containing the line of the input file and the column name where
+ value was replaced with <literal>NULL</literal> for each input conversion
+ failure.
When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -458,7 +472,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..f963d0e51ff 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_null") == 0)
+ return COPY_ON_ERROR_SET_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..750d597d4d0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
- "%" PRIu64 " rows were skipped due to data type incompatibility",
- cstate->num_errors,
- cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+ "%" PRIu64 " rows were skipped due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+ "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ {
+ int attr_count = list_length(cstate->attnumlist);
+
+ cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool));
+ foreach_int(attno, cstate->attnumlist)
+ {
+ int i = foreach_current_index(attno);
+ Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1);
+ cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid);
+ }
+ }
+
/* Set up soft error handler for ON_ERROR */
if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
{
@@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f52f2477df1..2147d8423fd 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
}
/*
- * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+ * If ON_ERROR is specified with SET_NULL, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1035,7 +1037,50 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
- cstate->num_errors++;
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ cstate->num_errors++;
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ {
+ cstate->escontext->error_occurred = false;
+ Assert(cstate->domain_with_constraint != NULL);
+
+ /*
+ * When the column's type is a domain with constraints, an
+ * additional InputFunctionCallSafe may be needed to raise
+ * errors for domain constraint violations.
+ */
+ if (!cstate->domain_with_constraint[m] ||
+ InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ }
+ else if (string == NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input value for domain %s: \"%s\"",
+ format_type_be(typioparams[m]), string));
+
+ /*
+ * We count only the number of rows (not individual fields)
+ * where ON_ERROR SET_NULL was successfully applied.
+ */
+ if (!current_row_erroneous)
+ {
+ current_row_erroneous = true;
+ cstate->num_errors++;
+ }
+ }
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
{
@@ -1052,24 +1097,37 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
char *attval;
attval = CopyLimitPrintoutLength(cstate->cur_attval);
- ereport(NOTICE,
- errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
- cstate->cur_lineno,
- cstate->cur_attname,
- attval));
+
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ ereport(NOTICE,
+ errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
pfree(attval);
}
else
- ereport(NOTICE,
- errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
- cstate->cur_lineno,
- cstate->cur_attname));
-
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+ cstate->cur_lineno,
+ cstate->cur_attname));
+ }
/* reset relname_only */
cstate->relname_only = false;
}
- return true;
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ return true;
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ continue;
}
cstate->cur_attname = NULL;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8c2ea0b9587..a587f4162ee 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3305,7 +3305,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "set_null");
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..935d21ee77a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_SET_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..c82bfab4636 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,13 @@ typedef struct CopyFromStateData
* att */
bool *defaults; /* if DEFAULT marker was found for
* corresponding att */
+ /*
+ * Set to true if the corresponding attribute's data type is a domain with
+ * constraints. This field is usually NULL, except when ON_ERROR is set to
+ * SET_NULL.
+ */
+ bool *domain_with_constraint;
+
bool volatile_defexprs; /* is any of defexprs volatile? */
List *range_table; /* single element list of RangeTblEntry */
List *rteperminfos; /* single element list of RTEPermissionInfo */
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..3f843d1cd5c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdout (on_error set_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: invalid input value for domain d_int_not_null: "ss"
+CONTEXT: COPY t_on_error_null, line 1, column a: "ss"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: invalid input value for domain d_int_not_null: "-1"
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+NOTICE: setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT: COPY t_on_error_null
+NOTICE: invalid values in 3 rows were replaced with null due to data type incompatibility
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..d77a06668e8 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ss 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+10 x1 yx
+11 zx 12
+13 14 ea
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
hi.
rebase.
Attachments:
v19-0001-COPY-on_error-set_null.patchtext/x-patch; charset=US-ASCII; name=v19-0001-COPY-on_error-set_null.patchDownload
From b3b2d794c83b36cf129d917d527ebf2cac46ca3b Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 30 Jul 2025 11:06:17 +0800
Subject: [PATCH v19 1/1] COPY (on_error set_null)
Current grammar makes us unable to use "on_error null". if we did it, then in
all the COPY command options's value, null will become reserved to non-reserved
words. so we choose "on_error set_null".
When COPY FROM, if ON_ERROR SET_NULL is specified, any data type conversion
errors will result in the affected column being set to NULL. However, column's
not-null constraints are still enforced, attempting to set a NULL value in such
columns will raise a constraint violation error. This applies to column data
type is a domain with a NOT NULL constraint.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
Masahiko Sawada <sawada.mshk@gmail.com>
Atsushi Torikoshi <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/4810
---
doc/src/sgml/ref/copy.sgml | 35 +++++++---
src/backend/commands/copy.c | 6 +-
src/backend/commands/copyfrom.c | 42 +++++++++---
src/backend/commands/copyfromparse.c | 84 ++++++++++++++++++++----
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 7 ++
src/test/regress/expected/copy2.out | 60 +++++++++++++++++
src/test/regress/sql/copy2.sql | 46 +++++++++++++
9 files changed, 247 insertions(+), 36 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..a36e33f320f 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -412,23 +412,37 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one,
+ and <literal>set_null</literal> means replace column containing invalid
+ input value with <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
+ <para>
+ For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+ containing the ignored row count is emitted at the end of the <command>COPY
+ FROM</command> if at least one row was discarded.
+ For <literal>set_null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of rows
+ where invalid input values were replaced with null is emitted
+ at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+ message containing the line of the input file and the column name where
+ value was replaced with <literal>NULL</literal> for each input conversion
+ failure.
When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -476,7 +490,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..9213bfb167f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -413,12 +413,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_null") == 0)
+ return COPY_ON_ERROR_SET_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -928,7 +930,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..750d597d4d0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
- "%" PRIu64 " rows were skipped due to data type incompatibility",
- cstate->num_errors,
- cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+ "%" PRIu64 " rows were skipped due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+ "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ {
+ int attr_count = list_length(cstate->attnumlist);
+
+ cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool));
+ foreach_int(attno, cstate->attnumlist)
+ {
+ int i = foreach_current_index(attno);
+ Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1);
+ cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid);
+ }
+ }
+
/* Set up soft error handler for ON_ERROR */
if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
{
@@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..ee887a37afd 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -956,6 +956,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1033,7 +1034,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
}
/*
- * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+ * If ON_ERROR is specified with SET_NULL, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1044,7 +1046,50 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
- cstate->num_errors++;
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ cstate->num_errors++;
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ {
+ cstate->escontext->error_occurred = false;
+ Assert(cstate->domain_with_constraint != NULL);
+
+ /*
+ * If the column type is a domain with constraints, an
+ * additional InputFunctionCallSafe may be needed to raise
+ * errors for domain constraint violations.
+ */
+ if (!cstate->domain_with_constraint[m] ||
+ InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ }
+ else if (string == NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input value for domain %s: \"%s\"",
+ format_type_be(typioparams[m]), string));
+
+ /*
+ * We count only the number of rows (not individual fields)
+ * where ON_ERROR SET_NULL was successfully applied.
+ */
+ if (!current_row_erroneous)
+ {
+ current_row_erroneous = true;
+ cstate->num_errors++;
+ }
+ }
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
{
@@ -1061,24 +1106,37 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
char *attval;
attval = CopyLimitPrintoutLength(cstate->cur_attval);
- ereport(NOTICE,
- errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
- cstate->cur_lineno,
- cstate->cur_attname,
- attval));
+
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ ereport(NOTICE,
+ errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
pfree(attval);
}
else
- ereport(NOTICE,
- errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
- cstate->cur_lineno,
- cstate->cur_attname));
-
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+ cstate->cur_lineno,
+ cstate->cur_attname));
+ }
/* reset relname_only */
cstate->relname_only = false;
}
- return true;
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ return true;
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ continue;
}
cstate->cur_attname = NULL;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index dbc586c5bc3..c88593e4158 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3343,7 +3343,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "ON_ERROR"))
- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "set_null");
/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..da3622028e7 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -35,6 +35,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_SET_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..c82bfab4636 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,13 @@ typedef struct CopyFromStateData
* att */
bool *defaults; /* if DEFAULT marker was found for
* corresponding att */
+ /*
+ * Set to true if the corresponding attribute's data type is a domain with
+ * constraints. This field is usually NULL, except when ON_ERROR is set to
+ * SET_NULL.
+ */
+ bool *domain_with_constraint;
+
bool volatile_defexprs; /* is any of defexprs volatile? */
List *range_table; /* single element list of RangeTblEntry */
List *rteperminfos; /* single element list of RTEPermissionInfo */
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index caa3c44f0d0..919a2296c27 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_null, on_error ignore);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error ignore);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdout (on_error set_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -775,6 +787,51 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: invalid input value for domain d_int_not_null: "ss"
+CONTEXT: COPY t_on_error_null, line 1, column a: "ss"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR: invalid input value for domain d_int_not_null: "-1"
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+NOTICE: setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT: COPY t_on_error_null
+NOTICE: invalid values in 3 rows were replaced with null due to data type incompatibility
+-- check inserted content
+select * from t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -834,6 +891,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index cef45868db5..be05ed52def 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -537,6 +541,45 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+\N 11 13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ss 11 14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+-1 11 13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+10 x1 yx
+11 zx 12
+13 14 ea
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -606,6 +649,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1
Attachments:
v20-0001-COPY-on_error-set_null.patchtext/x-patch; charset=US-ASCII; name=v20-0001-COPY-on_error-set_null.patchDownload
From 05574920774c9bb2f93f9cb6ecea136aa673cba7 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 10 Nov 2025 18:17:09 +0800
Subject: [PATCH v20 1/1] COPY (on_error set_null)
When COPY FROM, if ON_ERROR SET_NULL is specified, any data type conversion
errors will result in the affected column being set to NULL. However, column's
not-null constraints are still enforced, attempting to set a NULL value in such
columns will raise a constraint violation error. This applies to column data
type is a domain with a NOT NULL constraint.
Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by:
Fujii Masao <masao.fujii@oss.nttdata.com>
Jim Jones <jim.jones@uni-muenster.de>
"David G. Johnston" <david.g.johnston@gmail.com>
Yugo NAGATA <nagata@sraoss.co.jp>
torikoshia <torikoshia@oss.nttdata.com>
Masahiko Sawada <sawada.mshk@gmail.com>
Atsushi Torikoshi <torikoshia@oss.nttdata.com>
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/4810
---
doc/src/sgml/ref/copy.sgml | 35 +++++++---
src/backend/commands/copy.c | 6 +-
src/backend/commands/copyfrom.c | 42 +++++++++---
src/backend/commands/copyfromparse.c | 84 ++++++++++++++++++++----
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 7 ++
src/test/regress/expected/copy2.out | 55 ++++++++++++++++
src/test/regress/sql/copy2.sql | 43 ++++++++++++
9 files changed, 239 insertions(+), 36 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index fdc24b36bb8..1f42fd0972d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -412,23 +412,37 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one,
+ and <literal>set_null</literal> means replace column containing invalid
+ input value with <literal>NULL</literal> and move to the next field.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_null</literal>
+ options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
+ <para>
+ For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+ containing the ignored row count is emitted at the end of the <command>COPY
+ FROM</command> if at least one row was discarded.
+ For <literal>set_null</literal> option,
+ a <literal>NOTICE</literal> message indicating the number of rows
+ where invalid input values were replaced with null is emitted
+ at the end of the <command>COPY FROM</command> if at least one row was replaced.
+ </para>
<para>
- A <literal>NOTICE</literal> message containing the ignored row count is
- emitted at the end of the <command>COPY FROM</command> if at least one
- row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
- <literal>verbose</literal>, a <literal>NOTICE</literal> message
+ When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+ for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
containing the line of the input file and the column name whose input
- conversion has failed is emitted for each discarded row.
+ conversion has failed is emitted for each discarded row;
+ for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+ message containing the line of the input file and the column name where
+ value was replaced with <literal>NULL</literal> for each input conversion
+ failure.
When it is set to <literal>silent</literal>, no message is emitted
- regarding ignored rows.
+ regarding input conversion failed rows.
</para>
</listitem>
</varlistentry>
@@ -476,7 +490,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</para>
<para>
This is currently used in <command>COPY FROM</command> command when
- <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+ <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+ or <literal>set_null</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 28e878c3688..9d0413f4cfa 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -456,12 +456,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_null") == 0)
+ return COPY_ON_ERROR_SET_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -971,7 +973,7 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..87588b688f2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
- cstate->num_errors > 0 &&
+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
- ereport(NOTICE,
- errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
- "%" PRIu64 " rows were skipped due to data type incompatibility",
- cstate->num_errors,
- cstate->num_errors));
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+ "%" PRIu64 " rows were skipped due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ ereport(NOTICE,
+ errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+ "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+ }
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ {
+ int attr_count = list_length(cstate->attnumlist);
+
+ cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool));
+ foreach_int(attno, cstate->attnumlist)
+ {
+ int i = foreach_current_index(attno);
+ Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1);
+ cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid);
+ }
+ }
+
/* Set up soft error handler for ON_ERROR */
if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
{
@@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..7c0d13ab38b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -956,6 +956,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ bool current_row_erroneous = false;
tupDesc = RelationGetDescr(cstate->rel);
attr_count = list_length(cstate->attnumlist);
@@ -1033,7 +1034,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
}
/*
- * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+ * If ON_ERROR is specified with SET_NULL, try to replace with null.
*/
else if (!InputFunctionCallSafe(&in_functions[m],
string,
@@ -1044,7 +1046,50 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
- cstate->num_errors++;
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ cstate->num_errors++;
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ {
+ cstate->escontext->error_occurred = false;
+ Assert(cstate->domain_with_constraint != NULL);
+
+ /*
+ * If the column type is a domain with constraints, an
+ * additional InputFunctionCallSafe may be needed to raise
+ * errors for domain constraint violations.
+ */
+ if (!cstate->domain_with_constraint[m] ||
+ InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ nulls[m] = true;
+ values[m] = (Datum) 0;
+ }
+ else if (string == NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+ errdatatype(typioparams[m]));
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input value for domain %s: \"%s\"",
+ format_type_be(typioparams[m]), string));
+
+ /*
+ * We count only the number of rows (not individual fields)
+ * where ON_ERROR SET_NULL was successfully applied.
+ */
+ if (!current_row_erroneous)
+ {
+ current_row_erroneous = true;
+ cstate->num_errors++;
+ }
+ }
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
{
@@ -1061,24 +1106,37 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
char *attval;
attval = CopyLimitPrintoutLength(cstate->cur_attval);
- ereport(NOTICE,
- errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
- cstate->cur_lineno,
- cstate->cur_attname,
- attval));
+
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ ereport(NOTICE,
+ errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+ cstate->cur_lineno,
+ cstate->cur_attname,
+ attval));
pfree(attval);
}
else
- ereport(NOTICE,
- errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
- cstate->cur_lineno,
- cstate->cur_attname));
-
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ ereport(NOTICE,
+ errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+ cstate->cur_lineno,
+ cstate->cur_attname));
+ }
/* reset relname_only */
cstate->relname_only = false;
}
- return true;
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ return true;
+ else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+ continue;
}
cstate->cur_attname = NULL;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 316a2dafbf1..7124f468ba4 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3381,7 +3381,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM [PROGRAM] filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAnyExcept("PROGRAM"), "WITH", "(", "ON_ERROR") ||
Matches("COPY|\\copy", MatchAny, "FROM", "PROGRAM", MatchAny, "WITH", "(", "ON_ERROR"))
- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "set_null");
/* Complete COPY <sth> FROM [PROGRAM] filename WITH (LOG_VERBOSITY */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAnyExcept("PROGRAM"), "WITH", "(", "LOG_VERBOSITY") ||
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..da3622028e7 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -35,6 +35,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_SET_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..a606a38cc23 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,13 @@ typedef struct CopyFromStateData
* att */
bool *defaults; /* if DEFAULT marker was found for
* corresponding att */
+ /*
+ * Set to true if the corresponding attribute's data type is a domain with
+ * constraints. This field is usually NULL, except when ON_ERROR is set to
+ * SET_NULL.
+ */
+ bool *domain_with_constraint;
+
bool volatile_defexprs; /* is any of defexprs volatile? */
List *range_table; /* single element list of RangeTblEntry */
List *rteperminfos; /* single element list of RTEPermissionInfo */
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index f3fdce23459..16e70ccb813 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_null, on_error set_null);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error set_null);
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdout (on_error set_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -776,6 +788,46 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+ERROR: domain d_int_not_null does not allow null values
+CONTEXT: COPY t_on_error_null, line 1, column a: null input
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+ERROR: invalid input value for domain d_int_not_null: "ss"
+CONTEXT: COPY t_on_error_null, line 1, column a: "ss"
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+ERROR: invalid input value for domain d_int_not_null: "-1"
+CONTEXT: COPY t_on_error_null, line 1, column a: "-1"
+--fail, less data.
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: missing data for column "c"
+CONTEXT: COPY t_on_error_null, line 1: "1,1"
+--fail, extra data.
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4"
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose); --ok
+NOTICE: setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT: COPY t_on_error_null
+NOTICE: setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT: COPY t_on_error_null
+NOTICE: invalid values in 3 rows were replaced with null due to data type incompatibility
+SELECT * FROM t_on_error_null;
+ a | b | c
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL | 12
+ 13 | 14 | NULL
+(3 rows)
+
+\pset null ''
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -835,6 +887,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index cef45868db5..49200cf064d 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error set_null);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -537,6 +541,42 @@ a {2} 2
8 {8} 8
\.
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+\N 11 13
+\.
+
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+ss 11 14
+\.
+
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+-1 11 13
+\.
+
+--fail, less data.
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail, extra data.
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose); --ok
+10 x1 yx
+11 zx 12
+13 14 ea
+\.
+
+SELECT * FROM t_on_error_null;
+
+\pset null ''
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -606,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
DROP TABLE check_ign_err2;
DROP DOMAIN dcheck_ign_err2;
DROP TABLE hard_err;
--
2.34.1