Conflict handling for COPY FROM

surafel3000@gmail.com

over 7 years ago

In reply to: Karen Huddleston (#2)

Re: Conflict handling for COPY FROM

Thanks for looking at it.
1. It sounded like you added the copy_max_error_limit GUC as part of this patch to allow users to specify how many errors they want to swallow with this new functionality. The GUC didn't seem to be defined and we saw no mention of it in the code. My guess is this might be good to address one of the concerns mentioned in the initial email thread of this generating too many transaction IDs so it would probably be good to have it.
I mistakenly wrote copy_max_error_limit here; in the patch it is copy_maximum_error_limit, with a default value of 100. Sorry for the mismatch.
2. I was curious why you only have support for skipping errors on UNIQUE and EXCLUSION constraints and not other kinds of constraints? I'm not sure how difficult it would be to add support for those, but it seems they could also be useful.
I see now that most formatting errors can be handled safely. I will attach a patch for that this week.
3. We think the wording "ON CONFLICT IGNORE" may not be the clearest description of what this is doing since it is writing the failed rows to a file for a user to process later, but they are not being ignored. We considered things like STASH or LOG as alternatives to IGNORE. Andrew may have some other suggestions for wording.
I agree. I will change it to ON CONFLICT LOG if we can't find a better name.
4. We also noticed this has no tests and thought it would be good to add some to ensure this functionality works how you intend it and continues to work. We started running some SQL to validate this, but haven't gotten the chance to put it into a clean test yet. We can send you what we have so far, or we are also willing to put a little time in to turn it into tests ourselves that we could contribute to this patch.
okay

Robert Haas

robertmhaas@gmail.com

over 7 years ago

In reply to: Surafel Temesgen (#1)

Re: Conflict handling for COPY FROM

On Sat, Aug 4, 2018 at 9:10 AM Surafel Temesgen <surafel3000@gmail.com> wrote:

To guard against extreme cases, the patch also adds a new GUC
variable, copy_max_error_limit, that controls how many errors are
swallowed before COPY starts to error out, and a new failed-record file
option for COPY, where failed records are written so the user can examine them.

Why should this be a GUC rather than a COPY option?

In fact, try doing all of this by adding more options to COPY rather than
new syntax.

COPY ... WITH (ignore_conflicts 1000, ignore_logfile '...')

It kind of stinks to use a log file written by the server as the
dup-reporting channel though. That would have to be superuser-only.
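
If the feature is expressed as two separate COPY options, the options have to be checked against each other: a limit without a file, or a file without a limit, makes no sense. The following is a minimal standalone sketch of such checks, not backend code. The struct `CopySkipOptions`, the function `validate_skip_options`, and the error strings are hypothetical names chosen for illustration, and the absolute-path test assumes POSIX-style paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the two options discussed above. */
typedef struct
{
    int         ignore_conflicts;   /* max rows to skip; 0 = option not given */
    const char *ignore_logfile;     /* NULL = option not given */
} CopySkipOptions;

/*
 * Return NULL if the option combination is consistent, otherwise a
 * static message describing the first problem found.
 */
static const char *
validate_skip_options(const CopySkipOptions *opts)
{
    assert(opts != NULL);

    if (opts->ignore_conflicts < 0)
        return "ignore_conflicts must not be negative";
    if (opts->ignore_conflicts == 0 && opts->ignore_logfile != NULL)
        return "ignore_logfile requires ignore_conflicts";
    if (opts->ignore_conflicts > 0 && opts->ignore_logfile == NULL)
        return "ignore_conflicts requires ignore_logfile";
    /* Assumption: POSIX-style paths, so "absolute" means a leading '/'. */
    if (opts->ignore_logfile != NULL && opts->ignore_logfile[0] != '/')
        return "ignore_logfile must be an absolute path";
    return NULL;
}
```

In the backend these checks would be reported with ereport(ERROR, ...) rather than returned as strings; the return value here only keeps the sketch self-contained.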

...Robert
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

surafel3000@gmail.com

over 7 years ago

In reply to: Robert Haas (#4)

1 attachment(s)

Re: Conflict handling for COPY FROM

Hello,

The attached patch adds error handling for extra data, missing data,
invalid OID, null OID, and row count mismatch.

Records that fail in the above cases are written to the file with the
error message appended. For a unique violation or exclusion constraint
violation, the failed record is written as-is, because the cause of the
error cannot be identified specifically.
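
The logging behavior described above can be illustrated outside the backend. This is a minimal standalone sketch, assuming tab-delimited text rows; `SkipState`, `count_fields`, and `check_row` are hypothetical names and are not the patch's CopyStateData or LogCopyError. It only covers the field-count cases, not constraint violations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state for this sketch; not the patch's CopyStateData. */
typedef struct
{
    FILE *failed_rec_file;  /* where rejected rows are written */
    int   error_limit;      /* rejected rows still tolerated */
} SkipState;

/* Count delimiter-separated fields in one input line. */
static int
count_fields(const char *line, char delim)
{
    int n = 1;

    for (const char *p = strchr(line, delim); p != NULL;
         p = strchr(p + 1, delim))
        n++;
    return n;
}

/*
 * Return 1 if the row has the expected number of fields, 0 if it was
 * written to the failed-record file and skipped, and -1 if it is
 * malformed but the error budget is already used up.
 */
static int
check_row(SkipState *st, const char *line, int expected_fields)
{
    int         nfields;
    const char *reason;

    assert(st != NULL && st->failed_rec_file != NULL);

    nfields = count_fields(line, '\t');
    if (nfields == expected_fields)
        return 1;

    if (st->error_limit <= 0)
        return -1;          /* caller should raise the usual error */

    reason = nfields > expected_fields
        ? "extra data after last expected column"
        : "missing data";
    /* The record is written first, then the error message is appended. */
    fprintf(st->failed_rec_file, "%s %s\n", line, reason);
    st->error_limit--;
    return 0;
}
```

A caller would loop over input lines, insert rows for which check_row returns 1, continue on 0, and abort the load on -1.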

The new syntax is:

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

Regards

Surafel

Attachments:

conflict-handling-onCopy-from-v2.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68d95..bf21abd8e0 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -364,6 +364,17 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>IGNORE_CONFLICTS</literal></term>
+    <listitem>
+     <para>
+      Specifies that errors are ignored up to the specified number.
+      Each failed record is written to the failed record file and
+      processing proceeds to the next record.
+     </para>
+    </listitem>
+   </varlistentry>
+
   </variablelist>
  </refsect1>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9bc67ce60f..ffa6aecbd5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -118,6 +119,7 @@ typedef struct CopyStateData
 	int			file_encoding;	/* file or remote side's character encoding */
 	bool		need_transcoding;	/* file encoding diff from server? */
 	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+	FILE	   *failed_rec_file;
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
@@ -147,6 +149,9 @@ typedef struct CopyStateData
 	bool		convert_selectively;	/* do selective binary conversion? */
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
+	char	   *failed_rec_filename;
+	bool	   ignore_conflict;
+	int	   error_limit;	/* total # of error to log */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -766,6 +771,21 @@ CopyLoadRawBuf(CopyState cstate)
 	return (inbytes > 0);
 }
 
+/*
+ * LogCopyError log error in to error log file
+ */
+static void
+LogCopyError(CopyState cstate, const char *str)
+{
+	appendBinaryStringInfo(&cstate->line_buf, str, strlen(str));
+#ifndef WIN32
+	appendStringInfoCharMacro(&cstate->line_buf, '\n');
+#else
+	appendBinaryStringInfo(&cstate->line_buf, "\r\n", strlen("\r\n"));
+#endif
+	fwrite(cstate->line_buf.data, 1, cstate->line_buf.len, cstate->failed_rec_file);
+	cstate->error_limit--;
+}
 
 /*
  *	 DoCopy executes the SQL COPY statement
@@ -1226,6 +1246,19 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "ignore_conflicts") == 0)
+		{
+			List       *conflictOption;
+			if (cstate->ignore_conflict)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->ignore_conflict = true;
+			conflictOption = (List *) defel->arg;
+			cstate->error_limit = intVal(list_nth(conflictOption, 0));
+			cstate->failed_rec_filename = strVal(list_nth(conflictOption, 1));
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1749,6 +1782,11 @@ EndCopy(CopyState cstate)
 					(errcode_for_file_access(),
 					 errmsg("could not close file \"%s\": %m",
 							cstate->filename)));
+		if (cstate->failed_rec_filename != NULL && FreeFile(cstate->failed_rec_file))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close file \"%s\": %m",
+							cstate->failed_rec_filename)));
 	}
 
 	MemoryContextDelete(cstate->copycontext);
@@ -2461,6 +2499,8 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_FROZEN;
 	}
 
+	if (!cstate->ignore_conflict)
+		cstate->error_limit = 0;
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -2579,6 +2619,10 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_conflict)
+	{
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -2968,12 +3012,59 @@ CopyFrom(CopyState cstate)
 						 */
 						tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if (cstate->ignore_conflict && cstate->error_limit > 0)
+					{
+						bool		specConflict;
+						uint32		specToken;
+						specConflict = false;
+
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+						HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
+
+						/* insert the tuple, with the speculative token */
+						heap_insert(resultRelInfo->ri_RelationDesc, tuple,
+									estate->es_output_cid,
+									HEAP_INSERT_SPECULATIVE,
+									NULL);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
+													   estate, true, &specConflict,
+													   NIL);
+
+						/* adjust the tuple's state accordingly */
+						if (!specConflict)
+						{
+							heap_finish_speculative(resultRelInfo->ri_RelationDesc, tuple);
+							processed++;
+						}
+						else
+						{
+							heap_abort_speculative(resultRelInfo->ri_RelationDesc, tuple);
+#ifndef WIN32
+							appendStringInfoCharMacro(&cstate->line_buf, '\n');
+#else
+							appendBinaryStringInfo(&cstate->line_buf, "\r\n", strlen("\r\n"));
+#endif
+							fwrite(cstate->line_buf.data, 1, cstate->line_buf.len, cstate->failed_rec_file);
+							cstate->error_limit--;
+
+						}
+
+				/*
+				 * Wake up anyone waiting for our decision.  They will re-check
+				 * the tuple, see that it's no longer speculative, and wait on our
+				 * XID as if this was a regularly inserted tuple all along.
+				 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+				}
 					else
 						heap_insert(resultRelInfo->ri_RelationDesc, tuple,
 									mycid, hi_options, bistate);
 
 					/* And create index entries for it */
-					if (resultRelInfo->ri_NumIndices > 0)
+					if (resultRelInfo->ri_NumIndices > 0 && cstate->error_limit == 0)
 						recheckIndexes = ExecInsertIndexTuples(slot,
 															   &(tuple->t_self),
 															   estate,
@@ -2994,7 +3085,8 @@ CopyFrom(CopyState cstate)
 			 * or FDW; this is the same definition used by nodeModifyTable.c
 			 * for counting tuples inserted by an INSERT command.
 			 */
-			processed++;
+				if(!cstate->ignore_conflict)
+				processed++;
 		}
 	}
 
@@ -3286,6 +3378,48 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->num_defaults = num_defaults;
 	cstate->is_program = is_program;
 
+	if (cstate->failed_rec_filename)
+	{
+		mode_t		oumask; /* Pre-existing umask value */
+		struct stat st;
+			/*
+			 * Prevent write to relative path ... too easy to shoot oneself in
+			 * the foot by overwriting a database file ...
+			 */
+			if (!is_absolute_path(cstate->failed_rec_filename))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_NAME),
+						 errmsg("relative path not allowed for failed record file")));
+			oumask = umask(S_IWGRP | S_IWOTH);
+			PG_TRY();
+			{
+				cstate->failed_rec_file = AllocateFile(cstate->failed_rec_filename, PG_BINARY_W);
+			}
+			PG_CATCH();
+			{
+				umask(oumask);
+				PG_RE_THROW();
+			}
+			PG_END_TRY();
+			umask(oumask);
+			if (cstate->failed_rec_file == NULL)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not open file \"%s\" for writing: %m",
+								cstate->failed_rec_filename)));
+
+			if (fstat(fileno(cstate->failed_rec_file), &st))
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not stat file \"%s\": %m",
+								cstate->failed_rec_filename)));
+
+			if (S_ISDIR(st.st_mode))
+				ereport(ERROR,
+						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+						 errmsg("\"%s\" is a directory", cstate->failed_rec_filename)));
+		}
+
 	if (data_source_cb)
 	{
 		cstate->copy_dest = COPY_CALLBACK;
@@ -3498,7 +3632,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3513,9 +3647,16 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (nfields > 0 && fldct > nfields)
+		{
+			if (cstate->ignore_conflict && cstate->error_limit > 0)
+			{
+				LogCopyError(cstate, " extra data after last expected column");
+				goto next_line;
+			}else
 			ereport(ERROR,
 					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
 					 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3523,15 +3664,29 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		if (file_has_oids)
 		{
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for OID column")));
+			{
+				if (cstate->ignore_conflict && cstate->error_limit > 0)
+				{
+					LogCopyError(cstate, " missing data for OID column");
+					goto next_line;
+				}else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for OID column")));
+			}
 			string = field_strings[fieldno++];
 
 			if (string == NULL)
+			{
+				if (cstate->ignore_conflict && cstate->error_limit > 0)
+				{
+					LogCopyError(cstate, " null OID in COPY data");
+					goto next_line;
+				}else
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
 						 errmsg("null OID in COPY data")));
+			}
 			else if (cstate->oids && tupleOid != NULL)
 			{
 				cstate->cur_attname = "oid";
@@ -3539,9 +3694,17 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 				*tupleOid = DatumGetObjectId(DirectFunctionCall1(oidin,
 																 CStringGetDatum(string)));
 				if (*tupleOid == InvalidOid)
+				{
+					if (cstate->ignore_conflict && cstate->error_limit > 0)
+					{
+						LogCopyError(cstate, " invalid OID in COPY data");
+						goto next_line;
+					}else
+
 					ereport(ERROR,
 							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
 							 errmsg("invalid OID in COPY data")));
+				}
 				cstate->cur_attname = NULL;
 				cstate->cur_attval = NULL;
 			}
@@ -3555,10 +3718,20 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
+			{
+				if (cstate->ignore_conflict && cstate->error_limit > 0)
+				{
+					appendStringInfo(&cstate->line_buf, " missing data for column %s",
+								NameStr(att->attname));
+					LogCopyError(cstate, " ");
+					goto next_line;
+				}else
+
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
 						 errmsg("missing data for column \"%s\"",
 								NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3645,10 +3818,19 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->ignore_conflict && cstate->error_limit > 0)
+			{
+				appendStringInfo(&cstate->line_buf, "row field count is %d, expected %d",
+						(int) fld_count, attr_count);
+				LogCopyError(cstate, " ");
+				goto next_line;
+			}else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+		}
 
 		if (file_has_oids)
 		{
@@ -3663,9 +3845,16 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 														 -1,
 														 &isnull));
 			if (isnull || loaded_oid == InvalidOid)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("invalid OID in COPY data")));
+			{
+				if (cstate->ignore_conflict && cstate->error_limit > 0)
+				{
+					LogCopyError(cstate, " invalid OID in COPY data");
+					goto next_line;
+				}else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("invalid OID in COPY data")));
+			}
 			cstate->cur_attname = NULL;
 			if (cstate->oids && tupleOid != NULL)
 				*tupleOid = loaded_oid;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 87f5e95827..c1084f71bc 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -632,7 +632,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN
 	EXTENSION EXTERNAL EXTRACT
 
-	FALSE_P FAMILY FETCH FILTER FIRST_P FLOAT_P FOLLOWING FOR
+	FALSE_P FAMILY FETCH FILE_P FILTER FIRST_P FLOAT_P FOLLOWING FOR
 	FORCE FOREIGN FORWARD FREEZE FROM FULL FUNCTION FUNCTIONS
 
 	GENERATED GLOBAL GRANT GRANTED GREATEST GROUP_P GROUPING GROUPS
@@ -650,7 +650,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOG_P LOGGED
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -3107,6 +3107,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("encoding", (Node *)makeString($2), @1);
 				}
+			| ON CONFLICT LOG_P Iconst ',' LOG_P FILE_P NAME_P Sconst
+				{
+					$$ = makeDefElem("ignore_conflicts", (Node *)list_make2(makeInteger($4), makeString($9)), @1);
+				}
 		;
 
 /* The following exist for backward compatibility with very old versions */
@@ -15086,6 +15090,7 @@ unreserved_keyword:
 			| EXTENSION
 			| EXTERNAL
 			| FAMILY
+			| FILE_P
 			| FILTER
 			| FIRST_P
 			| FOLLOWING
@@ -15134,6 +15139,7 @@ unreserved_keyword:
 			| LOCATION
 			| LOCK_P
 			| LOCKED
+			| LOG_P
 			| LOGGED
 			| MAPPING
 			| MATCH
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 23db40147b..442562b0fe 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -162,6 +162,7 @@ PG_KEYWORD("extract", EXTRACT, COL_NAME_KEYWORD)
 PG_KEYWORD("false", FALSE_P, RESERVED_KEYWORD)
 PG_KEYWORD("family", FAMILY, UNRESERVED_KEYWORD)
 PG_KEYWORD("fetch", FETCH, RESERVED_KEYWORD)
+PG_KEYWORD("file", FILE_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("filter", FILTER, UNRESERVED_KEYWORD)
 PG_KEYWORD("first", FIRST_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("float", FLOAT_P, COL_NAME_KEYWORD)
@@ -242,6 +243,7 @@ PG_KEYWORD("localtimestamp", LOCALTIMESTAMP, RESERVED_KEYWORD)
 PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
+PG_KEYWORD("log", LOG_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)

Dmitry Dolgov

9erthalion6@gmail.com

about 7 years ago

In reply to: Surafel Temesgen (#5)

Re: Conflict handling for COPY FROM

On Thu, Aug 23, 2018 at 4:16 PM Surafel Temesgen <surafel3000@gmail.com> wrote:

The attached patch adds error handling for extra data, missing data,
invalid OID, null OID, and row count mismatch.

Hi,

Unfortunately, the patch conflict-handling-onCopy-from-v2.patch has some
conflicts now. Could you rebase it? I'm moving it to the next CF as "Waiting on
Author". Also, I would appreciate it if one of the reviewers (Karen
Huddleston?) could post a full patch review.

Nasby, Jim

nasbyj@amazon.com

about 7 years ago

In reply to: Robert Haas (#4)

Re: Conflict handling for COPY FROM

On Aug 20, 2018, at 5:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Aug 4, 2018 at 9:10 AM Surafel Temesgen <surafel3000@gmail.com> wrote:
To guard against extreme cases, the patch also adds a new GUC variable, copy_max_error_limit, that controls how many errors are swallowed before COPY starts to error out, and a new failed-record file option for COPY, where failed records are written so the user can examine them.

Why should this be a GUC rather than a COPY option?

In fact, try doing all of this by adding more options to COPY rather than new syntax.

COPY ... WITH (ignore_conflicts 1000, ignore_logfile '...')

It kind of stinks to use a log file written by the server as the dup-reporting channel though. That would have to be superuser-only.

Perhaps a better option would be to allow the user to specify a name for a cursor, and have COPY do the moral equivalent of DECLARE name? Calling a function for each bad row would be another option.
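
The "call a function for each bad row" alternative can be sketched in plain C as a callback interface. This is only an illustration under stated assumptions, not a proposed backend API: `bad_row_fn`, `row_ok_fn`, `load_rows`, and the two example helpers are hypothetical names, and the validity check stands in for whatever constraint or format check actually rejects a row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once per rejected row. */
typedef void (*bad_row_fn) (const char *row, const char *reason, void *arg);

/* Hypothetical validity check standing in for the real checks. */
typedef int (*row_ok_fn) (const char *row);

/*
 * Feed each row to is_ok and report failures through on_bad instead of
 * a server-side file.  Returns the number of accepted rows, or -1 as
 * soon as a row fails after max_errors rows have already been reported.
 */
static int
load_rows(const char *const *rows, size_t nrows, row_ok_fn is_ok,
          int max_errors, bad_row_fn on_bad, void *arg)
{
    int accepted = 0;
    int errors = 0;

    assert(rows != NULL && is_ok != NULL && on_bad != NULL);

    for (size_t i = 0; i < nrows; i++)
    {
        if (is_ok(rows[i]))
        {
            accepted++;
            continue;
        }
        if (errors >= max_errors)
            return -1;      /* budget exhausted: fail the whole load */
        on_bad(rows[i], "row rejected", arg);
        errors++;
    }
    return accepted;
}

/* Example handler: count rejected rows in an int supplied via arg. */
static void
count_bad_row(const char *row, const char *reason, void *arg)
{
    (void) row;
    (void) reason;
    (*(int *) arg)++;
}

/* Example check: a row is valid if it is not empty. */
static int
row_not_empty(const char *row)
{
    return row[0] != '\0';
}
```

Compared with a log file written by the server, a callback leaves the choice of destination to the caller, which sidesteps the superuser-only concern raised above, at the cost of a per-row function call.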

surafel3000@gmail.com

about 7 years ago

In reply to: Dmitry Dolgov (#6)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Thu, Nov 29, 2018 at 3:15 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Unfortunately, the patch conflict-handling-onCopy-from-v2.patch has some
conflicts now. Could you rebase it?

Thank you for letting me know. Attached is the patch rebased against current master.

Regards

Surafel

Attachments:

conflict-handling-onCopy-from-v3.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 411941ed31..33015451a5 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -353,6 +353,28 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>on_conflict_log</literal></term>
+    <listitem>
+     <para>
+      Specifies that error records are logged up to the specified number.
+      Each such record is written to the log file and
+      processing proceeds to the next record.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>log_file_name</literal></term>
+    <listitem>
+     <para>
+      The path name of the log file.  It must be an absolute
+      path.  Windows users might need to use an <literal>E''</literal> string and
+      double any backslashes used in the path name.
+     </para>
+    </listitem>
+   </varlistentry>
+
   </variablelist>
  </refsect1>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4311e16007..b4b707c3f6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -44,6 +44,7 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -121,6 +122,7 @@ typedef struct CopyStateData
 	int			file_encoding;	/* file or remote side's character encoding */
 	bool		need_transcoding;	/* file encoding diff from server? */
 	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+	FILE	   *failed_rec_file;		/* used if ignore_conflict is true */
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
@@ -149,6 +151,9 @@ typedef struct CopyStateData
 	bool		convert_selectively;	/* do selective binary conversion? */
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
+	char	   *failed_rec_filename;	/* failed record filename */
+	bool	   ignore_conflict;
+	int	   error_limit;			/* total # of error to log */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -769,6 +774,21 @@ CopyLoadRawBuf(CopyState cstate)
 	return (inbytes > 0);
 }
 
+/*
+ * LogCopyError log error in to failed record file
+ */
+static void
+LogCopyError(CopyState cstate, const char *str)
+{
+	appendBinaryStringInfo(&cstate->line_buf, str, strlen(str));
+#ifndef WIN32
+	appendStringInfoCharMacro(&cstate->line_buf, '\n');
+#else
+	appendBinaryStringInfo(&cstate->line_buf, "\r\n", strlen("\r\n"));
+#endif
+	fwrite(cstate->line_buf.data, 1, cstate->line_buf.len, cstate->failed_rec_file);
+	cstate->error_limit--;
+}
 
 /*
  *	 DoCopy executes the SQL COPY statement
@@ -1223,6 +1243,32 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "on_conflict_log") == 0)
+		{
+			if (cstate->ignore_conflict)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+
+			cstate->ignore_conflict = true;
+			cstate->error_limit =defGetInt64(defel);
+			if (cstate->error_limit < 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("argument to option \"%s\" must be positive number",
+								defel->defname),
+						 parser_errposition(pstate, defel->location)));
+		}
+		else if (strcmp(defel->defname, "log_file_name") == 0)
+		{
+			if (cstate->failed_rec_filename)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->failed_rec_filename =defGetString(defel);
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1245,6 +1291,21 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify NULL in BINARY mode")));
 
+	if (!cstate->error_limit && cstate->failed_rec_filename)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify log file name without on conflict log option")));
+
+	if (cstate->error_limit && !cstate->failed_rec_filename)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify on conflict log without log file name option")));
+
+	if (cstate->error_limit && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify on conflict log on COPY TO")));
+
 	/* Set defaults for omitted options */
 	if (!cstate->delim)
 		cstate->delim = cstate->csv_mode ? "," : "\t";
@@ -1745,6 +1806,11 @@ EndCopy(CopyState cstate)
 					(errcode_for_file_access(),
 					 errmsg("could not close file \"%s\": %m",
 							cstate->filename)));
+		if (cstate->failed_rec_filename != NULL && FreeFile(cstate->failed_rec_file))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close file \"%s\": %m",
+							cstate->failed_rec_filename)));
 	}
 
 	MemoryContextDelete(cstate->copycontext);
@@ -2461,6 +2527,8 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_FROZEN;
 	}
 
+	if (!cstate->ignore_conflict)
+		cstate->error_limit = 0;
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -2575,6 +2643,10 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_conflict)
+	{
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -2946,12 +3018,59 @@ CopyFrom(CopyState cstate)
 						 */
 						tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if (cstate->ignore_conflict && cstate->error_limit > 0)
+					{
+						bool		specConflict;
+						uint32		specToken;
+						specConflict = false;
+
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+						HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
+
+						/* insert the tuple, with the speculative token */
+						heap_insert(resultRelInfo->ri_RelationDesc, tuple,
+									estate->es_output_cid,
+									HEAP_INSERT_SPECULATIVE,
+									NULL);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
+													   estate, true, &specConflict,
+													   NIL);
+
+						/* adjust the tuple's state accordingly */
+						if (!specConflict)
+						{
+							heap_finish_speculative(resultRelInfo->ri_RelationDesc, tuple);
+							processed++;
+						}
+						else
+						{
+							heap_abort_speculative(resultRelInfo->ri_RelationDesc, tuple);
+#ifndef WIN32
+							appendStringInfoCharMacro(&cstate->line_buf, '\n');
+#else
+							appendBinaryStringInfo(&cstate->line_buf, "\r\n", strlen("\r\n"));
+#endif
+							fwrite(cstate->line_buf.data, 1, cstate->line_buf.len, cstate->failed_rec_file);
+							cstate->error_limit--;
+
+						}
+
+						/*
+						 * Wake up anyone waiting for our decision.  They will re-check
+						 * the tuple, see that it's no longer speculative, and wait on our
+						 * XID as if this was a regularly inserted tuple all along.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+					}
 					else
 						heap_insert(resultRelInfo->ri_RelationDesc, tuple,
 									mycid, hi_options, bistate);
 
 					/* And create index entries for it */
-					if (resultRelInfo->ri_NumIndices > 0)
+					if (resultRelInfo->ri_NumIndices > 0 && cstate->error_limit == 0)
 						recheckIndexes = ExecInsertIndexTuples(slot,
 															   &(tuple->t_self),
 															   estate,
@@ -2972,7 +3091,8 @@ CopyFrom(CopyState cstate)
 			 * or FDW; this is the same definition used by nodeModifyTable.c
 			 * for counting tuples inserted by an INSERT command.
 			 */
-			processed++;
+			if(!cstate->ignore_conflict)
+				processed++;
 		}
 	}
 
@@ -3260,6 +3380,48 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->num_defaults = num_defaults;
 	cstate->is_program = is_program;
 
+	if (cstate->failed_rec_filename)
+	{
+		mode_t		oumask; /* Pre-existing umask value */
+		struct stat st;
+			/*
+			 * Prevent write to relative path ... too easy to shoot oneself in
+			 * the foot by overwriting a database file ...
+			 */
+			if (!is_absolute_path(cstate->failed_rec_filename))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_NAME),
+						 errmsg("relative path not allowed for failed record file")));
+			oumask = umask(S_IWGRP | S_IWOTH);
+			PG_TRY();
+			{
+				cstate->failed_rec_file = AllocateFile(cstate->failed_rec_filename, PG_BINARY_W);
+			}
+			PG_CATCH();
+			{
+				umask(oumask);
+				PG_RE_THROW();
+			}
+			PG_END_TRY();
+			umask(oumask);
+			if (cstate->failed_rec_file == NULL)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not open file \"%s\" for writing: %m",
+								cstate->failed_rec_filename)));
+
+			if (fstat(fileno(cstate->failed_rec_file), &st))
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not stat file \"%s\": %m",
+								cstate->failed_rec_filename)));
+
+			if (S_ISDIR(st.st_mode))
+				ereport(ERROR,
+						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+						 errmsg("\"%s\" is a directory", cstate->failed_rec_filename)));
+		}
+
 	if (data_source_cb)
 	{
 		cstate->copy_dest = COPY_CALLBACK;
@@ -3458,7 +3620,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3473,9 +3635,16 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->ignore_conflict && cstate->error_limit > 0)
+			{
+				LogCopyError(cstate, " extra data after last expected column");
+				goto next_line;
+			}else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3487,10 +3656,20 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->ignore_conflict && cstate->error_limit > 0)
+				{
+					appendStringInfo(&cstate->line_buf, " missing data for column %s",
+								NameStr(att->attname));
+					LogCopyError(cstate, " ");
+					goto next_line;
+				}else
+
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3577,10 +3756,19 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->ignore_conflict && cstate->error_limit > 0)
+			{
+				appendStringInfo(&cstate->line_buf, "row field count is %d, expected %d",
+						(int) fld_count, attr_count);
+				LogCopyError(cstate, " ");
+				goto next_line;
+			}else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 2c2208ffb7..ecfa5f9874 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -632,7 +632,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN
 	EXTENSION EXTERNAL EXTRACT
 
-	FALSE_P FAMILY FETCH FILTER FIRST_P FLOAT_P FOLLOWING FOR
+	FALSE_P FAMILY FETCH FILE_P FILTER FIRST_P FLOAT_P FOLLOWING FOR
 	FORCE FOREIGN FORWARD FREEZE FROM FULL FUNCTION FUNCTIONS
 
 	GENERATED GLOBAL GRANT GRANTED GREATEST GROUP_P GROUPING GROUPS
@@ -650,7 +650,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOG_P LOGGED
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -3093,6 +3093,14 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("encoding", (Node *)makeString($2), @1);
 				}
+			| ON CONFLICT LOG_P Iconst
+				{
+					$$ = makeDefElem("on_conflict_log", (Node *)makeInteger($4), @1);
+				}
+			| LOG_P FILE_P NAME_P Sconst
+				{
+					$$ = makeDefElem("log_file_name", (Node *)makeString($4), @1);
+				}
 		;
 
 /* The following exist for backward compatibility with very old versions */
@@ -15052,6 +15060,7 @@ unreserved_keyword:
 			| EXTENSION
 			| EXTERNAL
 			| FAMILY
+			| FILE_P
 			| FILTER
 			| FIRST_P
 			| FOLLOWING
@@ -15100,6 +15109,7 @@ unreserved_keyword:
 			| LOCATION
 			| LOCK_P
 			| LOCKED
+			| LOG_P
 			| LOGGED
 			| MAPPING
 			| MATCH
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 23db40147b..442562b0fe 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -162,6 +162,7 @@ PG_KEYWORD("extract", EXTRACT, COL_NAME_KEYWORD)
 PG_KEYWORD("false", FALSE_P, RESERVED_KEYWORD)
 PG_KEYWORD("family", FAMILY, UNRESERVED_KEYWORD)
 PG_KEYWORD("fetch", FETCH, RESERVED_KEYWORD)
+PG_KEYWORD("file", FILE_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("filter", FILTER, UNRESERVED_KEYWORD)
 PG_KEYWORD("first", FIRST_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("float", FLOAT_P, COL_NAME_KEYWORD)
@@ -242,6 +243,7 @@ PG_KEYWORD("localtimestamp", LOCALTIMESTAMP, RESERVED_KEYWORD)
 PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
+PG_KEYWORD("log", LOG_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Surafel Temesgen (#8)

Re: Conflict handling for COPY FROM

On Wed, Dec 19, 2018 at 02:48:14PM +0300, Surafel Temesgen wrote:

Thank you for letting me know; attached is a patch rebased against
current master

copy.c conflicts on HEAD, please rebase. I am moving the patch to
next CF, waiting on author.
--
Michael

#10

andres@anarazel.de

almost 7 years ago

In reply to: Surafel Temesgen (#5)

Re: Conflict handling for COPY FROM

Hi,

On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

This doesn't seem to address Robert's point that writing a log file
requires superuser privileges, which seems to restrict the feature more
than necessary?

- Andres

#11

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

almost 7 years ago

In reply to: Andres Freund (#10)

Re: Conflict handling for COPY FROM

On 2/16/19 12:24 AM, Andres Freund wrote:

Hi,

On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

This doesn't seem to address Robert's point that writing a log file
requires superuser privileges, which seems to restrict the feature more
than necessary?

I liked Jim Nasby's idea of having it call a function rather than
writing to a log file.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
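
Editorial sketch: the "call a function rather than write to a log file" idea can be illustrated with a small standalone C model. This is not PostgreSQL code and not what the patch does. The names copy_error_cb, load_lines, and count_rejects are hypothetical, and counting tab-separated fields is only a stand-in for COPY's real parsing. It shows the shape of the approach: rejected rows are handed to a caller-supplied function instead of a server-side file, and loading stops once the error limit is used up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: receives the rejected line and a reason. */
typedef void (*copy_error_cb) (const char *line, const char *reason, void *arg);

/* Count tab-separated fields in one input line. */
static int
count_fields(const char *line)
{
	int			n = 1;

	for (const char *p = line; *p; p++)
		if (*p == '\t')
			n++;
	return n;
}

/*
 * Load nlines lines, expecting nfields fields per line.  A malformed line
 * is passed to on_error instead of aborting, as long as error_limit has
 * not been used up.  Returns the number of accepted rows, or -1 if a
 * malformed line is found after the limit is exhausted.
 */
static int
load_lines(const char *const *lines, size_t nlines, int nfields,
		   int error_limit, copy_error_cb on_error, void *arg)
{
	int			accepted = 0;

	assert(on_error != NULL);

	for (size_t i = 0; i < nlines; i++)
	{
		int			got = count_fields(lines[i]);

		if (got == nfields)
		{
			accepted++;
			continue;
		}
		if (error_limit <= 0)
			return -1;			/* limit exhausted: give up */
		on_error(lines[i],
				 got > nfields ? "extra data after last expected column"
				 : "missing data for column",
				 arg);
		error_limit--;
	}
	return accepted;
}

/* Example callback: just counts how many lines were rejected. */
static void
count_rejects(const char *line, const char *reason, void *arg)
{
	(void) line;
	(void) reason;
	(*(int *) arg)++;
}
```

A caller would pass its own callback, for example one that inserts the rejected line into an error table, which avoids the file-permission question entirely in this model.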

#12

surafel3000@gmail.com

almost 7 years ago

In reply to: Michael Paquier (#9)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Mon, Feb 4, 2019 at 9:06 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 19, 2018 at 02:48:14PM +0300, Surafel Temesgen wrote:

Thank you for letting me know; attached is a patch rebased against
current master

copy.c conflicts on HEAD, please rebase. I am moving the patch to
next CF, waiting on author.
--

Thank you, here is a rebased patch against current master

regards
Surafel

Attachments:

conflict-handling-onCopy-from-v4.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 254d3ab8eb..5ee70d62bf 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -380,6 +380,28 @@ WHERE <replaceable class="parameter">condition</replaceable>
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>on_conflict_log</literal></term>
+    <listitem>
+     <para>
+      Specifies the maximum number of error records to log.  Instead of
+      raising an error, write the failing record to the log file and
+      proceed to the next record.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>log_file_name</literal></term>
+    <listitem>
+     <para>
+      The path name of the log file.  It must be an absolute
+      path.  Windows users might need to use an <literal>E''</literal> string and
+      double any backslashes used in the path name.
+     </para>
+    </listitem>
+   </varlistentry>
+
   </variablelist>
  </refsect1>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index dbb06397e6..3c6afec5b3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -46,6 +46,7 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -123,6 +124,7 @@ typedef struct CopyStateData
 	int			file_encoding;	/* file or remote side's character encoding */
 	bool		need_transcoding;	/* file encoding diff from server? */
 	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+	FILE	   *failed_rec_file;		/* used if ignore_conflict is true */
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
@@ -152,6 +154,9 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	char	   *failed_rec_filename;	/* failed record filename */
+	bool	   ignore_conflict;
+	int	   error_limit;			/* total # of error to log */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -773,6 +778,21 @@ CopyLoadRawBuf(CopyState cstate)
 	return (inbytes > 0);
 }
 
+/*
+ * LogCopyError: append an error message to the current line and write it to the failed-record file
+ */
+static void
+LogCopyError(CopyState cstate, const char *str)
+{
+	appendBinaryStringInfo(&cstate->line_buf, str, strlen(str));
+#ifndef WIN32
+	appendStringInfoCharMacro(&cstate->line_buf, '\n');
+#else
+	appendBinaryStringInfo(&cstate->line_buf, "\r\n", strlen("\r\n"));
+#endif
+	fwrite(cstate->line_buf.data, 1, cstate->line_buf.len, cstate->failed_rec_file);
+	cstate->error_limit--;
+}
 
 /*
  *	 DoCopy executes the SQL COPY statement
@@ -1249,6 +1269,32 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "on_conflict_log") == 0)
+		{
+			if (cstate->ignore_conflict)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+
+			cstate->ignore_conflict = true;
+			cstate->error_limit =defGetInt64(defel);
+			if (cstate->error_limit < 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("argument to option \"%s\" must be positive number",
+								defel->defname),
+						 parser_errposition(pstate, defel->location)));
+		}
+		else if (strcmp(defel->defname, "log_file_name") == 0)
+		{
+			if (cstate->failed_rec_filename)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->failed_rec_filename =defGetString(defel);
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1271,6 +1317,21 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify NULL in BINARY mode")));
 
+	if (!cstate->error_limit && cstate->failed_rec_filename)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify log file name without on conflict log option")));
+
+	if (cstate->error_limit && !cstate->failed_rec_filename)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify on conflict log without log file name option")));
+
+	if (cstate->error_limit && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify on conflict log on COPY TO")));
+
 	/* Set defaults for omitted options */
 	if (!cstate->delim)
 		cstate->delim = cstate->csv_mode ? "," : "\t";
@@ -1771,6 +1832,11 @@ EndCopy(CopyState cstate)
 					(errcode_for_file_access(),
 					 errmsg("could not close file \"%s\": %m",
 							cstate->filename)));
+		if (cstate->failed_rec_filename != NULL && FreeFile(cstate->failed_rec_file))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close file \"%s\": %m",
+							cstate->failed_rec_filename)));
 	}
 
 	MemoryContextDelete(cstate->copycontext);
@@ -2492,6 +2558,8 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_FROZEN;
 	}
 
+	if (!cstate->ignore_conflict)
+		cstate->error_limit = 0;
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -2619,6 +2687,10 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_conflict)
+	{
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3000,12 +3072,59 @@ CopyFrom(CopyState cstate)
 						 */
 						tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if (cstate->ignore_conflict && cstate->error_limit > 0)
+					{
+						bool		specConflict;
+						uint32		specToken;
+						specConflict = false;
+
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+						HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
+
+						/* insert the tuple, with the speculative token */
+						heap_insert(resultRelInfo->ri_RelationDesc, tuple,
+									estate->es_output_cid,
+									HEAP_INSERT_SPECULATIVE,
+									NULL);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
+													   estate, true, &specConflict,
+													   NIL);
+
+						/* adjust the tuple's state accordingly */
+						if (!specConflict)
+						{
+							heap_finish_speculative(resultRelInfo->ri_RelationDesc, tuple);
+							processed++;
+						}
+						else
+						{
+							heap_abort_speculative(resultRelInfo->ri_RelationDesc, tuple);
+#ifndef WIN32
+							appendStringInfoCharMacro(&cstate->line_buf, '\n');
+#else
+							appendBinaryStringInfo(&cstate->line_buf, "\r\n", strlen("\r\n"));
+#endif
+							fwrite(cstate->line_buf.data, 1, cstate->line_buf.len, cstate->failed_rec_file);
+							cstate->error_limit--;
+
+						}
+
+						/*
+						 * Wake up anyone waiting for our decision.  They will re-check
+						 * the tuple, see that it's no longer speculative, and wait on our
+						 * XID as if this was a regularly inserted tuple all along.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+					}
 					else
 						heap_insert(resultRelInfo->ri_RelationDesc, tuple,
 									mycid, hi_options, bistate);
 
 					/* And create index entries for it */
-					if (resultRelInfo->ri_NumIndices > 0)
+					if (resultRelInfo->ri_NumIndices > 0 && cstate->error_limit == 0)
 						recheckIndexes = ExecInsertIndexTuples(slot,
 															   &(tuple->t_self),
 															   estate,
@@ -3026,7 +3145,8 @@ CopyFrom(CopyState cstate)
 			 * or FDW; this is the same definition used by nodeModifyTable.c
 			 * for counting tuples inserted by an INSERT command.
 			 */
-			processed++;
+			if(!cstate->ignore_conflict)
+				processed++;
 		}
 	}
 
@@ -3316,6 +3436,48 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->num_defaults = num_defaults;
 	cstate->is_program = is_program;
 
+	if (cstate->failed_rec_filename)
+	{
+		mode_t		oumask; /* Pre-existing umask value */
+		struct stat st;
+			/*
+			 * Prevent write to relative path ... too easy to shoot oneself in
+			 * the foot by overwriting a database file ...
+			 */
+			if (!is_absolute_path(cstate->failed_rec_filename))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_NAME),
+						 errmsg("relative path not allowed for failed record file")));
+			oumask = umask(S_IWGRP | S_IWOTH);
+			PG_TRY();
+			{
+				cstate->failed_rec_file = AllocateFile(cstate->failed_rec_filename, PG_BINARY_W);
+			}
+			PG_CATCH();
+			{
+				umask(oumask);
+				PG_RE_THROW();
+			}
+			PG_END_TRY();
+			umask(oumask);
+			if (cstate->failed_rec_file == NULL)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not open file \"%s\" for writing: %m",
+								cstate->failed_rec_filename)));
+
+			if (fstat(fileno(cstate->failed_rec_file), &st))
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not stat file \"%s\": %m",
+								cstate->failed_rec_filename)));
+
+			if (S_ISDIR(st.st_mode))
+				ereport(ERROR,
+						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+						 errmsg("\"%s\" is a directory", cstate->failed_rec_filename)));
+		}
+
 	if (data_source_cb)
 	{
 		cstate->copy_dest = COPY_CALLBACK;
@@ -3514,7 +3676,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3529,9 +3691,16 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->ignore_conflict && cstate->error_limit > 0)
+			{
+				LogCopyError(cstate, " extra data after last expected column");
+				goto next_line;
+			}else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3543,10 +3712,20 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->ignore_conflict && cstate->error_limit > 0)
+				{
+					appendStringInfo(&cstate->line_buf, " missing data for column %s",
+								NameStr(att->attname));
+					LogCopyError(cstate, " ");
+					goto next_line;
+				}else
+
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3633,10 +3812,19 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->ignore_conflict && cstate->error_limit > 0)
+			{
+				appendStringInfo(&cstate->line_buf, "row field count is %d, expected %d",
+						(int) fld_count, attr_count);
+				LogCopyError(cstate, " ");
+				goto next_line;
+			}else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a68f78e0e0..74d5737d7a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -631,7 +631,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN
 	EXTENSION EXTERNAL EXTRACT
 
-	FALSE_P FAMILY FETCH FILTER FIRST_P FLOAT_P FOLLOWING FOR
+	FALSE_P FAMILY FETCH FILE_P FILTER FIRST_P FLOAT_P FOLLOWING FOR
 	FORCE FOREIGN FORWARD FREEZE FROM FULL FUNCTION FUNCTIONS
 
 	GENERATED GLOBAL GRANT GRANTED GREATEST GROUP_P GROUPING GROUPS
@@ -649,7 +649,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOG_P LOGGED
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -3047,6 +3047,14 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("encoding", (Node *)makeString($2), @1);
 				}
+			| ON CONFLICT LOG_P Iconst
+				{
+					$$ = makeDefElem("on_conflict_log", (Node *)makeInteger($4), @1);
+				}
+			| LOG_P FILE_P NAME_P Sconst
+				{
+					$$ = makeDefElem("log_file_name", (Node *)makeString($4), @1);
+				}
 		;
 
 /* The following exist for backward compatibility with very old versions */
@@ -15033,6 +15041,7 @@ unreserved_keyword:
 			| EXTENSION
 			| EXTERNAL
 			| FAMILY
+			| FILE_P
 			| FILTER
 			| FIRST_P
 			| FOLLOWING
@@ -15081,6 +15090,7 @@ unreserved_keyword:
 			| LOCATION
 			| LOCK_P
 			| LOCKED
+			| LOG_P
 			| LOGGED
 			| MAPPING
 			| MATCH
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f05444008c..c161b4cd7a 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -161,6 +161,7 @@ PG_KEYWORD("extract", EXTRACT, COL_NAME_KEYWORD)
 PG_KEYWORD("false", FALSE_P, RESERVED_KEYWORD)
 PG_KEYWORD("family", FAMILY, UNRESERVED_KEYWORD)
 PG_KEYWORD("fetch", FETCH, RESERVED_KEYWORD)
+PG_KEYWORD("file", FILE_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("filter", FILTER, UNRESERVED_KEYWORD)
 PG_KEYWORD("first", FIRST_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("float", FLOAT_P, COL_NAME_KEYWORD)
@@ -241,6 +242,7 @@ PG_KEYWORD("localtimestamp", LOCALTIMESTAMP, RESERVED_KEYWORD)
 PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
+PG_KEYWORD("log", LOG_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)

#13

surafel3000@gmail.com

almost 7 years ago

In reply to: Andres Freund (#10)

Re: Conflict handling for COPY FROM

On Sat, Feb 16, 2019 at 8:24 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

This doesn't seem to address Robert's point that writing a log file
requires superuser privileges, which seems to restrict the feature more
than necessary?

- Andres

I think having write permission on the specified directory is enough.
We use the output file name in COPY TO similarly.

regards
Surafel

#14

andres@anarazel.de

almost 7 years ago

In reply to: Surafel Temesgen (#13)

Re: Conflict handling for COPY FROM

On February 19, 2019 3:05:37 AM PST, Surafel Temesgen <surafel3000@gmail.com> wrote:

On Sat, Feb 16, 2019 at 8:24 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

This doesn't seem to address Robert's point that writing a log file
requires superuser privileges, which seems to restrict the feature more
than necessary?

- Andres

I think having write permission on the specified directory is enough.
We use the output file name in COPY TO similarly.

Err, what? Again, that requires superuser permissions (in contrast to COPY FROM/TO STDIN/STDOUT). Backends run as the operating-system user that postgres runs under, so they will always have write permission to at least the entire data directory. I think not addressing this just about guarantees the feature will be rejected.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
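
Editorial sketch: the permission point can be modelled in a few lines of standalone C. This is not PostgreSQL code. The names target_kind, target_allowed, and error_log_allowed are hypothetical, and the two boolean flags stand in for a real role-membership lookup. The assumption being illustrated is the one stated above: a file opened by the backend is a server-side file, so it should be gated the same way as COPY to a server file, while stdin/stdout targets need no extra privilege.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of where COPY reads from or writes to. */
typedef enum
{
	TARGET_STDIO,				/* client-side stdin/stdout */
	TARGET_SERVER_FILE			/* file opened by the backend process */
} target_kind;

/*
 * Anyone may use stdin/stdout.  A server-side file is opened with the
 * backend's own OS permissions, so it is allowed only for a superuser or
 * a member of a role that grants server-file writes.
 */
static bool
target_allowed(target_kind kind, bool is_superuser, bool in_write_files_role)
{
	if (kind == TARGET_STDIO)
		return true;
	return is_superuser || in_write_files_role;
}

/*
 * An error log named by path is just another server-side file, so it is
 * subject to the same check.  No log file means no extra privilege.
 */
static bool
error_log_allowed(const char *log_file_name, bool is_superuser,
				  bool in_write_files_role)
{
	if (log_file_name == NULL)
		return true;
	return target_allowed(TARGET_SERVER_FILE, is_superuser,
						  in_write_files_role);
}
```

In this model an unprivileged user can still load data from stdin, but cannot ask the server to create an error log at an arbitrary path.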

#15

surafel3000@gmail.com

almost 7 years ago

In reply to: Andres Freund (#14)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de> wrote:

Err, what? Again, that requires superuser permissions (in contrast to
COPY FROM/TO STDIN/STDOUT). Backends run as the user postgres runs under

Okay, I see it now and have modified the patch accordingly.

regards
Surafel

Attachments:

conflict-handling-onCopy-from-v5.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 254d3ab8eb..5ee70d62bf 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -380,6 +380,28 @@ WHERE <replaceable class="parameter">condition</replaceable>
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>on_conflict_log</literal></term>
+    <listitem>
+     <para>
+      Specifies the maximum number of error records to log.  Instead of
+      raising an error, write the failing record to the log file and
+      proceed to the next record.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>log_file_name</literal></term>
+    <listitem>
+     <para>
+      The path name of the log file.  It must be an absolute
+      path.  Windows users might need to use an <literal>E''</literal> string and
+      double any backslashes used in the path name.
+     </para>
+    </listitem>
+   </varlistentry>
+
   </variablelist>
  </refsect1>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index dbb06397e6..2a2c3d98b4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -46,6 +46,7 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -123,6 +124,7 @@ typedef struct CopyStateData
 	int			file_encoding;	/* file or remote side's character encoding */
 	bool		need_transcoding;	/* file encoding diff from server? */
 	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+	FILE	   *failed_rec_file;		/* used if ignore_conflict is true */
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
@@ -152,6 +154,9 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	char	   *failed_rec_filename;	/* failed record filename */
+	bool	   ignore_conflict;
+	int	   error_limit;			/* total # of error to log */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -773,6 +778,21 @@ CopyLoadRawBuf(CopyState cstate)
 	return (inbytes > 0);
 }
 
+/*
+ * LogCopyError: append an error message to the current line and write it to the failed-record file
+ */
+static void
+LogCopyError(CopyState cstate, const char *str)
+{
+	appendBinaryStringInfo(&cstate->line_buf, str, strlen(str));
+#ifndef WIN32
+	appendStringInfoCharMacro(&cstate->line_buf, '\n');
+#else
+	appendBinaryStringInfo(&cstate->line_buf, "\r\n", strlen("\r\n"));
+#endif
+	fwrite(cstate->line_buf.data, 1, cstate->line_buf.len, cstate->failed_rec_file);
+	cstate->error_limit--;
+}
 
 /*
  *	 DoCopy executes the SQL COPY statement
@@ -836,6 +856,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 						 errmsg("must be superuser or a member of the pg_write_server_files role to COPY to a file"),
 						 errhint("Anyone can COPY to stdout or from stdin. "
 								 "psql's \\copy command also works for anyone.")));
+
 		}
 	}
 
@@ -1249,6 +1270,36 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "on_conflict_log") == 0)
+		{
+			if (cstate->ignore_conflict)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+
+			cstate->ignore_conflict = true;
+			cstate->error_limit =defGetInt64(defel);
+			if (cstate->error_limit < 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("argument to option \"%s\" must be positive number",
+								defel->defname),
+						 parser_errposition(pstate, defel->location)));
+		}
+		else if (strcmp(defel->defname, "log_file_name") == 0)
+		{
+			if (cstate->failed_rec_filename)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			if (!is_member_of_role(GetUserId(), DEFAULT_ROLE_WRITE_SERVER_FILES))
+				ereport(ERROR,
+						(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+						 errmsg("must be superuser or a member of the pg_write_server_files role to log error")));
+			cstate->failed_rec_filename =defGetString(defel);
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1271,6 +1322,21 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify NULL in BINARY mode")));
 
+	if (!cstate->error_limit && cstate->failed_rec_filename)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify log file name without on conflict log option")));
+
+	if (cstate->error_limit && !cstate->failed_rec_filename)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify on conflict log without log file name option")));
+
+	if (cstate->error_limit && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify on conflict log on COPY TO")));
+
 	/* Set defaults for omitted options */
 	if (!cstate->delim)
 		cstate->delim = cstate->csv_mode ? "," : "\t";
@@ -1771,6 +1837,11 @@ EndCopy(CopyState cstate)
 					(errcode_for_file_access(),
 					 errmsg("could not close file \"%s\": %m",
 							cstate->filename)));
+		if (cstate->failed_rec_filename != NULL && FreeFile(cstate->failed_rec_file))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close file \"%s\": %m",
+							cstate->failed_rec_filename)));
 	}
 
 	MemoryContextDelete(cstate->copycontext);
@@ -2492,6 +2563,8 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_FROZEN;
 	}
 
+	if (!cstate->ignore_conflict)
+		cstate->error_limit = 0;
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -2619,6 +2692,10 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_conflict)
+	{
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3000,12 +3077,59 @@ CopyFrom(CopyState cstate)
 						 */
 						tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if (cstate->ignore_conflict && cstate->error_limit > 0)
+					{
+						bool		specConflict;
+						uint32		specToken;
+						specConflict = false;
+
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+						HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
+
+						/* insert the tuple, with the speculative token */
+						heap_insert(resultRelInfo->ri_RelationDesc, tuple,
+									estate->es_output_cid,
+									HEAP_INSERT_SPECULATIVE,
+									NULL);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
+													   estate, true, &specConflict,
+													   NIL);
+
+						/* adjust the tuple's state accordingly */
+						if (!specConflict)
+						{
+							heap_finish_speculative(resultRelInfo->ri_RelationDesc, tuple);
+							processed++;
+						}
+						else
+						{
+							heap_abort_speculative(resultRelInfo->ri_RelationDesc, tuple);
+#ifndef WIN32
+							appendStringInfoCharMacro(&cstate->line_buf, '\n');
+#else
+							appendBinaryStringInfo(&cstate->line_buf, "\r\n", strlen("\r\n"));
+#endif
+							fwrite(cstate->line_buf.data, 1, cstate->line_buf.len, cstate->failed_rec_file);
+							cstate->error_limit--;
+
+						}
+
+						/*
+						 * Wake up anyone waiting for our decision.  They will re-check
+						 * the tuple, see that it's no longer speculative, and wait on our
+						 * XID as if this was a regularly inserted tuple all along.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+					}
 					else
 						heap_insert(resultRelInfo->ri_RelationDesc, tuple,
 									mycid, hi_options, bistate);
 
 					/* And create index entries for it */
-					if (resultRelInfo->ri_NumIndices > 0)
+					if (resultRelInfo->ri_NumIndices > 0 && cstate->error_limit == 0)
 						recheckIndexes = ExecInsertIndexTuples(slot,
 															   &(tuple->t_self),
 															   estate,
@@ -3026,7 +3150,8 @@ CopyFrom(CopyState cstate)
 			 * or FDW; this is the same definition used by nodeModifyTable.c
 			 * for counting tuples inserted by an INSERT command.
 			 */
-			processed++;
+			if(!cstate->ignore_conflict)
+				processed++;
 		}
 	}
 
@@ -3316,6 +3441,48 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->num_defaults = num_defaults;
 	cstate->is_program = is_program;
 
+	if (cstate->failed_rec_filename)
+	{
+		mode_t		oumask; /* Pre-existing umask value */
+		struct stat st;
+			/*
+			 * Prevent write to relative path ... too easy to shoot oneself in
+			 * the foot by overwriting a database file ...
+			 */
+			if (!is_absolute_path(cstate->failed_rec_filename))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_NAME),
+						 errmsg("relative path not allowed for failed record file")));
+			oumask = umask(S_IWGRP | S_IWOTH);
+			PG_TRY();
+			{
+				cstate->failed_rec_file = AllocateFile(cstate->failed_rec_filename, PG_BINARY_W);
+			}
+			PG_CATCH();
+			{
+				umask(oumask);
+				PG_RE_THROW();
+			}
+			PG_END_TRY();
+			umask(oumask);
+			if (cstate->failed_rec_file == NULL)
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not open file \"%s\" for writing: %m",
+								cstate->failed_rec_filename)));
+
+			if (fstat(fileno(cstate->failed_rec_file), &st))
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not stat file \"%s\": %m",
+								cstate->failed_rec_filename)));
+
+			if (S_ISDIR(st.st_mode))
+				ereport(ERROR,
+						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+						 errmsg("\"%s\" is a directory", cstate->failed_rec_filename)));
+		}
+
 	if (data_source_cb)
 	{
 		cstate->copy_dest = COPY_CALLBACK;
@@ -3514,7 +3681,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3529,9 +3696,16 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->ignore_conflict && cstate->error_limit > 0)
+			{
+				LogCopyError(cstate, " extra data after last expected column");
+				goto next_line;
+			}else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3543,10 +3717,20 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->ignore_conflict && cstate->error_limit > 0)
+				{
+					appendStringInfo(&cstate->line_buf, " missing data for column %s",
+								NameStr(att->attname));
+					LogCopyError(cstate, " ");
+					goto next_line;
+				}else
+
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3633,10 +3817,19 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->ignore_conflict && cstate->error_limit > 0)
+			{
+				appendStringInfo(&cstate->line_buf, "row field count is %d, expected %d",
+						(int) fld_count, attr_count);
+				LogCopyError(cstate, " ");
+				goto next_line;
+			}else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c1faf4152c..bf21ba408e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -631,7 +631,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN
 	EXTENSION EXTERNAL EXTRACT
 
-	FALSE_P FAMILY FETCH FILTER FIRST_P FLOAT_P FOLLOWING FOR
+	FALSE_P FAMILY FETCH FILE_P FILTER FIRST_P FLOAT_P FOLLOWING FOR
 	FORCE FOREIGN FORWARD FREEZE FROM FULL FUNCTION FUNCTIONS
 
 	GENERATED GLOBAL GRANT GRANTED GREATEST GROUP_P GROUPING GROUPS
@@ -649,7 +649,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	LABEL LANGUAGE LARGE_P LAST_P LATERAL_P
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
-	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
+	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOG_P LOGGED
 
 	MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE
 
@@ -3047,6 +3047,14 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("encoding", (Node *)makeString($2), @1);
 				}
+			| ON CONFLICT LOG_P Iconst
+				{
+					$$ = makeDefElem("on_conflict_log", (Node *)makeInteger($4), @1);
+				}
+			| LOG_P FILE_P NAME_P Sconst
+				{
+					$$ = makeDefElem("log_file_name", (Node *)makeString($4), @1);
+				}
 		;
 
 /* The following exist for backward compatibility with very old versions */
@@ -15004,6 +15012,7 @@ unreserved_keyword:
 			| EXTENSION
 			| EXTERNAL
 			| FAMILY
+			| FILE_P
 			| FILTER
 			| FIRST_P
 			| FOLLOWING
@@ -15052,6 +15061,7 @@ unreserved_keyword:
 			| LOCATION
 			| LOCK_P
 			| LOCKED
+			| LOG_P
 			| LOGGED
 			| MAPPING
 			| MATCH
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index adeb834ce8..3b20f6d16a 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -161,6 +161,7 @@ PG_KEYWORD("extract", EXTRACT, COL_NAME_KEYWORD)
 PG_KEYWORD("false", FALSE_P, RESERVED_KEYWORD)
 PG_KEYWORD("family", FAMILY, UNRESERVED_KEYWORD)
 PG_KEYWORD("fetch", FETCH, RESERVED_KEYWORD)
+PG_KEYWORD("file", FILE_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("filter", FILTER, UNRESERVED_KEYWORD)
 PG_KEYWORD("first", FIRST_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("float", FLOAT_P, COL_NAME_KEYWORD)
@@ -241,6 +242,7 @@ PG_KEYWORD("localtimestamp", LOCALTIMESTAMP, RESERVED_KEYWORD)
 PG_KEYWORD("location", LOCATION, UNRESERVED_KEYWORD)
 PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("locked", LOCKED, UNRESERVED_KEYWORD)
+PG_KEYWORD("log", LOG_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("logged", LOGGED, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)

#16

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

almost 7 years ago

In reply to: Surafel Temesgen (#15)

Re: Conflict handling for COPY FROM

On 2/20/19 8:01 AM, Surafel Temesgen wrote:

On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:

Err, what? Again, that requires super user permissions (in
contrast to copy from/to stdin/out). Backends run as the user
postgres runs under

Okay, I see it now and have modified the patch similarly.

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17

andres@anarazel.de

almost 7 years ago

In reply to: Andrew Dunstan (#16)

Re: Conflict handling for COPY FROM

On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

On 2/20/19 8:01 AM, Surafel Temesgen wrote:

On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:

Err, what? Again, that requires super user permissions (in
contrast to copy from/to stdin/out). Backends run as the user
postgres runs under

Okay, I see it now and have modified the patch similarly.

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported these days.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#18

David Steele

david@pgmasters.net

almost 7 years ago

In reply to: Andres Freund (#17)

Re: Re: Conflict handling for COPY FROM

Hi Surafel,

On 2/20/19 8:03 PM, Andres Freund wrote:

On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported these days.

This patch no longer applies so marked Waiting on Author.

Also, it appears that you have some comments from Andrew and Andres that
you should reply to.

Regards,
--
-David
david@pgmasters.net

#19

andres@anarazel.de

almost 7 years ago

In reply to: David Steele (#18)

Re: Conflict handling for COPY FROM

Hi,

On 2019-03-25 12:50:13 +0400, David Steele wrote:

On 2/20/19 8:03 PM, Andres Freund wrote:

On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported these days.

This patch no longer applies so marked Waiting on Author.

Also, it appears that you have some comments from Andrew and Andres that you
should reply to.

As nothing has happened the last weeks, I've now marked this as
returned with feedback.

- Andres

#20

surafel3000@gmail.com

over 6 years ago

In reply to: Andres Freund (#17)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:

On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <
andrew.dunstan@2ndquadrant.com> wrote:

On 2/20/19 8:01 AM, Surafel Temesgen wrote:

On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:

Err, what? Again, that requires super user permissions (in
contrast to copy from/to stdin/out). Backends run as the user
postgres runs under

Okay, I see it now and have modified the patch similarly.

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported
these days.

Hello,
I think generating a warning about it also sufficiently meets the purpose of
notifying the user about skipped records with the existing logging facility,
and we use it for a similar purpose in other places too. The difference
I see is the number of warnings that can be generated.

In addition to the above change, in the attached patch I also changed
the syntax to ERROR LIMIT because it no longer skips only
unique and exclusion constraint violations.
Regards,
Surafel

Attachments:

conflict-handling-onCopy-from-v6.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 5e2992ddac..dc3b943279 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR_LIMIT '<replaceable class="parameter">limit_number</replaceable>'
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Specifies to ignore error record up to <replaceable
+      class="parameter">limit_number</replaceable> number.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <para>
+    Currently, only unique or exclusion constraint violation
+    and same record formatting error is ignored.
+   </para>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f1161f0fee..05a5f29d4c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -48,6 +48,7 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -154,6 +155,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -1291,6 +1293,21 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->error_limit > 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			if (cstate->error_limit <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("argument to option \"%s\" must be positive integer",
+								defel->defname),
+						 parser_errposition(pstate, defel->location)));
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1441,6 +1458,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->error_limit && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2837,7 +2858,10 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->error_limit)
+		ExecOpenIndices(resultRelInfo, true);
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2942,6 +2966,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->error_limit)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3285,13 +3316,79 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if (cstate->error_limit && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							ereport(WARNING,
+									(errcode(ERRCODE_INTEGRITY_CONSTRAINT_VIOLATION),
+									 errmsg("skipping \"%s\" --- violate exclusion or unique constraint",
+											cstate->line_buf.data)));
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock.
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, warn about it and proceed
+						 * to the next record, if there is one.
+						 */
+						if (specConflict)
+						{
+							ereport(WARNING,
+									(errcode(ERRCODE_INTEGRITY_CONSTRAINT_VIOLATION),
+									 errmsg("skipping \"%s\" --- violate exclusion or unique constraint",
+											cstate->line_buf.data)));
+							cstate->error_limit--;
+							continue;
+						}
+						else
+							processed++;
+
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
 						table_tuple_insert(resultRelInfo->ri_RelationDesc,
 										   myslot, mycid, ti_options, bistate);
 
-						if (resultRelInfo->ri_NumIndices > 0)
+						if (resultRelInfo->ri_NumIndices > 0 && cstate->error_limit == 0)
 							recheckIndexes = ExecInsertIndexTuples(myslot,
 																   estate,
 																   false,
@@ -3312,7 +3409,8 @@ CopyFrom(CopyState cstate)
 			 * or FDW; this is the same definition used by nodeModifyTable.c
 			 * for counting tuples inserted by an INSERT command.
 			 */
-			processed++;
+			if (!cstate->error_limit)
+				processed++;
 		}
 	}
 
@@ -3703,7 +3801,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3718,9 +3816,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column ",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3732,10 +3842,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\" ",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\" ",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3822,10 +3944,23 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->error_limit)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- row field count is %d, expected %d  ",
+								cstate->line_buf.data, (int) fld_count, attr_count)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8311b1dd46..35fde206a5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -633,7 +633,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
-	EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EVENT EXCEPT
+	EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ERROR_P ESCAPE EVENT EXCEPT
 	EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN
 	EXTENSION EXTERNAL EXTRACT
 
@@ -3054,6 +3054,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("encoding", (Node *)makeString($2), @1);
 				}
+			| ERROR_P LIMIT Iconst
+				{
+					$$ = makeDefElem("error_limit", (Node *)makeInteger($3), @1);
+				}
 		;
 
 /* The following exist for backward compatibility with very old versions */
@@ -15094,6 +15098,7 @@ unreserved_keyword:
 			| ENCODING
 			| ENCRYPTED
 			| ENUM_P
+			| ERROR_P
 			| ESCAPE
 			| EVENT
 			| EXCLUDE
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 00ace8425e..1f4f154d19 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -146,6 +146,7 @@ PG_KEYWORD("encoding", ENCODING, UNRESERVED_KEYWORD)
 PG_KEYWORD("encrypted", ENCRYPTED, UNRESERVED_KEYWORD)
 PG_KEYWORD("end", END_P, RESERVED_KEYWORD)
 PG_KEYWORD("enum", ENUM_P, UNRESERVED_KEYWORD)
+PG_KEYWORD("error", ERROR_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("escape", ESCAPE, UNRESERVED_KEYWORD)
 PG_KEYWORD("event", EVENT, UNRESERVED_KEYWORD)
 PG_KEYWORD("except", EXCEPT, RESERVED_KEYWORD)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c53ed3ebf5..5421cbac4c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -36,10 +36,10 @@ COPY x from stdin;
 ERROR:  invalid input syntax for type integer: ""
 CONTEXT:  COPY x, line 1, column a: ""
 COPY x from stdin;
-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e" 
 CONTEXT:  COPY x, line 1: "2000	230	23	23"
 COPY x from stdin;
-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e" 
 CONTEXT:  COPY x, line 1: "2001	231	\N	\N"
 -- extra data: should fail
 COPY x from stdin;
@@ -55,6 +55,10 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin ERROR LIMIT 5;
+WARNING:  skipping "70001	22	32" --- missing data for column "d" 
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column 
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e" 
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +106,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +140,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +171,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +202,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..893bf215ed 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -110,6 +110,14 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin ERROR LIMIT 5;
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);

#21

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Surafel Temesgen (#20)

Re: Conflict handling for COPY FROM

On 2019-Jun-28, Surafel Temesgen wrote:

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:

On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <
andrew.dunstan@2ndquadrant.com> wrote:

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported
these days.

I think generating a warning about it also sufficiently meets the purpose of
notifying the user about skipped records with the existing logging facility,
and we use it for a similar purpose in other places too. The difference
I see is the number of warnings that can be generated.

Warnings seem useless for this purpose. I'm with Andres: returning rows
would make this a fine feature. If the user wants the rows in a table
as Andrew suggests, she can wrap the whole thing in an insert.

That would make the feature much more usable because you can do further
processing with the rows that conflict, if any is necessary (or throw
them away if not). Putting them in warnings will just make the screen
scroll fast.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#22

a.kondratov@postgrespro.ru

over 6 years ago

In reply to: Alvaro Herrera (#21)

Re: Conflict handling for COPY FROM

On 28.06.2019 16:12, Alvaro Herrera wrote:

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:

Or even just return it as a row. CopyBoth is relatively widely supported
these days.

I think generating a warning about it also sufficiently meets the purpose of
notifying the user about skipped records with the existing logging facility,
and we use it for a similar purpose in other places too. The difference
I see is the number of warnings that can be generated.

Warnings seem useless for this purpose. I'm with Andres: returning rows
would make this a fine feature. If the user wants the rows in a table
as Andrew suggests, she can wrap the whole thing in an insert.

I agree with the previous commentators that returning rows will make this
feature more versatile. Still, the ability to simply skip
conflicting/malformed rows is worth having, from my perspective.
However, pushing every single skipped row to the client as a separate
WARNING will be too much for a bulk import. So maybe just overall stats
on the number of skipped rows would be enough?

Also, I would prefer having an option to ignore all errors, e.g. with
option ERROR_LIMIT set to -1, because it is rather difficult to estimate
the number of future errors if you are working with some badly structured
data, and always setting it to something like 100500k looks ugly.
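
A minimal sketch of how the two suggestions above could fit together. This is standalone C, not backend code, and every name in it (`SkipState`, `ERROR_LIMIT_UNLIMITED`, `skip_state_init`, `skip_state_try_skip`) is hypothetical and does not appear in the patch. It only illustrates treating -1 as "no limit" while keeping a single counter that could feed one summary message at the end of the load.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel: ERROR_LIMIT -1 means "skip every bad row". */
#define ERROR_LIMIT_UNLIMITED (-1)

/* Hypothetical per-COPY bookkeeping; not a struct from the patch. */
typedef struct SkipState
{
    long limit;     /* maximum rows to skip, or ERROR_LIMIT_UNLIMITED */
    long skipped;   /* rows skipped so far, for a final summary */
} SkipState;

void
skip_state_init(SkipState *s, long limit)
{
    /* Only a positive limit or the "unlimited" sentinel makes sense. */
    assert(limit == ERROR_LIMIT_UNLIMITED || limit > 0);
    s->limit = limit;
    s->skipped = 0;
}

/*
 * Called when a row is malformed or conflicts.  Returns true if the row
 * may be skipped, false if the caller should raise the usual ERROR
 * because the limit is already used up.
 */
bool
skip_state_try_skip(SkipState *s)
{
    if (s->limit != ERROR_LIMIT_UNLIMITED && s->skipped >= s->limit)
        return false;
    s->skipped++;
    return true;
}
```

Because the counter goes up instead of the limit going down, the limit stays usable as a plain "is the feature on" test for the whole load, and `skipped` is available for a single end-of-load report instead of one WARNING per row.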

Anyway, below are some issues with existing code after a brief review of
the patch:

1) Calculation of processed rows isn't correct (I've checked). You do it
in two places, and

-            processed++;
+            if (!cstate->error_limit)
+                processed++;

processed is never incremented if ERROR_LIMIT is specified and no errors
occurred/no constraints exist, so the result will always be 0. However,
if a primary column with constraints exists, then processed is calculated
correctly, since another code path is used:

+                        if (specConflict)
+                        {
+                            ...
+                        }
+                        else
+                            processed++;

I would prefer this calculation in a single place (as it was before the
patch) for simplicity and in order to avoid such problems.

2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT
is specified and was exceeded, which doesn't seem to be correct, does it?

-                        if (resultRelInfo->ri_NumIndices > 0)
+                        if (resultRelInfo->ri_NumIndices > 0 && cstate->error_limit == 0)
                              recheckIndexes = ExecInsertIndexTuples(myslot,

3) Trailing whitespaces added to error messages and tests for some reason:

+                    ereport(WARNING,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("skipping \"%s\" --- missing data for column \"%s\" ",

+                    ereport(ERROR,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("missing data for column \"%s\" ",

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
  CONTEXT:  COPY x, line 1: "2000    230    23    23"

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
  CONTEXT:  COPY x, line 1: "2001    231    \N    \N"

Otherwise, the patch applies/compiles cleanly and the regression tests
pass.
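
Point 1 can be reproduced with a small standalone model. This is a sketch under stated assumptions, not backend code: `processed_v6` and `processed_single_place` are hypothetical names, `limit_set` stands in for a non-zero `cstate->error_limit` that is never exhausted during the load, and `conflict[i]` marks rows that the speculative-insert path would reject.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the v6 counting: one increment inside the speculative path,
 * plus the final increment guarded by !error_limit.
 */
long
processed_v6(const bool *conflict, size_t nrows, bool limit_set, int num_indices)
{
    long processed = 0;

    assert(conflict != NULL || nrows == 0);
    for (size_t i = 0; i < nrows; i++)
    {
        if (limit_set && num_indices > 0)
        {
            /* speculative path: the only place a conflict is detected */
            if (conflict[i])
                continue;
            processed++;
        }
        /* plain path inserts the row but does not count it here */

        if (!limit_set)
            processed++;    /* final increment, skipped when a limit is set */
    }
    return processed;
}

/* Model of the suggested alternative: count in exactly one place. */
long
processed_single_place(const bool *conflict, size_t nrows, bool limit_set,
                       int num_indices)
{
    long processed = 0;

    assert(conflict != NULL || nrows == 0);
    for (size_t i = 0; i < nrows; i++)
    {
        bool skip = limit_set && num_indices > 0 && conflict[i];

        if (skip)
            continue;
        processed++;        /* the only increment */
    }
    return processed;
}
```

With a limit set and no indexes, the first function reports 0 for any number of inserted rows, which matches the behaviour described in point 1; the second reports the number of rows actually inserted in every case the model covers.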

Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#23

https://travis-ci.org/postgresql-cfbot/postgresql/builds/554350168

surafel3000@gmail.com

over 6 years ago

In reply to: Alexey Kondratov (#22)

Re: Conflict handling for COPY FROM

Hi Alexey,
Thank you for looking at it

On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov <a.kondratov@postgrespro.ru>
wrote:

On 28.06.2019 16:12, Alvaro Herrera wrote:

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:

Or even just return it as a row. CopyBoth is relatively widely supported
these days.

I think generating a warning about it also sufficiently meets the purpose of
notifying the user about skipped records with the existing logging facility,
and we use it for a similar purpose in other places too. The difference
I see is the number of warnings that can be generated.

Warnings seem useless for this purpose. I'm with Andres: returning rows
would make this a fine feature. If the user wants the rows in a table
as Andrew suggests, she can wrap the whole thing in an insert.

I agree with previous commentators that returning rows will make this
feature more versatile.

I agree. I am looking at the options.

Also, I would prefer having an option to ignore all errors, e.g. with
option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
a number of future errors if you are playing with some badly structured
data, while always setting it to 100500k looks ugly.

Good idea

1) Calculation of processed rows isn't correct (I've checked). You do it
in two places, and
-            processed++;
+            if (!cstate->error_limit)
+                processed++;
is never incremented if ERROR_LIMIT is specified and no errors
occurred/no constraints exist, so the result will always be 0. However,
if primary column with constraints exists, then processed is calculated
correctly, since another code path is used:

Correct. I will fix it.

+                        if (specConflict)
+                        {
+                            ...
+                        }
+                        else
+                            processed++;
I would prefer this calculation in a single place (as it was before
patch) for simplicity and in order to avoid such problems.

2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT
is specified and was exceeded, which doesn't seem to be correct, does it?
-                        if (resultRelInfo->ri_NumIndices > 0)
+                        if (resultRelInfo->ri_NumIndices > 0 &&
cstate->error_limit == 0)
recheckIndexes =
ExecInsertIndexTuples(myslot,

No, it is always executed. I did it this way to avoid
inserting the index tuple twice, but I see that is unlikely.

3) Trailing whitespaces added to error messages and tests for some reason:

+                    ereport(WARNING,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("skipping \"%s\" --- missing data
for column \"%s\" ",

+                    ereport(ERROR,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("missing data for column \"%s\" ",

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
CONTEXT:  COPY x, line 1: "2000    230    23    23"

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
CONTEXT:  COPY x, line 1: "2001    231    \N    \N"
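
Regarding point 1 above, the single-counter structure could look roughly like the sketch below. It is only an illustration under stated assumptions: `RowResult`, `insert_row` and `copy_rows` are hypothetical names, and negative values stand in for rows that hit a conflict. None of this is the actual CopyFrom() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-row outcome; stands in for the insert paths in CopyFrom(). */
typedef enum RowResult
{
    ROW_INSERTED,
    ROW_SKIPPED
} RowResult;

/* Toy insert: negative values play the role of conflicting rows. */
RowResult
insert_row(int value, bool skip_errors)
{
    if (value < 0)
    {
        /* without an error limit, real code would raise a hard error here */
        assert(skip_errors);
        return ROW_SKIPPED;
    }
    return ROW_INSERTED;
}

/* Returns the number of rows actually stored. */
size_t
copy_rows(const int *rows, size_t nrows, bool skip_errors)
{
    size_t      processed = 0;

    for (size_t i = 0; i < nrows; i++)
    {
        if (insert_row(rows[i], skip_errors) == ROW_SKIPPED)
            continue;           /* skipped rows never reach the counter */

        processed++;            /* the only increment in the loop */
    }
    return processed;
}
```

With one increment at the end of the loop body, the count no longer depends on whether an error limit was given or whether the table has indexes.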

regards
Surafel

#24

Anthony Nowocien

anowocien@gmail.com

over 6 years ago

In reply to: Surafel Temesgen (#23)

Re: Conflict handling for COPY FROM

Hi,
I'm very interested in this patch and would like to give a review within a
week. On the feature side, how about simply using the less verbose "ERRORS"
instead of "ERROR LIMIT" ?

On Wed, Jul 3, 2019 at 1:42 PM Surafel Temesgen <surafel3000@gmail.com>
wrote:

Hi Alexey,
Thank you for looking at it

On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov <
a.kondratov@postgrespro.ru> wrote:

On 28.06.2019 16:12, Alvaro Herrera wrote:

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:

Or even just return it as a row. CopyBoth is relatively widely supported
these days.

I think generating a warning also sufficiently meets the purpose of
notifying the user about skipped records through the existing logging
facility, and we use warnings for a similar purpose in other places too.
The difference I see is the number of warnings that can be generated.

Warnings seem useless for this purpose. I'm with Andres: returning rows
would make this a fine feature. If the user wants the rows in a table
as Andrew suggests, she can wrap the whole thing in an insert.

I agree with previous commentators that returning rows will make this
feature more versatile.

I agree. I am looking at the options.

Also, I would prefer having an option to ignore all errors, e.g. with
option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
a number of future errors if you are playing with some badly structured
data, while always setting it to 100500k looks ugly.

Good idea

I also +1 having an option to ignore all errors. Other RDBMS might use a
large number, but "-1" seems cleaner so far.

1) Calculation of processed rows isn't correct (I've checked). You do it
in two places, and
-            processed++;
+            if (!cstate->error_limit)
+                processed++;
is never incremented if ERROR_LIMIT is specified and no errors
occurred/no constraints exist, so the result will always be 0. However,
if primary column with constraints exists, then processed is calculated
correctly, since another code path is used:
Correct. I will fix it.

+ if (specConflict)
+                        {
+                            ...
+                        }
+                        else
+                            processed++;
I would prefer this calculation in a single place (as it was before
patch) for simplicity and in order to avoid such problems.
ok
2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT
is specified and was exceeded, which doesn't seem to be correct, does it?
-                        if (resultRelInfo->ri_NumIndices > 0)
+                        if (resultRelInfo->ri_NumIndices > 0 &&
cstate->error_limit == 0)
recheckIndexes =
ExecInsertIndexTuples(myslot,
No, it is always executed. I did it this way to avoid
inserting the index tuple twice, but I see that is unlikely.
3) Trailing whitespaces added to error messages and tests for some reason:
+                    ereport(WARNING,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("skipping \"%s\" --- missing data
for column \"%s\" ",
+                    ereport(ERROR,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("missing data for column \"%s\" ",
-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
CONTEXT:  COPY x, line 1: "2000    230    23    23"
-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
CONTEXT:  COPY x, line 1: "2001    231    \N    \N"
regards
Surafel

Thanks,
Anthony

#25

Thomas Munro

thomas.munro@gmail.com

over 6 years ago

In reply to: Surafel Temesgen (#20)

Re: Conflict handling for COPY FROM

On Fri, Jun 28, 2019 at 10:57 PM Surafel Temesgen <surafel3000@gmail.com> wrote:

In addition to the above change, in the attached patch I also changed
the syntax to ERROR LIMIT because it no longer skips only
unique and exclusion constraint violations.

Hi Surafel,

FYI copy.sgml has some DTD validity problems.

--
Thomas Munro
https://enterprisedb.com

#26

surafel3000@gmail.com

over 6 years ago

In reply to: Alexey Kondratov (#22)

1 attachment(s)

Re: Conflict handling for COPY FROM

Also, I would prefer having an option to ignore all errors, e.g. with
option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
a number of future errors if you are playing with some badly structured
data, while always setting it to 100500k looks ugly.

Here is the patch that addresses all the comments given, except adding a way
to ignore all errors, because specifying a very high number can do the job and
trying to store such badly structured data may be a bad idea.

regards
Surafel

Attachments:

conflict-handling-onCopy-from-v7.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 5e2992ddac..f16108b23b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR '<replaceable class="parameter">limit_number</replaceable>'
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,22 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR</literal></term>
+    <listitem>
+     <para>
+      Specifies to ignore error record up to <replaceable
+      class="parameter">limit_number</replaceable> number.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violation
+      and same record formatting error is ignored.
+     </para>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4f04d122c3..2a6bc48f78 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -48,6 +48,7 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -154,6 +155,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -1291,6 +1293,21 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->error_limit > 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			if (cstate->error_limit <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("argument to option \"%s\" must be positive integer or -1",
+								defel->defname),
+						 parser_errposition(pstate, defel->location)));
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1441,6 +1458,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->error_limit && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2678,6 +2699,7 @@ CopyFrom(CopyState cstate)
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	DestReceiver *dest;
 
 	Assert(cstate->rel);
 
@@ -2841,7 +2863,17 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->error_limit)
+	{
+		TupleDesc	tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+		ExecOpenIndices(resultRelInfo, true);
+		dest = CreateDestReceiver(DestRemoteSimple);
+		dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
+	}
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2946,6 +2978,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->error_limit)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3289,6 +3328,63 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if (cstate->error_limit && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock".
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, warn about it and preceded
+						 * to the next record if there are any.
+						 */
+						if (specConflict)
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
@@ -3706,7 +3802,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3721,9 +3817,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3735,10 +3843,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\"",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\" ",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3825,10 +3945,23 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->error_limit)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+								cstate->line_buf.data, (int) fld_count, attr_count)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 208b4a1f28..67abbe2f96 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -633,7 +633,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
-	EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EVENT EXCEPT
+	EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ERROR_P ESCAPE EVENT EXCEPT
 	EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN
 	EXTENSION EXTERNAL EXTRACT
 
@@ -3054,6 +3054,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("encoding", (Node *)makeString($2), @1);
 				}
+			| ERROR_P Iconst
+				{
+					$$ = makeDefElem("error_limit", (Node *)makeInteger($2), @1);
+				}
 		;
 
 /* The following exist for backward compatibility with very old versions */
@@ -15094,6 +15098,7 @@ unreserved_keyword:
 			| ENCODING
 			| ENCRYPTED
 			| ENUM_P
+			| ERROR_P
 			| ESCAPE
 			| EVENT
 			| EXCLUDE
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 00ace8425e..1f4f154d19 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -146,6 +146,7 @@ PG_KEYWORD("encoding", ENCODING, UNRESERVED_KEYWORD)
 PG_KEYWORD("encrypted", ENCRYPTED, UNRESERVED_KEYWORD)
 PG_KEYWORD("end", END_P, RESERVED_KEYWORD)
 PG_KEYWORD("enum", ENUM_P, UNRESERVED_KEYWORD)
+PG_KEYWORD("error", ERROR_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("escape", ESCAPE, UNRESERVED_KEYWORD)
 PG_KEYWORD("event", EVENT, UNRESERVED_KEYWORD)
 PG_KEYWORD("except", EXCEPT, RESERVED_KEYWORD)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c53ed3ebf5..d90afcfc86 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -36,10 +36,10 @@ COPY x from stdin;
 ERROR:  invalid input syntax for type integer: ""
 CONTEXT:  COPY x, line 1, column a: ""
 COPY x from stdin;
-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e" 
 CONTEXT:  COPY x, line 1: "2000	230	23	23"
 COPY x from stdin;
-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e" 
 CONTEXT:  COPY x, line 1: "2001	231	\N	\N"
 -- extra data: should fail
 COPY x from stdin;
@@ -55,6 +55,11 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin ERROR 5;
+WARNING:  skipping "70001	22	32" --- missing data for column "d"
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e"
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +107,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +141,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +172,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +203,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..115cce6629 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -110,6 +110,14 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin ERROR 5;
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);

#27

Thomas Munro

thomas.munro@gmail.com

over 6 years ago

In reply to: Surafel Temesgen (#26)

Re: Conflict handling for COPY FROM

On Fri, Jul 12, 2019 at 1:42 AM Surafel Temesgen <surafel3000@gmail.com> wrote:

Here is the patch that addresses all the comments given, except adding a way
to ignore all errors, because specifying a very high number can do the job and
trying to store such badly structured data may be a bad idea.

Hi Surafel,

FYI GCC warns:

copy.c: In function ‘CopyFrom’:
copy.c:3383:8: error: ‘dest’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
(void) dest->receiveSlot(myslot, dest);
^
copy.c:2702:16: note: ‘dest’ was declared here
DestReceiver *dest;
^
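
For illustration, a reduced sketch of the pattern behind this warning: the pointer is assigned on only one path and dereferenced under a condition the compiler cannot tie back to that assignment. `Receiver` and `run_copy` are hypothetical stand-ins, not code from copy.c. Giving the pointer an explicit initial value is one common way to silence the diagnostic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical receiver; stands in for DestReceiver. */
typedef struct Receiver
{
    int         rows_received;
} Receiver;

/* Returns how many rows were handed to the receiver (0 or 1 here). */
int
run_copy(bool error_limit_set, bool row_conflicts)
{
    static Receiver storage;
    Receiver   *dest = NULL;    /* explicit initial value avoids the warning */

    storage.rows_received = 0;

    if (error_limit_set)
        dest = &storage;        /* only assigned on this path */

    /* the use is guarded, but by a separate condition further down */
    if (row_conflicts && error_limit_set)
    {
        assert(dest != NULL);
        dest->rows_received++;
    }
    return storage.rows_received;
}
```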

--
Thomas Munro
https://enterprisedb.com

#28

Anthony Nowocien

anowocien@gmail.com

over 6 years ago

In reply to: Thomas Munro (#27)

Re: Conflict handling for COPY FROM

Hi,

sorry for answering a bit later than I hoped. Here is my review so far:

Contents
========

In my opinion, this patch starts to address one of COPY's shortcomings, which
is error handling. PK and exclusion errors are taken care of, but not
(yet?) other types of errors.

Documentation is updated, "\h copy" also and some regression tests are
added.

Initial Run
===========

Patch applies cleanly (I've tested v6).

make: OK

make install: OK

make check: OK

make installcheck: OK

Performance
===========

I've tested the patch on a 1.1G file with 10 000 000 lines. Each test was
done 15 times on a small local VM. Table is without constraints.

head: 38,93s

head + patch: 38,76s

Another test was on a 0.1 GB file with 1 000 000 lines. Each test was done 10
times on a small local VM, and the table has a pk.

COPY 4,550s

COPY CONFLICT 4,595s

COPY CONFLICT with only one pk error 10,529s

COPY CONFLICT pk error every 100 lines 10,859s

COPY CONFLICT pk error every 1000 lines 10,879s

I did not test exclusions so far.
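
To put the pk numbers above in relative terms, here is a small helper that turns two timings into a ratio. `slowdown` is a hypothetical name used only for this illustration. Fed with the figures above, COPY CONFLICT without errors comes out at about 1% over plain COPY (4.595 s vs 4.550 s), while a single pk error makes the run roughly 2.3 times slower (10.529 s vs 4.595 s).

```c
#include <assert.h>

/* Ratio of a measured run time to a baseline, both in seconds. */
double
slowdown(double baseline_s, double measured_s)
{
    assert(baseline_s > 0.0);
    return measured_s / baseline_s;
}
```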

Thoughts
========

I find the feature useful in itself. One big question for me is can it be
improved later on to handle other types of errors (like check constraints
for example) ? A "-1" for the error limit would be very useful in my
opinion.

I am also afraid that the name "error_limit" might mislead users into
thinking that all error types are handled. But I do not have a better
suggestion without making this clause much longer...

I've had a short look at the code, but this will need review by someone
else.

Anyway, thanks a lot for taking the time to work on it.

Anthony

On Sun, Jul 14, 2019 at 3:48 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Jul 12, 2019 at 1:42 AM Surafel Temesgen <surafel3000@gmail.com> wrote:

Here is the patch that addresses all the comments given, except adding a way
to ignore all errors, because specifying a very high number can do the job and
trying to store such badly structured data may be a bad idea.

Hi Surafel,

FYI GCC warns:

copy.c: In function ‘CopyFrom’:
copy.c:3383:8: error: ‘dest’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
(void) dest->receiveSlot(myslot, dest);
^
copy.c:2702:16: note: ‘dest’ was declared here
DestReceiver *dest;
^

--
Thomas Munro
https://enterprisedb.com

#29

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Surafel Temesgen (#26)

Re: Conflict handling for COPY FROM

I think making ERROR a reserved word is a terrible idea, and we don't
need it for this feature anyway. Use a new option in the parenthesized
options list instead.

error_limit being an integer, please don't use it as a boolean:

if (cstate->error_limit)
...

Add an explicit comparison to zero instead, for code readability.
Also, since each error decrements the same variable, it becomes hard to
reason about the state: at the end, are we ending with the exact number
of errors, or did we start with the feature disabled? I suggest that
it'd make sense to have a boolean indicating whether this feature has
been requested, and the integer is just the remaining allowed problems.
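
Something along these lines, as a rough standalone sketch. `ErrorBudget` and both functions are hypothetical names, not fields or functions from copy.c, and the -1 "ignore everything" case is included only as an assumption based on what was requested upthread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the error-limit state kept in CopyStateData. */
typedef struct ErrorBudget
{
    bool        enabled;        /* was an error limit requested at all? */
    bool        unlimited;      /* limit of -1: tolerate every error */
    int         remaining;      /* errors that may still be skipped */
} ErrorBudget;

/* limit == 0 means the feature is off; -1 means no upper bound. */
ErrorBudget
error_budget_init(int limit)
{
    ErrorBudget b;

    assert(limit >= -1);
    b.enabled = (limit != 0);
    b.unlimited = (limit == -1);
    b.remaining = (limit > 0) ? limit : 0;
    return b;
}

/*
 * Account for one bad row.  Returns true if the row may be skipped,
 * false if the caller should raise a hard error instead.
 */
bool
error_budget_consume(ErrorBudget *b)
{
    if (!b->enabled)
        return false;
    if (b->unlimited)
        return true;
    if (b->remaining > 0)
    {
        b->remaining--;
        return true;
    }
    return false;
}
```

At the end of the load, `enabled` still says whether the feature was requested, and `remaining == 0` can only mean the budget was used up.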

Line 3255 or thereabouts contains an excess " char

The "warn about it" comment is obsolete, isn't it? There's no warning
there.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#30

surafel3000@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#29)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Sun, Jul 14, 2019 at 7:40 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

error_limit being an integer, please don't use it as a boolean:

if (cstate->error_limit)

...

Add an explicit comparison to zero instead, for code readability.
Also, since each error decrements the same variable, it becomes hard to
reason about the state: at the end, are we ending with the exact number
of errors, or did we start with the feature disabled? I suggest that
it'd make sense to have a boolean indicating whether this feature has
been requested, and the integer is just the remaining allowed problems.

done

Line 3255 or thereabouts contains an excess " char

fixed

The "warn about it" comment is obsolete, isn't it? There's no warning
there.

fixed

I also added an option to ignore all errors by setting ERROR to -1.

Attachments:

conflict-handling-onCopy-from-v8.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 5e2992ddac..7aaebf56d8 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR '<replaceable class="parameter">limit_number</replaceable>'
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,23 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR</literal></term>
+    <listitem>
+     <para>
+      Specifies to return error record up to <replaceable
+      class="parameter">limit_number</replaceable> number.
+      specifying it to -1 returns all error record.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violation
+      and same record formatting error is ignored.
+     </para>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4f04d122c3..5884493307 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -48,6 +48,7 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -154,6 +155,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -184,6 +186,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+									 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -1291,6 +1296,18 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit == -1)
+				cstate->ignore_all_error = true;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1441,6 +1458,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->ignore_error && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2678,6 +2699,7 @@ CopyFrom(CopyState cstate)
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	DestReceiver *dest = NULL;
 
 	Assert(cstate->rel);
 
@@ -2841,7 +2863,17 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->ignore_error)
+	{
+		TupleDesc	tupDesc;
+
+		tupDesc = RelationGetDescr(cstate->rel);
+		ExecOpenIndices(resultRelInfo, true);
+		dest = CreateDestReceiver(DestRemoteSimple);
+		dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
+	}
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2946,6 +2978,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_error)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3289,6 +3328,63 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if ((cstate->error_limit > 0 || cstate->ignore_all_error) && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock".
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, return it and preceded to
+						 * the next record if there are any.
+						 */
+						if (specConflict)
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
@@ -3706,7 +3802,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3721,9 +3817,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3735,10 +3843,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit > 0 || cstate->ignore_all_error)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\"",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3825,10 +3945,23 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+								cstate->line_buf.data, (int) fld_count, attr_count)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 208b4a1f28..c99aab5579 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -633,7 +633,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
-	EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EVENT EXCEPT
+	EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ERROR_P ESCAPE EVENT EXCEPT
 	EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN
 	EXTENSION EXTERNAL EXTRACT
 
@@ -3054,6 +3054,10 @@ copy_opt_item:
 				{
 					$$ = makeDefElem("encoding", (Node *)makeString($2), @1);
 				}
+			| ERROR_P SignedIconst
+				{
+					$$ = makeDefElem("error_limit", (Node *)makeInteger($2), @1);
+				}
 		;
 
 /* The following exist for backward compatibility with very old versions */
@@ -15094,6 +15098,7 @@ unreserved_keyword:
 			| ENCODING
 			| ENCRYPTED
 			| ENUM_P
+			| ERROR_P
 			| ESCAPE
 			| EVENT
 			| EXCLUDE
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 00ace8425e..1f4f154d19 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -146,6 +146,7 @@ PG_KEYWORD("encoding", ENCODING, UNRESERVED_KEYWORD)
 PG_KEYWORD("encrypted", ENCRYPTED, UNRESERVED_KEYWORD)
 PG_KEYWORD("end", END_P, RESERVED_KEYWORD)
 PG_KEYWORD("enum", ENUM_P, UNRESERVED_KEYWORD)
+PG_KEYWORD("error", ERROR_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("escape", ESCAPE, UNRESERVED_KEYWORD)
 PG_KEYWORD("event", EVENT, UNRESERVED_KEYWORD)
 PG_KEYWORD("except", EXCEPT, RESERVED_KEYWORD)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c53ed3ebf5..dbc8bb75c7 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -55,6 +55,11 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin ERROR 5;
+WARNING:  skipping "70001	22	32" --- missing data for column "d"
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e"
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +107,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +141,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +172,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +203,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..115cce6629 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -110,6 +110,14 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin ERROR 5;
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);

#31

a.kondratov@postgrespro.ru

over 6 years ago

In reply to: Surafel Temesgen (#30)

1 attachment(s)

Re: Conflict handling for COPY FROM

Hi Surafel,

On 16.07.2019 10:08, Surafel Temesgen wrote:

I also added an option to ignore all errors when ERROR is set to -1.

Great!

The patch still applies cleanly (tested on e1c8743e6c), but I ran into
some problems with more elaborate tests.

First of all, there is definitely a problem with the grammar. In the docs,
ERROR is defined as an option, and

COPY test FROM '/path/to/copy-test-simple.csv' ERROR -1;

works just fine, but if the modern 'WITH (...)' syntax is used:

COPY test FROM '/path/to/copy-test-simple.csv' WITH (ERROR -1);
ERROR: option "error" not recognized

whereas 'WITH (error_limit -1)' works again.

This happens because COPY supports both the modern syntax and a very old one:

* In the preferred syntax the options are comma-separated
* and use generic identifiers instead of keywords. The pre-9.0
* syntax had a hard-wired, space-separated set of options.

So I see several options here:

1) Everything is left as is, but then the docs should be updated to
reflect that error_limit is required for the modern syntax.

2) However, why do we have to support the old syntax here? I guess it exists
for backward compatibility only, but this is a completely new feature.
So maybe just 'WITH (error_limit 42)' will be enough?

3) You could also simply change the internal option name from 'error_limit' to
'error', or the SQL keyword from 'ERROR' to 'ERROR_LIMIT'.

I would prefer the second option.
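
The mismatch described above can be modelled in a few lines of standalone C. This is only a sketch, not PostgreSQL code: DemoOption, demo_from_legacy_keyword, demo_from_generic and demo_option_recognized are hypothetical names. It assumes, as the gram.y hunk suggests, that the legacy ERROR keyword is rewritten to the internal name "error_limit", while the generic WITH (...) form passes the user's identifier through unchanged.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a parsed option: its name and integer value. */
typedef struct DemoOption
{
	const char *name;
	long		value;
} DemoOption;

/* Legacy syntax: the grammar maps the ERROR keyword to "error_limit". */
static DemoOption
demo_from_legacy_keyword(long value)
{
	DemoOption	opt = {"error_limit", value};

	return opt;
}

/* Generic syntax: the identifier typed by the user becomes the name as is. */
static DemoOption
demo_from_generic(const char *ident, long value)
{
	DemoOption	opt = {ident, value};

	return opt;
}

/* The option loop only knows the internal name. */
static bool
demo_option_recognized(const DemoOption *opt)
{
	return strcmp(opt->name, "error_limit") == 0;
}
```

Under these assumptions, an option built by demo_from_generic("error", -1) is rejected while demo_from_generic("error_limit", -1) is accepted, which matches the behaviour reported above.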

Next, you use DestRemoteSimple to return the conflicting tuples:

+        dest = CreateDestReceiver(DestRemoteSimple);
+        dest->rStartup(dest, (int) CMD_SELECT, tupDesc);

However, printsimple supports only a very limited subset of built-in types, so

CREATE TABLE large_test (id integer primary key, num1 bigint, num2
double precision);
COPY large_test FROM '/path/to/copy-test.tsv';
COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;

fails with the following error: 'ERROR: unsupported type OID: 701', which
is very confusing from the end user's perspective. I tried to
switch to DestRemote, but couldn't figure it out quickly.
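
Why a double precision column trips this error can be sketched with a small type dispatch. This is a standalone illustration, not the printsimple source: the DEMO_* names and demo_simple_receiver_supports are hypothetical, and the exact set of supported types is an assumption made for the example. The numeric values are the pg_type OIDs of int8, int4, text and float8.

```c
#include <stdbool.h>

/* pg_type OIDs of the built-in types used in this sketch. */
#define DEMO_INT8OID	20		/* bigint */
#define DEMO_INT4OID	23		/* integer */
#define DEMO_TEXTOID	25		/* text */
#define DEMO_FLOAT8OID	701		/* double precision */

/*
 * Hypothetical model of a "simple" receiver: it knows how to send only a
 * short, hard-wired list of types (assumed here) and rejects all others.
 */
static bool
demo_simple_receiver_supports(unsigned int typid)
{
	switch (typid)
	{
		case DEMO_TEXTOID:
		case DEMO_INT4OID:
		case DEMO_INT8OID:
			return true;
		default:
			/* the real receiver raises "unsupported type OID" here */
			return false;
	}
}
```

With the large_test table above, the id and num1 columns pass such a check, but num2 (OID 701) does not.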

Finally, I simply cannot trigger this validation:

+        else if (strcmp(defel->defname, "error_limit") == 0)
+        {
+            if (cstate->ignore_error)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options"),
+                         parser_errposition(pstate, defel->location)));
+            cstate->error_limit = defGetInt64(defel);
+            cstate->ignore_error = true;
+            if (cstate->error_limit == -1)
+                cstate->ignore_all_error = true;
+        }

If cstate->ignore_error is set, then we have already processed the
options list, since this is the only place where it is set. So we
should never reach this ereport, should we?

Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#32

surafel3000@gmail.com

about 6 years ago

In reply to: Alexey Kondratov (#31)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Fri, Sep 20, 2019 at 4:16 PM Alexey Kondratov <a.kondratov@postgrespro.ru>
wrote:

First of all, there is definitely a problem with the grammar. In the docs,
ERROR is defined as an option, and

COPY test FROM '/path/to/copy-test-simple.csv' ERROR -1;

works just fine, but if the modern 'WITH (...)' syntax is used:

COPY test FROM '/path/to/copy-test-simple.csv' WITH (ERROR -1);
ERROR: option "error" not recognized

whereas 'WITH (error_limit -1)' works again.

This happens because COPY supports both the modern syntax and a very old one:

* In the preferred syntax the options are comma-separated
* and use generic identifiers instead of keywords. The pre-9.0
* syntax had a hard-wired, space-separated set of options.

So I see several options here:

1) Everything is left as is, but then the docs should be updated to
reflect that error_limit is required for the modern syntax.

2) However, why do we have to support the old syntax here? I guess it exists
for backward compatibility only, but this is a completely new feature.
So maybe just 'WITH (error_limit 42)' will be enough?

3) You could also simply change the internal option name from 'error_limit' to
'error', or the SQL keyword from 'ERROR' to 'ERROR_LIMIT'.

I would prefer the second option.

Agreed and done.

Next, you use DestRemoteSimple to return the conflicting tuples:
+        dest = CreateDestReceiver(DestRemoteSimple);
+        dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
However, printsimple supports only a very limited subset of built-in types, so

CREATE TABLE large_test (id integer primary key, num1 bigint, num2
double precision);
COPY large_test FROM '/path/to/copy-test.tsv';
COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;

fails with the following error: 'ERROR: unsupported type OID: 701', which
is very confusing from the end user's perspective. I tried to
switch to DestRemote, but couldn't figure it out quickly.

Fixed.

Finally, I simply cannot trigger this validation:

+        else if (strcmp(defel->defname, "error_limit") == 0)
+        {
+            if (cstate->ignore_error)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options"),
+                         parser_errposition(pstate, defel->location)));
+            cstate->error_limit = defGetInt64(defel);
+            cstate->ignore_error = true;
+            if (cstate->error_limit == -1)
+                cstate->ignore_all_error = true;
+        }

If cstate->ignore_error is set, then we have already processed the
options list, since this is the only place where it is set. So we
should never reach this ereport, should we?

Yes, the check is only needed for the modern syntax.
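
One case where the check can fire, presumably the one meant here, is an option list that names error_limit twice, e.g. WITH (error_limit 3, error_limit 5). The following standalone C sketch models that loop. It is not the PostgreSQL implementation: DemoCopyState, demo_process_options and the DEMO_* result codes are hypothetical, and only the flag handling mirrors the quoted hunk.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of the COPY state touched by the error_limit option. */
typedef struct DemoCopyState
{
	long		error_limit;
	bool		ignore_error;
	bool		ignore_all_error;
} DemoCopyState;

/* Result codes for this sketch only. */
enum
{
	DEMO_OK = 0,
	DEMO_REDUNDANT = 1,			/* "conflicting or redundant options" */
	DEMO_UNRECOGNIZED = 2		/* "option not recognized" */
};

/* Walk the option list once, the way the options loop does. */
static int
demo_process_options(DemoCopyState *cstate,
					 const char *const *names, const long *values,
					 int noptions)
{
	for (int i = 0; i < noptions; i++)
	{
		if (strcmp(names[i], "error_limit") == 0)
		{
			/* Set by an earlier entry of this same list? */
			if (cstate->ignore_error)
				return DEMO_REDUNDANT;
			cstate->error_limit = values[i];
			cstate->ignore_error = true;
			if (cstate->error_limit == -1)
				cstate->ignore_all_error = true;
		}
		else
			return DEMO_UNRECOGNIZED;
	}
	return DEMO_OK;
}
```

In this model a single error_limit entry is accepted, while the second entry of a two-entry list finds ignore_error already set and is reported as redundant.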

regards
Surafel

Attachments:

conflict-handling-onCopy-from-v9.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index d9b7c4d0d4..ffcfe1e8d3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR_LIMIT '<replaceable class="parameter">limit_number</replaceable>'
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,23 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Specifies that up to <replaceable
+      class="parameter">limit_number</replaceable> error records are returned.
+      Setting it to -1 returns all error records.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violations
+      and some record formatting errors are ignored.
+     </para>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e17d8c760f..c2314480b2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -24,6 +24,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/printtup.h"
 #include "catalog/dependency.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
@@ -48,7 +49,9 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
+#include "tcop/pquery.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -154,6 +157,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -183,6 +187,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+									 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -1290,6 +1297,18 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit == -1)
+				cstate->ignore_all_error = true;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1440,6 +1459,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->ignore_error && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2675,6 +2698,8 @@ CopyFrom(CopyState cstate)
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	DestReceiver *dest = NULL;
+	Portal		portal = NULL;
 
 	Assert(cstate->rel);
 
@@ -2838,7 +2863,20 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->ignore_error)
+	{
+		TupleDesc	tupDesc;
+
+		ExecOpenIndices(resultRelInfo, true);
+		tupDesc = RelationGetDescr(cstate->rel);
+
+		portal = GetPortalByName("");
+		dest = CreateDestReceiver(DestRemote);
+		SetRemoteDestReceiverParams(dest, portal);
+		dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
+	}
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2943,6 +2981,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_error)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3286,6 +3331,63 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if ((cstate->error_limit > 0 || cstate->ignore_all_error) && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock.
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, return it and proceed to
+						 * the next record, if there is one.
+						 */
+						if (specConflict)
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
@@ -3703,7 +3805,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3718,9 +3820,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3732,10 +3846,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit > 0 || cstate->ignore_all_error)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\"",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3822,10 +3948,23 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+								cstate->line_buf.data, (int) fld_count, attr_count)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c53ed3ebf5..37a77dcaa2 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -55,6 +55,11 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001	22	32" --- missing data for column "d"
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e"
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +107,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +141,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +172,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +203,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..2378f428fc 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -110,6 +110,14 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);

#33

a.kondratov@postgrespro.ru

about 6 years ago

In reply to: Surafel Temesgen (#32)

Re: Conflict handling for COPY FROM

On 11.11.2019 16:00, Surafel Temesgen wrote:

Next, you use DestRemoteSimple to return the conflicting tuples:
+        dest = CreateDestReceiver(DestRemoteSimple);
+        dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
However, printsimple supports only a very limited subset of built-in
types, so

CREATE TABLE large_test (id integer primary key, num1 bigint, num2
double precision);
COPY large_test FROM '/path/to/copy-test.tsv';
COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;

fails with the following error: 'ERROR: unsupported type OID: 701', which
is very confusing from the end user's perspective. I tried to
switch to DestRemote, but couldn't figure it out quickly.

fixed

Thanks, now it works with my tests.

1) Maybe it is fine, but now I do not like this part:

+    portal = GetPortalByName("");
+    dest = CreateDestReceiver(DestRemote);
+    SetRemoteDestReceiverParams(dest, portal);
+    dest->rStartup(dest, (int) CMD_SELECT, tupDesc);

Here you implicitly rely on the fact that a portal with a blank name is always
created in exec_simple_query before we get to this point. Next, you
create a new DestReceiver and attach it to this portal, but one has
already been created and set up in exec_simple_query.

Would it be better to just explicitly pass the ready DestReceiver to
DoCopy (similarly to how it is done for T_ExecuteStmt / ExecuteQuery),
since COPY may now need it?

2) My second concern is that you use three internal fields to track
the error limit:

+    int            error_limit;    /* total number of error to ignore */
+    bool        ignore_error;    /* is ignore error specified? */
+    bool        ignore_all_error;    /* is error_limit -1 (ignore all error)
+                                     * specified? */

It seems, though, that we can just leave error_limit as a user-defined
constant and track errors with something like errors_count. In that case
you do not need the auxiliary ignore_all_error flag. But this is probably a
matter of personal choice.
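
Both suggestions above can be illustrated together in a standalone C sketch. It is not PostgreSQL code: DemoReceiver, demo_count_row and demo_copy_rows are hypothetical names, and rows are reduced to a "bad or not" flag. The receiver is handed in by the caller instead of being looked up, error_limit stays constant, and a separate errors_count is compared against it, with -1 meaning no limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical receiver: a callback plus its private argument. */
typedef struct DemoReceiver
{
	void		(*receive) (size_t row, void *arg);
	void	   *arg;
} DemoReceiver;

/* Example callback: counts how many rejected rows were reported. */
static void
demo_count_row(size_t row, void *arg)
{
	(void) row;
	(*(long *) arg)++;
}

/*
 * Load nrows rows; row_is_bad[i] marks rows that would raise an error.
 * Returns the number of rows loaded, or -1 once the limit is exceeded.
 * error_limit is never modified; -1 means "ignore all errors".
 */
static long
demo_copy_rows(const bool *row_is_bad, size_t nrows,
			   long error_limit, DemoReceiver *dest)
{
	long		errors_count = 0;
	long		loaded = 0;

	for (size_t i = 0; i < nrows; i++)
	{
		if (!row_is_bad[i])
		{
			loaded++;
			continue;
		}
		/* No budget left: behave like the plain ERROR path. */
		if (error_limit != -1 && errors_count >= error_limit)
			return -1;
		errors_count++;
		if (dest != NULL)
			dest->receive(i, dest->arg);	/* report the rejected row */
	}
	return loaded;
}
```

With this shape, whether a limit was specified and whether it is unlimited can both be read off error_limit itself. Whether that is clearer than separate boolean flags is, as noted, a matter of taste.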

Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#34

surafel3000@gmail.com

about 6 years ago

In reply to: Alexey Kondratov (#33)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Fri, Nov 15, 2019 at 6:24 PM Alexey Kondratov <a.kondratov@postgrespro.ru>
wrote:

On 11.11.2019 16:00, Surafel Temesgen wrote:
Next, you use DestRemoteSimple to return the conflicting tuples:
+        dest = CreateDestReceiver(DestRemoteSimple);
+        dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
However, printsimple supports only a very limited subset of built-in
types, so

CREATE TABLE large_test (id integer primary key, num1 bigint, num2
double precision);
COPY large_test FROM '/path/to/copy-test.tsv';
COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;

fails with the following error: 'ERROR: unsupported type OID: 701', which
is very confusing from the end user's perspective. I tried to
switch to DestRemote, but couldn't figure it out quickly.

fixed
Thanks, now it works with my tests.

1) Maybe it is fine, but now I do not like this part:
+    portal = GetPortalByName("");
+    dest = CreateDestReceiver(DestRemote);
+    SetRemoteDestReceiverParams(dest, portal);
+    dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
Here you implicitly rely on the fact that a portal with a blank name is always
created in exec_simple_query before we get to this point. Next, you
create a new DestReceiver and attach it to this portal, but one has
already been created and set up in exec_simple_query.

Would it be better to just explicitly pass the ready DestReceiver to
DoCopy (similarly to how it is done for T_ExecuteStmt / ExecuteQuery),

Good idea. Thank you.

2) My second concern is that you use three internal fields to track
the error limit:
+    int            error_limit;    /* total number of error to ignore */
+    bool        ignore_error;    /* is ignore error specified? */
+    bool        ignore_all_error;    /* is error_limit -1 (ignore all error)
+                                     * specified? */
It seems, though, that we can just leave error_limit as a user-defined
constant and track errors with something like errors_count. In that case
you do not need the auxiliary ignore_all_error flag. But this is probably a
matter of personal choice.

Using bool flags saves us from using an integer type as a boolean, and it
records the fact that an error limit was specified even after the limit has
dropped to zero. It also seems more straightforward to me to treat
ignore_all_error separately.
Attached is the patch that uses the already created DestReceiver.

regards
Surafel

Attachments:

conflict-handling-onCopy-from-v10.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index d9b7c4d0d4..ffcfe1e8d3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR_LIMIT '<replaceable class="parameter">limit_number</replaceable>'
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,23 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Specifies that up to <replaceable
+      class="parameter">limit_number</replaceable> error records are returned.
+      Setting it to -1 returns all error records.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violations
+      and some record formatting errors are ignored.
+     </para>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e17d8c760f..c911b3d0c2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -24,6 +24,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/printtup.h"
 #include "catalog/dependency.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
@@ -48,7 +49,9 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
+#include "tcop/pquery.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -154,6 +157,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -183,6 +187,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+									 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -837,7 +844,7 @@ CopyLoadRawBuf(CopyState cstate)
 void
 DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	   int stmt_location, int stmt_len,
-	   uint64 *processed)
+	   uint64 *processed, DestReceiver *dest)
 {
 	CopyState	cstate;
 	bool		is_from = stmt->is_from;
@@ -1068,7 +1075,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 							   NULL, stmt->attlist, stmt->options);
 		cstate->whereClause = whereClause;
-		*processed = CopyFrom(cstate);	/* copy from file to database */
+		*processed = CopyFrom(cstate, dest);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
 	else
@@ -1290,6 +1297,18 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit == -1)
+				cstate->ignore_all_error = true;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1440,6 +1459,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->ignore_error && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2653,7 +2676,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
  * Copy FROM file to relation.
  */
 uint64
-CopyFrom(CopyState cstate)
+CopyFrom(CopyState cstate, DestReceiver *dest)
 {
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *target_resultRelInfo;
@@ -2675,6 +2698,7 @@ CopyFrom(CopyState cstate)
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	Portal		portal = NULL;
 
 	Assert(cstate->rel);
 
@@ -2838,7 +2862,19 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->ignore_error)
+	{
+		TupleDesc	tupDesc;
+
+		ExecOpenIndices(resultRelInfo, true);
+		tupDesc = RelationGetDescr(cstate->rel);
+
+		portal = GetPortalByName("");
+		SetRemoteDestReceiverParams(dest, portal);
+		dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
+	}
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2943,6 +2979,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_error)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3286,6 +3329,63 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if ((cstate->error_limit > 0 || cstate->ignore_all_error) && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock.
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, return it and preceded to
+						 * the next record if there are any.
+						 */
+						if (specConflict)
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
@@ -3703,7 +3803,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3718,9 +3818,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3732,10 +3844,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit > 0 || cstate->ignore_all_error)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\"",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3822,10 +3946,23 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+								cstate->line_buf.data, (int) fld_count, attr_count)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 7881079e96..521696be29 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -791,7 +791,7 @@ copy_table(Relation rel)
 	cstate = BeginCopyFrom(pstate, rel, NULL, false, copy_read_data, attnamelist, NIL);
 
 	/* Do the copy */
-	(void) CopyFrom(cstate);
+	(void) CopyFrom(cstate, NULL);
 
 	logicalrep_rel_close(relmapentry, NoLock);
 }
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index e984545780..cb7b0c80d2 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -550,7 +550,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 
 				DoCopy(pstate, (CopyStmt *) parsetree,
 					   pstmt->stmt_location, pstmt->stmt_len,
-					   &processed);
+					   &processed, dest);
 				if (completionTag)
 					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
 							 "COPY " UINT64_FORMAT, processed);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index bbe0105d77..16fc8e6a82 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -25,7 +25,7 @@ typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
-				   uint64 *processed);
+				   uint64 *processed, DestReceiver *dest);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyState cstate, bool is_from, List *options);
 extern CopyState BeginCopyFrom(ParseState *pstate, Relation rel, const char *filename,
@@ -37,7 +37,7 @@ extern bool NextCopyFromRawFields(CopyState cstate,
 								  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 
-extern uint64 CopyFrom(CopyState cstate);
+extern uint64 CopyFrom(CopyState cstate, DestReceiver *dest);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c53ed3ebf5..37a77dcaa2 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -55,6 +55,11 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001	22	32" --- missing data for column "d"
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e"
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +107,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +141,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +172,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +203,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..2378f428fc 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -110,6 +110,14 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);

#35

a.kondratov@postgrespro.ru

about 6 years ago

In reply to: Surafel Temesgen (#34)

Re: Conflict handling for COPY FROM

On 18.11.2019 9:42, Surafel Temesgen wrote:

On Fri, Nov 15, 2019 at 6:24 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:

1) Maybe it is fine, but now I do not like this part:
+    portal = GetPortalByName("");
+    dest = CreateDestReceiver(DestRemote);
+    SetRemoteDestReceiverParams(dest, portal);
+    dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
Here you implicitly use the fact that portal with a blank name is always
created in exec_simple_query before we get to this point. Next, you
create new DestReceiver and set it to this portal, but it is also
already created and set in the exec_simple_query.

Would it be better if you just explicitly pass ready DestReceiver to
DoCopy (similarly to how it is done for T_ExecuteStmt / ExecuteQuery),

Good idea. Thank you.
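
The suggestion above, handing an already-created receiver to the function that needs it rather than looking one up inside, can be illustrated with a small standalone sketch. This is only an analogy under stated assumptions: RowReceiver, count_row and load_rows are hypothetical names and have nothing to do with the real DestReceiver API beyond the callback-in-a-struct shape. Negative values stand in for rejected rows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receiver: a callback plus some state, loosely shaped like a DestReceiver. */
typedef struct RowReceiver RowReceiver;
struct RowReceiver
{
	void	(*receive) (RowReceiver *self, int row);
	int		received;			/* number of rows handed to the receiver */
};

/* Example callback that just counts the rows it is given. */
void
count_row(RowReceiver *self, int row)
{
	(void) row;
	self->received++;
}

/*
 * Load rows, passing rejected ones (negative values in this sketch) to the
 * receiver supplied by the caller. Returns the number of rows loaded.
 */
int
load_rows(const int *rows, size_t nrows, RowReceiver *dest)
{
	int		loaded = 0;
	size_t	i;

	assert(dest != NULL && dest->receive != NULL);
	for (i = 0; i < nrows; i++)
	{
		if (rows[i] < 0)
		{
			dest->receive(dest, rows[i]);
			continue;
		}
		loaded++;
	}
	return loaded;
}
```

The caller owns the receiver, so the loading function makes no assumption about how or where it was created.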

Now the whole patch works exactly as expected for me and I cannot find
any new technical flaws. However, the doc is rather vague, especially
these places:

+ specifying it to -1 returns all error record.

Actually, we return only rows with constraint violation, but malformed
rows are ignored with warning. I guess that we simply cannot return
malformed rows back to the caller in the same way as with constraint
violation, since we cannot figure out (in general) which column
corresponds to which type if there are extra or missing columns.

+ and same record formatting error is ignored.

I can get it, but it definitely should be reworded.

What about something like this?

+   <varlistentry>
+ <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Enables ignoring of errored out rows up to <replaceable
+      class="parameter">limit_number</replaceable>. If <replaceable
+      class="parameter">limit_number</replaceable> is set
+      to -1, then all errors will be ignored.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violations
+      and row formatting errors are ignored. Malformed
+      rows will raise warnings, while constraint-violating rows
+      will be returned to the caller.
+     </para>
+
+    </listitem>
+   </varlistentry>
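
To make the "malformed rows" case concrete, here is a minimal standalone sketch of a field-count check, assuming tab-delimited text input. It is a simplification and not the patch's code: RowCheck, count_fields and check_row are hypothetical names, and real COPY text parsing also handles escapes and other delimiters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of checking one input line against the column count. */
typedef enum RowCheck
{
	ROW_OK,
	ROW_MISSING_DATA,			/* fewer fields than columns */
	ROW_EXTRA_DATA				/* more fields than columns */
} RowCheck;

/* Count tab-separated fields; an empty line counts as one empty field. */
size_t
count_fields(const char *line)
{
	size_t	n = 1;

	assert(line != NULL);
	for (; *line != '\0'; line++)
	{
		if (*line == '\t')
			n++;
	}
	return n;
}

/* Classify a line by comparing its field count with the expected one. */
RowCheck
check_row(const char *line, size_t expected)
{
	size_t	n = count_fields(line);

	if (n < expected)
		return ROW_MISSING_DATA;
	if (n > expected)
		return ROW_EXTRA_DATA;
	return ROW_OK;
}
```

With a five-column table, the line "70001\t22\t32" is classified as missing data and "70002\t23\t33\t43\t53\t54" as extra data. In both cases there is no reliable mapping from fields to columns, which is why such rows can only be reported and not returned as tuples.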

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#36

surafel3000@gmail.com

about 6 years ago

In reply to: Alexey Kondratov (#35)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Thu, Nov 21, 2019 at 4:22 PM Alexey Kondratov <a.kondratov@postgrespro.ru>
wrote:

Now the whole patch works exactly as expected for me and I cannot find
any new technical flaws. However, the doc is rather vague, especially
these places:

+ specifying it to -1 returns all error record.

Actually, we return only rows with constraint violation, but malformed
rows are ignored with warning. I guess that we simply cannot return
malformed rows back to the caller in the same way as with constraint
violation, since we cannot figure out (in general) which column
corresponds to which type if there are extra or missing columns.

+ and same record formatting error is ignored.

I can get it, but it definitely should be reworded.

What about something like this?
+   <varlistentry>
+ <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Enables ignoring of errored out rows up to <replaceable
+      class="parameter">limit_number</replaceable>. If <replaceable
+      class="parameter">limit_number</replaceable> is set
+      to -1, then all errors will be ignored.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violations
+      and row formatting errors are ignored. Malformed
+      rows will raise warnings, while constraint-violating rows
+      will be returned to the caller.
+     </para>
+
+    </listitem>
+   </varlistentry>

That is better, so I have changed it.

regards
Surafel

Attachments:

conflict-handling-onCopy-from-v11.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index d9b7c4d0d4..a0ac5b4ef7 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR_LIMIT '<replaceable class="parameter">limit_number</replaceable>'
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,26 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Enables ignoring of errored out rows up to <replaceable
+      class="parameter">limit_number</replaceable>. If <replaceable
+      class="parameter">limit_number</replaceable> is set
+      to -1, then all errors will be ignored.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violations
+      and row formatting errors are ignored. Malformed
+      rows will raise warnings, while constraint-violating rows
+      will be returned to the caller.
+     </para>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e17d8c760f..c911b3d0c2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -24,6 +24,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/printtup.h"
 #include "catalog/dependency.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
@@ -48,7 +49,9 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
+#include "tcop/pquery.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -154,6 +157,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -183,6 +187,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+									 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -837,7 +844,7 @@ CopyLoadRawBuf(CopyState cstate)
 void
 DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	   int stmt_location, int stmt_len,
-	   uint64 *processed)
+	   uint64 *processed, DestReceiver *dest)
 {
 	CopyState	cstate;
 	bool		is_from = stmt->is_from;
@@ -1068,7 +1075,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 							   NULL, stmt->attlist, stmt->options);
 		cstate->whereClause = whereClause;
-		*processed = CopyFrom(cstate);	/* copy from file to database */
+		*processed = CopyFrom(cstate, dest);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
 	else
@@ -1290,6 +1297,18 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit == -1)
+				cstate->ignore_all_error = true;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1440,6 +1459,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->ignore_error && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2653,7 +2676,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
  * Copy FROM file to relation.
  */
 uint64
-CopyFrom(CopyState cstate)
+CopyFrom(CopyState cstate, DestReceiver *dest)
 {
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *target_resultRelInfo;
@@ -2675,6 +2698,7 @@ CopyFrom(CopyState cstate)
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	Portal		portal = NULL;
 
 	Assert(cstate->rel);
 
@@ -2838,7 +2862,19 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->ignore_error)
+	{
+		TupleDesc	tupDesc;
+
+		ExecOpenIndices(resultRelInfo, true);
+		tupDesc = RelationGetDescr(cstate->rel);
+
+		portal = GetPortalByName("");
+		SetRemoteDestReceiverParams(dest, portal);
+		dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
+	}
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2943,6 +2979,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_error)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3286,6 +3329,63 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if ((cstate->error_limit > 0 || cstate->ignore_all_error) && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock.
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, return it and preceded to
+						 * the next record if there are any.
+						 */
+						if (specConflict)
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
@@ -3703,7 +3803,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3718,9 +3818,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3732,10 +3844,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit > 0 || cstate->ignore_all_error)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\"",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3822,10 +3946,23 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+								cstate->line_buf.data, (int) fld_count, attr_count)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 7881079e96..521696be29 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -791,7 +791,7 @@ copy_table(Relation rel)
 	cstate = BeginCopyFrom(pstate, rel, NULL, false, copy_read_data, attnamelist, NIL);
 
 	/* Do the copy */
-	(void) CopyFrom(cstate);
+	(void) CopyFrom(cstate, NULL);
 
 	logicalrep_rel_close(relmapentry, NoLock);
 }
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index e984545780..cb7b0c80d2 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -550,7 +550,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 
 				DoCopy(pstate, (CopyStmt *) parsetree,
 					   pstmt->stmt_location, pstmt->stmt_len,
-					   &processed);
+					   &processed, dest);
 				if (completionTag)
 					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
 							 "COPY " UINT64_FORMAT, processed);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index bbe0105d77..16fc8e6a82 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -25,7 +25,7 @@ typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
-				   uint64 *processed);
+				   uint64 *processed, DestReceiver *dest);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyState cstate, bool is_from, List *options);
 extern CopyState BeginCopyFrom(ParseState *pstate, Relation rel, const char *filename,
@@ -37,7 +37,7 @@ extern bool NextCopyFromRawFields(CopyState cstate,
 								  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 
-extern uint64 CopyFrom(CopyState cstate);
+extern uint64 CopyFrom(CopyState cstate, DestReceiver *dest);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c53ed3ebf5..37a77dcaa2 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -55,6 +55,11 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001	22	32" --- missing data for column "d"
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e"
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +107,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +141,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +172,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +203,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..2378f428fc 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -110,6 +110,14 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);

#37

about 6 years ago

In reply to: Surafel Temesgen (#36)

RE: Conflict handling for COPY FROM

Hello Surafel,

I'm very interested in this patch.
Although I'm a beginner, I would like to participate in the development of PostgreSQL.

1. I would like to suggest a new output format.
In my opinion, it would be helpful to display a description of the output and to add "line number" and "error" columns to it.
For example,

error lines

 line number | first | second | third | error
-------------+-------+--------+-------+----------
           1 |     1 |     10 |   0.5 | UNIQUE
           2 |     2 |     42 |   0.1 | CHECK
           3 |     3 |   NULL |     0 | NOT NULL
(3 rows)

Although only unique and exclusion constraint violations are currently returned to the caller,
I think the "error" column will be useful once it becomes possible to handle other types of errors (check, not-null, and so on).

If users are expected to re-run COPY FROM with the output lines as input, these extra columns get in the way.
Therefore, I think this output format should be used only when a new option (for example, ERROR_VERBOSE) is set, as in "COPY FROM ... ERROR_VERBOSE;".

2. I have a question about psql's \copy meta-command.
When I ran the \copy meta-command, no output was displayed.
Does the patch support the \copy meta-command?

Regards

--
Asaba Takanori
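
The two output modes proposed in point 1 can be sketched in C as follows. This is only an illustration of the suggestion, not code from the patch: the `RejectedRow` struct, the `format_rejected_row` function, and the `verbose` flag (standing in for the proposed ERROR_VERBOSE option) are all hypothetical names, and the sketch assumes the violation kind is available as a string.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical record for one rejected input row; not part of the patch. */
typedef struct RejectedRow
{
	long		lineno;		/* line number in the COPY input */
	const char *raw;		/* text of the rejected row */
	const char *error;		/* violation kind, e.g. "UNIQUE" (assumed available) */
} RejectedRow;

/*
 * Format one rejected row into buf.
 *
 * verbose == 0: emit only the row text, so the output can be fed back
 *               to COPY FROM unchanged.
 * verbose != 0: prefix the line number and append the error kind.
 *
 * Returns the number of characters snprintf would have written.
 */
int
format_rejected_row(char *buf, size_t buflen, const RejectedRow *row, int verbose)
{
	if (verbose)
		return snprintf(buf, buflen, "%ld | %s | %s",
						row->lineno, row->raw, row->error);
	return snprintf(buf, buflen, "%s", row->raw);
}
```

Keeping the plain mode as the default preserves the re-execution use case, while the verbose mode adds the two extra columns only on request.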

#38

surafel3000@gmail.com

about 6 years ago

In reply to: asaba.takanori@fujitsu.com (#37)

Re: Conflict handling for COPY FROM

Hi Asaba,

On Thu, Dec 12, 2019 at 7:51 AM asaba.takanori@fujitsu.com <
asaba.takanori@fujitsu.com> wrote:

Hello Surafel,

I'm very interested in this patch.
Although I'm a beginner, I would like to participate in the development of
PostgreSQL.

1. I would like to suggest a new output format.
In my opinion, it would be helpful to display a description of the output and
to add "line number" and "error" columns to it.
For example,

error lines

 line number | first | second | third | error
-------------+-------+--------+-------+----------
           1 |     1 |     10 |   0.5 | UNIQUE
           2 |     2 |     42 |   0.1 | CHECK
           3 |     3 |   NULL |     0 | NOT NULL
(3 rows)

Although only unique and exclusion constraint violations are currently
returned to the caller,
I think the "error" column will be useful once it becomes possible to
handle other types of errors (check, not-null, and so on).

Currently, we can't get the kind of violation from a speculative insertion.

If users are expected to re-run COPY FROM with the output lines as
input, these extra columns get in the way.
Therefore, I think this output format should be used only when a new
option (for example, ERROR_VERBOSE) is set, as in "COPY FROM ...
ERROR_VERBOSE;".

I agree that adding an optional feature for this would be useful in some scenarios, but
I think it is material for a future improvement once the basic feature is done.

2. I have a question about psql's \copy meta-command.
When I ran the \copy meta-command, no output was displayed.
Does the patch support the \copy meta-command?

Okay, I will look at it.
Thank you.

regards
Surafel

#39

surafel3000@gmail.com

about 6 years ago

In reply to: asaba.takanori@fujitsu.com (#37)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Thu, Dec 12, 2019 at 7:51 AM asaba.takanori@fujitsu.com <
asaba.takanori@fujitsu.com> wrote:

2. I have a question about psql's \copy meta-command.
When I ran the \copy meta-command, no output was displayed.
Does the patch support the \copy meta-command?

Fixed

regards
Surafel

Attachments:

conflict-handling-onCopy-from-v12.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index d9b7c4d0d4..a0ac5b4ef7 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR_LIMIT '<replaceable class="parameter">limit_number</replaceable>'
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,26 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Enables ignoring of errored out rows up to <replaceable
+      class="parameter">limit_number</replaceable>. If <replaceable
+      class="parameter">limit_number</replaceable> is set
+      to -1, then all errors will be ignored.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violation
+      and rows formatting errors are ignored. Malformed
+      rows will rise warnings, while constraint violating rows
+      will be returned back to the caller.
+     </para>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e17d8c760f..c911b3d0c2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -24,6 +24,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/printtup.h"
 #include "catalog/dependency.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
@@ -48,7 +49,9 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
+#include "tcop/pquery.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -154,6 +157,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -183,6 +187,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+									 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -837,7 +844,7 @@ CopyLoadRawBuf(CopyState cstate)
 void
 DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	   int stmt_location, int stmt_len,
-	   uint64 *processed)
+	   uint64 *processed, DestReceiver *dest)
 {
 	CopyState	cstate;
 	bool		is_from = stmt->is_from;
@@ -1068,7 +1075,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 							   NULL, stmt->attlist, stmt->options);
 		cstate->whereClause = whereClause;
-		*processed = CopyFrom(cstate);	/* copy from file to database */
+		*processed = CopyFrom(cstate, dest);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
 	else
@@ -1290,6 +1297,18 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit == -1)
+				cstate->ignore_all_error = true;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1440,6 +1459,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->ignore_error && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2653,7 +2676,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
  * Copy FROM file to relation.
  */
 uint64
-CopyFrom(CopyState cstate)
+CopyFrom(CopyState cstate, DestReceiver *dest)
 {
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *target_resultRelInfo;
@@ -2675,6 +2698,7 @@ CopyFrom(CopyState cstate)
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	Portal		portal = NULL;
 
 	Assert(cstate->rel);
 
@@ -2838,7 +2862,19 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->ignore_error)
+	{
+		TupleDesc	tupDesc;
+
+		ExecOpenIndices(resultRelInfo, true);
+		tupDesc = RelationGetDescr(cstate->rel);
+
+		portal = GetPortalByName("");
+		SetRemoteDestReceiverParams(dest, portal);
+		dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
+	}
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2943,6 +2979,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_error)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3286,6 +3329,63 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if ((cstate->error_limit > 0 || cstate->ignore_all_error) && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock.
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, return it and preceded to
+						 * the next record if there are any.
+						 */
+						if (specConflict)
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
@@ -3703,7 +3803,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3718,9 +3818,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3732,10 +3844,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit > 0 || cstate->ignore_all_error)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\"",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3822,10 +3946,23 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+								cstate->line_buf.data, (int) fld_count, attr_count)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 7881079e96..521696be29 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -791,7 +791,7 @@ copy_table(Relation rel)
 	cstate = BeginCopyFrom(pstate, rel, NULL, false, copy_read_data, attnamelist, NIL);
 
 	/* Do the copy */
-	(void) CopyFrom(cstate);
+	(void) CopyFrom(cstate, NULL);
 
 	logicalrep_rel_close(relmapentry, NoLock);
 }
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index e984545780..cb7b0c80d2 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -550,7 +550,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 
 				DoCopy(pstate, (CopyStmt *) parsetree,
 					   pstmt->stmt_location, pstmt->stmt_len,
-					   &processed);
+					   &processed, dest);
 				if (completionTag)
 					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
 							 "COPY " UINT64_FORMAT, processed);
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 90f6380170..1dfda330be 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1037,6 +1037,7 @@ ProcessResult(PGresult **results)
 {
 	bool		success = true;
 	bool		first_cycle = true;
+	bool		is_copy_in = false;
 
 	for (;;)
 	{
@@ -1160,6 +1161,7 @@ ProcessResult(PGresult **results)
 									   copystream,
 									   PQbinaryTuples(*results),
 									   &copy_result) && success;
+				is_copy_in = true;
 			}
 			ResetCancelConn();
 
@@ -1190,6 +1192,11 @@ ProcessResult(PGresult **results)
 		first_cycle = false;
 	}
 
+	/* Print returned result  for COPY FROM with error_limit. */
+	if (is_copy_in && !success && PQresultStatus(*results) !=
+		PGRES_FATAL_ERROR)
+		(void) PrintQueryTuples(*results);
+
 	SetResultVariables(*results, success);
 
 	/* may need this to recover from conn loss during COPY */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index bbe0105d77..16fc8e6a82 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -25,7 +25,7 @@ typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
-				   uint64 *processed);
+				   uint64 *processed, DestReceiver *dest);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyState cstate, bool is_from, List *options);
 extern CopyState BeginCopyFrom(ParseState *pstate, Relation rel, const char *filename,
@@ -37,7 +37,7 @@ extern bool NextCopyFromRawFields(CopyState cstate,
 								  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 
-extern uint64 CopyFrom(CopyState cstate);
+extern uint64 CopyFrom(CopyState cstate, DestReceiver *dest);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c53ed3ebf5..b1a17a8683 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -55,6 +55,15 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001	22	32" --- missing data for column "d"
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e"
+
+ a | b | c | d | e 
+---+---+---+---+---
+(0 rows)
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +111,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +145,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +176,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +207,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..2378f428fc 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -110,6 +110,14 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);

#40

Tatsuo Ishii

ishii@sraoss.co.jp

almost 6 years ago

In reply to: Surafel Temesgen (#39)

Re: Conflict handling for COPY FROM

In your patch for copy.sgml:

ERROR_LIMIT '<replaceable class="parameter">limit_number</replaceable>'

I think this should be:

ERROR_LIMIT <replaceable class="parameter">limit_number</replaceable>

(no single quotes)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

#41

Tatsuo Ishii

ishii@sraoss.co.jp

almost 6 years ago

In reply to: Tatsuo Ishii (#40)

Re: Conflict handling for COPY FROM

In your patch for copy.sgml:

ERROR_LIMIT '<replaceable class="parameter">limit_number</replaceable>'

I think this should be:

ERROR_LIMIT <replaceable class="parameter">limit_number</replaceable>

(no single quotes)

More comments:

- I think the documentation should state that if limit_number = 0, all
errors are raised immediately (the same as the current behavior without the patch).

- "constraint violating rows will be returned back to the caller."
This does explain the current implementation. I am not sure whether the
following behavior is intended, though:

$ cat /tmp/a
1 1
2 2
3 3
3 4

$ psql test
psql (13devel)
Type "help" for help.

test=# select * from t1;
 i | j
---+---
 1 | 1
 2 | 2
 3 | 3
(3 rows)

test=# copy t1 from '/tmp/a' with (error_limit 1);
ERROR: duplicate key value violates unique constraint "t1_pkey"
DETAIL: Key (i)=(2) already exists.
CONTEXT: COPY t1, line 2: "2 2"

So if the number of errors raised exceeds error_limit, no constraint-violating
rows (in this case i=1, j=1) are returned.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
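
The three cases discussed here (limit_number of 0, a positive limit, and -1) follow from the condition the patch repeats at each error site, `cstate->error_limit > 0 || cstate->ignore_all_error`. A minimal standalone sketch of that bookkeeping is shown below. The `ErrorBudget` struct and both function names are hypothetical; only the three field names are taken from the patch.

```c
#include <stdbool.h>

/* Simplified stand-in for the three fields the patch adds to CopyStateData. */
typedef struct ErrorBudget
{
	int			error_limit;		/* remaining errors that may be skipped */
	bool		ignore_error;		/* ERROR_LIMIT was specified */
	bool		ignore_all_error;	/* ERROR_LIMIT -1: skip every error */
} ErrorBudget;

/* Mirrors what the patch does when it parses the error_limit option. */
ErrorBudget
error_budget_init(int limit)
{
	ErrorBudget b;

	b.error_limit = limit;
	b.ignore_error = true;
	b.ignore_all_error = (limit == -1);
	return b;
}

/*
 * Returns true if the error may be skipped, false if the caller must
 * raise it.  The patch decrements error_limit unconditionally; this
 * sketch leaves the counter alone in the -1 case so it cannot wrap.
 */
bool
error_budget_try_skip(ErrorBudget *b)
{
	if (b->ignore_all_error)
		return true;
	if (b->error_limit > 0)
	{
		b->error_limit--;
		return true;
	}
	return false;
}
```

Under this logic a limit of 0 never skips anything, so the first error is raised exactly as it would be without the option.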

#42

surafel3000@gmail.com

almost 6 years ago

In reply to: Tatsuo Ishii (#41)

1 attachment(s)

Re: Conflict handling for COPY FROM

Hi,

ERROR_LIMIT '<replaceable class="parameter">limit_number</replaceable>'

I think this should be:

ERROR_LIMIT <replaceable class="parameter">limit_number</replaceable>

(no single quotes)

Thank you. Fixed.

More comments:

- I think the documentation should state that if limit_number = 0, all
errors are raised immediately (the same as the current behavior without
the patch).

If we want all errors to be raised, limit_number does not need to be
specified at all.
If it is specified as limit_number = 0, I think the behavior is self-explanatory.

- "constraint violating rows will be returned back to the caller."
This does explain the current implementation. I am not sure whether the
following behavior is intended, though:

$ cat /tmp/a
1 1
2 2
3 3
3 4

$ psql test
psql (13devel)
Type "help" for help.

test=# select * from t1;
 i | j
---+---
 1 | 1
 2 | 2
 3 | 3
(3 rows)

test=# copy t1 from '/tmp/a' with (error_limit 1);
ERROR: duplicate key value violates unique constraint "t1_pkey"
DETAIL: Key (i)=(2) already exists.
CONTEXT: COPY t1, line 2: "2 2"

So if the number of errors raised exceeds error_limit, no constraint-violating
rows (in this case i=1, j=1) are returned.

error_limit specifies the number of errors a COPY operation may tolerate
and still proceed. If the number of errors exceeds that limit, the operation
is stopped. There may be more conflicts afterward, and returning only a
limited number of conflicting rows is of little use.

regards
Surafel
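
The stop-on-exceed behavior described above can be replayed against the reported example with a small standalone C sketch. It is a model, not the patch: `first_fatal_line`, `contains`, and `MAX_KEYS` are hypothetical names, only unique-key conflicts on a single integer column are modeled, and the return of conflicting rows to the caller is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_KEYS 64				/* sketch only tracks this many keys */

/* Linear search for key among the first n entries of keys. */
static bool
contains(const int *keys, size_t n, int key)
{
	for (size_t i = 0; i < n; i++)
		if (keys[i] == key)
			return true;
	return false;
}

/*
 * Simulate loading input keys into a table that already holds the
 * existing keys, skipping up to error_limit conflicts (-1 = no limit).
 * Returns the 1-based input line at which the load would raise an
 * error, or 0 if it runs to completion.
 */
size_t
first_fatal_line(const int *existing, size_t n_existing,
				 const int *input, size_t n_input, int error_limit)
{
	int			keys[MAX_KEYS];
	size_t		n_keys = 0;
	bool		ignore_all = (error_limit == -1);

	for (size_t i = 0; i < n_existing && n_keys < MAX_KEYS; i++)
		keys[n_keys++] = existing[i];

	for (size_t line = 1; line <= n_input; line++)
	{
		int			key = input[line - 1];

		if (!contains(keys, n_keys, key))
		{
			/* No conflict: the row is inserted. */
			if (n_keys < MAX_KEYS)
				keys[n_keys++] = key;
			continue;
		}

		/* Conflict: skip it while the budget lasts. */
		if (ignore_all)
			continue;
		if (error_limit > 0)
		{
			error_limit--;
			continue;
		}
		return line;			/* budget exhausted: the error is raised */
	}
	return 0;
}
```

With existing keys {1, 2, 3}, input keys {1, 2, 3, 3}, and a limit of 1, the function returns 2: the first conflict uses up the budget and the second one is raised, which matches the "line 2" context in the report quoted above.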

Attachments:

conflict-handling-q-from-v13.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index a99f8155e4..c53e5f6d92 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR_LIMIT <replaceable class="parameter">limit_number</replaceable>
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,26 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Enables ignoring of errored out rows up to <replaceable
+      class="parameter">limit_number</replaceable>. If <replaceable
+      class="parameter">limit_number</replaceable> is set
+      to -1, then all errors will be ignored.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violation
+      and rows formatting errors are ignored. Malformed
+      rows will rise warnings, while constraint violating rows
+      will be returned back to the caller.
+     </para>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 40a8ec1abd..72225a85a0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -24,6 +24,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/printtup.h"
 #include "catalog/dependency.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
@@ -48,7 +49,9 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
+#include "tcop/pquery.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -153,6 +156,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -182,6 +186,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+									 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -836,7 +843,7 @@ CopyLoadRawBuf(CopyState cstate)
 void
 DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	   int stmt_location, int stmt_len,
-	   uint64 *processed)
+	   uint64 *processed, DestReceiver *dest)
 {
 	CopyState	cstate;
 	bool		is_from = stmt->is_from;
@@ -1068,7 +1075,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 							   NULL, stmt->attlist, stmt->options);
 		cstate->whereClause = whereClause;
-		*processed = CopyFrom(cstate);	/* copy from file to database */
+		*processed = CopyFrom(cstate, dest);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
 	else
@@ -1290,6 +1297,18 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit == -1)
+				cstate->ignore_all_error = true;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1440,6 +1459,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->ignore_error && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2653,7 +2676,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
  * Copy FROM file to relation.
  */
 uint64
-CopyFrom(CopyState cstate)
+CopyFrom(CopyState cstate, DestReceiver *dest)
 {
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *target_resultRelInfo;
@@ -2675,6 +2698,7 @@ CopyFrom(CopyState cstate)
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	Portal		portal = NULL;
 
 	Assert(cstate->rel);
 
@@ -2838,7 +2862,19 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->ignore_error)
+	{
+		TupleDesc	tupDesc;
+
+		ExecOpenIndices(resultRelInfo, true);
+		tupDesc = RelationGetDescr(cstate->rel);
+
+		portal = GetPortalByName("");
+		SetRemoteDestReceiverParams(dest, portal);
+		dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
+	}
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2943,6 +2979,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_error)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3286,6 +3329,63 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if ((cstate->error_limit > 0 || cstate->ignore_all_error) && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock.
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, return it and proceed to
+						 * the next record, if there is one.
+						 */
+						if (specConflict)
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
@@ -3703,7 +3803,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3718,9 +3818,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3732,10 +3844,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit > 0 || cstate->ignore_all_error)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\"",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3822,10 +3946,23 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+								cstate->line_buf.data, (int) fld_count, attr_count)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index f8183cd488..817d0af002 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -784,7 +784,7 @@ copy_table(Relation rel)
 	cstate = BeginCopyFrom(pstate, rel, NULL, false, copy_read_data, attnamelist, NIL);
 
 	/* Do the copy */
-	(void) CopyFrom(cstate);
+	(void) CopyFrom(cstate, NULL);
 
 	logicalrep_rel_close(relmapentry, NoLock);
 }
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index bb85b5e52a..746a2a5160 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -728,7 +728,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 
 				DoCopy(pstate, (CopyStmt *) parsetree,
 					   pstmt->stmt_location, pstmt->stmt_len,
-					   &processed);
+					   &processed, dest);
 				if (completionTag)
 					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
 							 "COPY " UINT64_FORMAT, processed);
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 67df0cd2c7..34869aaec6 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -892,6 +892,7 @@ ProcessResult(PGresult **results)
 {
 	bool		success = true;
 	bool		first_cycle = true;
+	bool		is_copy_in = false;
 
 	for (;;)
 	{
@@ -1015,6 +1016,7 @@ ProcessResult(PGresult **results)
 									   copystream,
 									   PQbinaryTuples(*results),
 									   &copy_result) && success;
+				is_copy_in = true;
 			}
 			ResetCancelConn();
 
@@ -1045,6 +1047,11 @@ ProcessResult(PGresult **results)
 		first_cycle = false;
 	}
 
+	/* Print returned result  for COPY FROM with error_limit. */
+	if (is_copy_in && !success && PQresultStatus(*results) !=
+		PGRES_FATAL_ERROR)
+		(void) PrintQueryTuples(*results);
+
 	SetResultVariables(*results, success);
 
 	/* may need this to recover from conn loss during COPY */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..addd8054d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -25,7 +25,7 @@ typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
-				   uint64 *processed);
+				   uint64 *processed, DestReceiver *dest);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyState cstate, bool is_from, List *options);
 extern CopyState BeginCopyFrom(ParseState *pstate, Relation rel, const char *filename,
@@ -37,7 +37,7 @@ extern bool NextCopyFromRawFields(CopyState cstate,
 								  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 
-extern uint64 CopyFrom(CopyState cstate);
+extern uint64 CopyFrom(CopyState cstate, DestReceiver *dest);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d25a..773e965970 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -55,6 +55,15 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001	22	32" --- missing data for column "d"
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e"
+
+ a | b | c | d | e 
+---+---+---+---+---
+(0 rows)
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +111,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +145,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +176,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +207,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..2378f428fc 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -110,6 +110,14 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);
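
The row-format checks in the NextCopyFrom hunks above all follow one pattern: if the field count is wrong and error budget remains, warn, decrement, and move to the next line; otherwise raise the original error. A minimal standalone sketch of that decision, assuming tab-delimited text input and ignoring quoting and escapes, might look like the following. `ErrorBudget`, `count_fields`, and `check_row` are hypothetical names for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields the patch adds to CopyStateData. */
typedef struct ErrorBudget
{
	int		error_limit;		/* rows that may still be skipped */
	bool	ignore_all_error;	/* true when ERROR_LIMIT -1 was given */
} ErrorBudget;

typedef enum RowOutcome
{
	ROW_OK,						/* field count matches, keep the row */
	ROW_SKIPPED,				/* malformed: warn and go to the next line */
	ROW_ERROR					/* malformed and no budget left */
} RowOutcome;

/* Count tab-separated fields in one line of text-format COPY input. */
int
count_fields(const char *line)
{
	int			n = 1;

	assert(line != NULL);
	for (; *line != '\0'; line++)
		if (*line == '\t')
			n++;
	return n;
}

/* Decide what to do with one input line, consuming budget when skipping. */
RowOutcome
check_row(ErrorBudget *budget, const char *line, int attr_count)
{
	if (count_fields(line) == attr_count)
		return ROW_OK;

	/* Same condition the patch tests before downgrading ERROR to WARNING. */
	if (budget->error_limit > 0 || budget->ignore_all_error)
	{
		budget->error_limit--;
		return ROW_SKIPPED;
	}
	return ROW_ERROR;
}
```

Calling `check_row` on the five input lines of the regression test with a budget of 5 and `attr_count` 5 skips the first three and accepts the last two, which is consistent with the three warnings in the expected output.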

#43

Tatsuo Ishii

ishii@sraoss.co.jp

almost 6 years ago

In reply to: Surafel Temesgen (#42)

Re: Conflict handling for COPY FROM

test=# copy t1 from '/tmp/a' with (error_limit 1);
ERROR: duplicate key value violates unique constraint "t1_pkey"
DETAIL: Key (i)=(2) already exists.
CONTEXT: COPY t1, line 2: "2 2"

So if the number of errors raised exceeds error_limit, no constraint-violating
rows (in this case i=1, j=1) are returned.

error_limit specifies the number of errors allowed for the copy operation
to proceed. If the number of errors exceeds it, the operation is stopped.
There may be more conflicts afterward, so returning a limited number of
conflicting rows is of little use.

Still I see your explanation differs from what the document patch says.

+      Currently, only unique or exclusion constraint violation
+      and rows formatting errors are ignored. Malformed
+      rows will rise warnings, while constraint violating rows
+      will be returned back to the caller.

I am afraid that once this patch is part of the next version of PostgreSQL, we
will get many complaints/inquiries from users. What about changing it like this:

Currently, only unique or exclusion constraint violation and
rows formatting errors are ignored. Malformed rows will rise
warnings, while constraint violating rows will be returned back
to the caller unless any error is raised; i.e. if any error is
raised due to error_limit exceeds, no rows will be returned back
to the caller.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
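
The behavior discussed in this message is all-or-nothing: conflicting rows are handed back only if the whole command finishes within the error budget. A rough model of that, assuming a table reduced to an array of integer keys and assuming (this is not stated in the thread) that t1 already held keys 1 and 2 in the example above, could be sketched as follows. `CopyOutcome` and `copy_with_error_limit` are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_KEYS 64

typedef struct CopyOutcome
{
	bool	failed;		/* true if the error budget was exhausted */
	int		inserted;	/* rows added to the table */
	int		returned;	/* conflicting rows handed back to the caller */
} CopyOutcome;

static bool
key_exists(const int *keys, int nkeys, int key)
{
	for (int i = 0; i < nkeys; i++)
		if (keys[i] == key)
			return true;
	return false;
}

/*
 * Load input keys into table. An error_limit of -1 means "ignore all".
 * When the budget runs out the whole command fails: the table is rolled
 * back and no conflicting rows are reported.
 */
CopyOutcome
copy_with_error_limit(int *table, int *ntable,
					  const int *input, int ninput, int error_limit)
{
	CopyOutcome out = {false, 0, 0};
	bool		ignore_all = (error_limit == -1);
	int			budget = error_limit;
	int			start = *ntable;

	for (int i = 0; i < ninput; i++)
	{
		if (key_exists(table, *ntable, input[i]))
		{
			if (budget > 0 || ignore_all)
			{
				budget--;
				out.returned++;
				continue;
			}
			/* Budget exhausted: the ERROR aborts the command. */
			*ntable = start;
			out.failed = true;
			out.inserted = 0;
			out.returned = 0;
			return out;
		}
		assert(*ntable < MAX_KEYS);
		table[(*ntable)++] = input[i];
		out.inserted++;
	}
	return out;
}
```

With keys {1, 2} already present and input {1, 2}, a limit of 1 fails and returns nothing, while a limit of 2 succeeds and returns both conflicting rows.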

#44

surafel3000@gmail.com

almost 6 years ago

In reply to: Tatsuo Ishii (#43)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Mon, Feb 17, 2020 at 10:00 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

test=# copy t1 from '/tmp/a' with (error_limit 1);
ERROR: duplicate key value violates unique constraint "t1_pkey"
DETAIL: Key (i)=(2) already exists.
CONTEXT: COPY t1, line 2: "2 2"

So if the number of errors raised exceeds error_limit, no constraint-violating
rows (in this case i=1, j=1) are returned.

error_limit specifies the number of errors allowed for the copy operation
to proceed. If the number of errors exceeds it, the operation is stopped.
There may be more conflicts afterward, so returning a limited number of
conflicting rows is of little use.

Still I see your explanation differs from what the document patch says.
+      Currently, only unique or exclusion constraint violation
+      and rows formatting errors are ignored. Malformed
+      rows will rise warnings, while constraint violating rows
+      will be returned back to the caller.
I am afraid that once this patch is part of the next version of PostgreSQL, we
will get many complaints/inquiries from users. What about changing it like this:

Currently, only unique or exclusion constraint violation and
rows formatting errors are ignored. Malformed rows will rise
warnings, while constraint violating rows will be returned back
to the caller unless any error is raised; i.e. if any error is
raised due to error_limit exceeds, no rows will be returned back
to the caller.

That is better, so I have amended it.

regards
Surafel

Attachments:

conflict-handling-q-from-v14.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index a99f8155e4..845902b824 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR_LIMIT <replaceable class="parameter">limit_number</replaceable>
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,28 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Enables ignoring of errored out rows up to <replaceable
+      class="parameter">limit_number</replaceable>. If <replaceable
+      class="parameter">limit_number</replaceable> is set
+      to -1, then all errors will be ignored.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violation
+      and rows formatting errors are ignored. Malformed
+      rows will rise warnings, while constraint violating rows
+      will be returned back to the caller unless any error is raised;
+      i.e. if any error is raised due to error_limit exceeds, no rows
+      will be returned back to the caller.
+     </para>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 40a8ec1abd..72225a85a0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -24,6 +24,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "access/printtup.h"
 #include "catalog/dependency.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
@@ -48,7 +49,9 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
+#include "tcop/pquery.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -153,6 +156,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -182,6 +186,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+									 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -836,7 +843,7 @@ CopyLoadRawBuf(CopyState cstate)
 void
 DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	   int stmt_location, int stmt_len,
-	   uint64 *processed)
+	   uint64 *processed, DestReceiver *dest)
 {
 	CopyState	cstate;
 	bool		is_from = stmt->is_from;
@@ -1068,7 +1075,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 							   NULL, stmt->attlist, stmt->options);
 		cstate->whereClause = whereClause;
-		*processed = CopyFrom(cstate);	/* copy from file to database */
+		*processed = CopyFrom(cstate, dest);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
 	else
@@ -1290,6 +1297,18 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit == -1)
+				cstate->ignore_all_error = true;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1440,6 +1459,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->ignore_error && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2653,7 +2676,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
  * Copy FROM file to relation.
  */
 uint64
-CopyFrom(CopyState cstate)
+CopyFrom(CopyState cstate, DestReceiver *dest)
 {
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *target_resultRelInfo;
@@ -2675,6 +2698,7 @@ CopyFrom(CopyState cstate)
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	Portal		portal = NULL;
 
 	Assert(cstate->rel);
 
@@ -2838,7 +2862,19 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->ignore_error)
+	{
+		TupleDesc	tupDesc;
+
+		ExecOpenIndices(resultRelInfo, true);
+		tupDesc = RelationGetDescr(cstate->rel);
+
+		portal = GetPortalByName("");
+		SetRemoteDestReceiverParams(dest, portal);
+		dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
+	}
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2943,6 +2979,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_error)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3286,6 +3329,63 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if ((cstate->error_limit > 0 || cstate->ignore_all_error) && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock.
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, return it and proceed to
+						 * the next record, if there is one.
+						 */
+						if (specConflict)
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
@@ -3703,7 +3803,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3718,9 +3818,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3732,10 +3844,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit > 0 || cstate->ignore_all_error)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\"",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3822,10 +3946,23 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+								cstate->line_buf.data, (int) fld_count, attr_count)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index f8183cd488..817d0af002 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -784,7 +784,7 @@ copy_table(Relation rel)
 	cstate = BeginCopyFrom(pstate, rel, NULL, false, copy_read_data, attnamelist, NIL);
 
 	/* Do the copy */
-	(void) CopyFrom(cstate);
+	(void) CopyFrom(cstate, NULL);
 
 	logicalrep_rel_close(relmapentry, NoLock);
 }
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index bb85b5e52a..746a2a5160 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -728,7 +728,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 
 				DoCopy(pstate, (CopyStmt *) parsetree,
 					   pstmt->stmt_location, pstmt->stmt_len,
-					   &processed);
+					   &processed, dest);
 				if (completionTag)
 					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
 							 "COPY " UINT64_FORMAT, processed);
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 67df0cd2c7..34869aaec6 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -892,6 +892,7 @@ ProcessResult(PGresult **results)
 {
 	bool		success = true;
 	bool		first_cycle = true;
+	bool		is_copy_in = false;
 
 	for (;;)
 	{
@@ -1015,6 +1016,7 @@ ProcessResult(PGresult **results)
 									   copystream,
 									   PQbinaryTuples(*results),
 									   &copy_result) && success;
+				is_copy_in = true;
 			}
 			ResetCancelConn();
 
@@ -1045,6 +1047,11 @@ ProcessResult(PGresult **results)
 		first_cycle = false;
 	}
 
+	/* Print returned result  for COPY FROM with error_limit. */
+	if (is_copy_in && !success && PQresultStatus(*results) !=
+		PGRES_FATAL_ERROR)
+		(void) PrintQueryTuples(*results);
+
 	SetResultVariables(*results, success);
 
 	/* may need this to recover from conn loss during COPY */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..addd8054d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -25,7 +25,7 @@ typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
-				   uint64 *processed);
+				   uint64 *processed, DestReceiver *dest);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyState cstate, bool is_from, List *options);
 extern CopyState BeginCopyFrom(ParseState *pstate, Relation rel, const char *filename,
@@ -37,7 +37,7 @@ extern bool NextCopyFromRawFields(CopyState cstate,
 								  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 
-extern uint64 CopyFrom(CopyState cstate);
+extern uint64 CopyFrom(CopyState cstate, DestReceiver *dest);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d25a..773e965970 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -55,6 +55,15 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001	22	32" --- missing data for column "d"
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e"
+
+ a | b | c | d | e 
+---+---+---+---+---
+(0 rows)
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +111,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +145,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +176,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +207,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..2378f428fc 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -110,6 +110,14 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);

#45

asaba.takanori@fujitsu.com

almost 6 years ago

In reply to: Surafel Temesgen (#39)

RE: Conflict handling for COPY FROM

Hello Surafel,

Sorry for my late reply.

From: Surafel Temesgen <surafel3000@gmail.com>

On Thu, Dec 12, 2019 at 7:51 AM asaba.takanori@fujitsu.com wrote:

2. I have a question about the copy meta-command.
When I executed the copy meta-command, the output wasn't displayed.
Does the patch support the copy meta-command?

Fixed

Thank you.

I think we need a regression test verifying that a constraint-violating row is returned to the caller.
How about this?

・ src/test/regress/expected/copy2.out

@@ -1,5 +1,5 @@
 CREATE TEMP TABLE x (
-       a serial,
+       a serial UNIQUE,
        b int,
        c text not null default 'stuff',
        d text,
@@ -55,6 +55,16 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001      22      32" --- missing data for column "d"
+WARNING:  skipping "70002      23      33      43      53      54" --- extra data after last expected column
+WARNING:  skipping "70003      24      34      44" --- missing data for column "e"
+
+   a   | b  | c  | d  |          e
+-------+----+----+----+----------------------
+ 70005 | 27 | 37 | 47 | before trigger fired
+(1 row)
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist

・ src/test/regress/sql/copy2.sql

@@ -1,5 +1,5 @@
 CREATE TEMP TABLE x (
-       a serial,
+       a serial UNIQUE,
        b int,
        c text not null default 'stuff',
        d text,
@@ -110,6 +110,15 @@ COPY x from stdin WHERE a > 60003;
 60005  26      36      46      56
 \.

+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001  22      32
+70002  23      33      43      53      54
+70003  24      34      44
+70004  25      35      45      55
+70005  26      36      46      56
+70005  27      37      47      57
+\.
+
 COPY x from stdin WHERE f > 60003;

COPY x from stdin WHERE a = max(x.b);

Regards,

--
Takanori Asaba
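
The proposed test exercises both error kinds in one command, and in the patch both decrement the same error_limit counter. The sketch below walks the six proposed input lines through a simplified model to show how the budget is shared. It assumes tab-delimited input, treats the first column as the unique key, starts from a table with no matching keys, and does not model ERROR_LIMIT -1. `Tally` and `run_copy` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_ROWS 32

typedef struct Tally
{
	int		skipped;		/* malformed rows reported as warnings */
	int		returned;		/* unique-key conflicts handed back */
	int		inserted;		/* rows stored in the table */
	int		budget_left;	/* remaining error_limit */
	bool	failed;			/* budget exhausted; other counts are moot */
} Tally;

static int
field_count(const char *line)
{
	int			n = 1;

	for (; *line != '\0'; line++)
		if (*line == '\t')
			n++;
	return n;
}

Tally
run_copy(const char *const *lines, int nlines, int attr_count, int error_limit)
{
	Tally		t = {0, 0, 0, error_limit, false};
	int			seen[MAX_ROWS];
	int			nseen = 0;

	for (int i = 0; i < nlines; i++)
	{
		bool		malformed = (field_count(lines[i]) != attr_count);
		bool		conflict = false;
		int			key = atoi(lines[i]);	/* leading digits = column a */

		for (int j = 0; !malformed && j < nseen; j++)
			if (seen[j] == key)
				conflict = true;

		if (malformed || conflict)
		{
			/* One counter pays for both kinds of error. */
			if (t.budget_left <= 0)
			{
				t.failed = true;
				return t;
			}
			t.budget_left--;
			if (malformed)
				t.skipped++;
			else
				t.returned++;
			continue;
		}
		assert(nseen < MAX_ROWS);
		seen[nseen++] = key;
		t.inserted++;
	}
	return t;
}
```

For the six lines above with ERROR_LIMIT 5, this model yields three skipped rows, one returned row (the second 70005), two inserted rows, and one unit of budget left, in line with the proposed expected output.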

#46

surafel3000@gmail.com

almost 6 years ago

In reply to: asaba.takanori@fujitsu.com (#45)

1 attachment(s)

Re: Conflict handling for COPY FROM

On Fri, Mar 6, 2020 at 11:30 AM asaba.takanori@fujitsu.com <asaba.takanori@fujitsu.com> wrote:

Hello Surafel,

Sorry for my late reply.

From: Surafel Temesgen <surafel3000@gmail.com>

On Thu, Dec 12, 2019 at 7:51 AM asaba.takanori@fujitsu.com wrote:

2. I have a question about the copy meta-command.
When I executed the copy meta-command, the output wasn't displayed.
Does the patch support the copy meta-command?

Fixed

Thank you.

I think we need a regression test verifying that a constraint-violating row
is returned to the caller.
How about this?

Okay, attached is a rebased patch that includes it.

regards
Surafel

Attachments:

conflict-handling-copy-from-v15.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index a99f8155e4..845902b824 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR_LIMIT <replaceable class="parameter">limit_number</replaceable>
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,28 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Enables ignoring of errored out rows up to <replaceable
+      class="parameter">limit_number</replaceable>. If <replaceable
+      class="parameter">limit_number</replaceable> is set
+      to -1, then all errors will be ignored.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violation
+      and rows formatting errors are ignored. Malformed
+      rows will rise warnings, while constraint violating rows
+      will be returned back to the caller unless any error is raised;
+      i.e. if any error is raised due to error_limit exceeds, no rows
+      will be returned back to the caller.
+     </para>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e79ede4cb8..4184e2e755 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -20,6 +20,7 @@
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/printtup.h"
 #include "access/sysattr.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -48,6 +49,8 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
+#include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -153,6 +156,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -182,6 +186,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+									 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -836,7 +843,7 @@ CopyLoadRawBuf(CopyState cstate)
 void
 DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	   int stmt_location, int stmt_len,
-	   uint64 *processed)
+	   uint64 *processed, DestReceiver *dest)
 {
 	CopyState	cstate;
 	bool		is_from = stmt->is_from;
@@ -1068,7 +1075,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 							   NULL, stmt->attlist, stmt->options);
 		cstate->whereClause = whereClause;
-		*processed = CopyFrom(cstate);	/* copy from file to database */
+		*processed = CopyFrom(cstate, dest);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
 	else
@@ -1290,6 +1297,18 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit == -1)
+				cstate->ignore_all_error = true;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1440,6 +1459,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->ignore_error && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2653,7 +2676,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
  * Copy FROM file to relation.
  */
 uint64
-CopyFrom(CopyState cstate)
+CopyFrom(CopyState cstate, DestReceiver *dest)
 {
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *target_resultRelInfo;
@@ -2675,6 +2698,7 @@ CopyFrom(CopyState cstate)
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	Portal		portal = NULL;
 
 	Assert(cstate->rel);
 
@@ -2838,7 +2862,19 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->ignore_error)
+	{
+		TupleDesc	tupDesc;
+
+		ExecOpenIndices(resultRelInfo, true);
+		tupDesc = RelationGetDescr(cstate->rel);
+
+		portal = GetPortalByName("");
+		SetRemoteDestReceiverParams(dest, portal);
+		dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
+	}
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2943,6 +2979,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_error)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3286,6 +3329,63 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if ((cstate->error_limit > 0 || cstate->ignore_all_error) && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock.
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, return it and preceded to
+						 * the next record if there are any.
+						 */
+						if (specConflict)
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
@@ -3703,7 +3803,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3718,9 +3818,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3732,10 +3844,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit > 0 || cstate->ignore_all_error)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\"",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
@@ -3822,10 +3946,23 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 		}
 
 		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+								cstate->line_buf.data, (int) fld_count, attr_count)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("row field count is %d, expected %d",
+								(int) fld_count, attr_count)));
+
+		}
 
 		i = 0;
 		foreach(cur, cstate->attnumlist)
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index f8183cd488..817d0af002 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -784,7 +784,7 @@ copy_table(Relation rel)
 	cstate = BeginCopyFrom(pstate, rel, NULL, false, copy_read_data, attnamelist, NIL);
 
 	/* Do the copy */
-	(void) CopyFrom(cstate);
+	(void) CopyFrom(cstate, NULL);
 
 	logicalrep_rel_close(relmapentry, NoLock);
 }
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d0..59d7fed099 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -720,7 +720,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 
 				DoCopy(pstate, (CopyStmt *) parsetree,
 					   pstmt->stmt_location, pstmt->stmt_len,
-					   &processed);
+					   &processed, dest);
 				if (qc)
 					SetQueryCompletion(qc, CMDTAG_COPY, processed);
 			}
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 67df0cd2c7..34869aaec6 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -892,6 +892,7 @@ ProcessResult(PGresult **results)
 {
 	bool		success = true;
 	bool		first_cycle = true;
+	bool		is_copy_in = false;
 
 	for (;;)
 	{
@@ -1015,6 +1016,7 @@ ProcessResult(PGresult **results)
 									   copystream,
 									   PQbinaryTuples(*results),
 									   &copy_result) && success;
+				is_copy_in = true;
 			}
 			ResetCancelConn();
 
@@ -1045,6 +1047,11 @@ ProcessResult(PGresult **results)
 		first_cycle = false;
 	}
 
+	/* Print returned result  for COPY FROM with error_limit. */
+	if (is_copy_in && !success && PQresultStatus(*results) !=
+		PGRES_FATAL_ERROR)
+		(void) PrintQueryTuples(*results);
+
 	SetResultVariables(*results, success);
 
 	/* may need this to recover from conn loss during COPY */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..addd8054d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -25,7 +25,7 @@ typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
-				   uint64 *processed);
+				   uint64 *processed, DestReceiver *dest);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyState cstate, bool is_from, List *options);
 extern CopyState BeginCopyFrom(ParseState *pstate, Relation rel, const char *filename,
@@ -37,7 +37,7 @@ extern bool NextCopyFromRawFields(CopyState cstate,
 								  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 
-extern uint64 CopyFrom(CopyState cstate);
+extern uint64 CopyFrom(CopyState cstate, DestReceiver *dest);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d25a..37d973cb20 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -1,5 +1,5 @@
 CREATE TEMP TABLE x (
-	a serial,
+	a serial UNIQUE,
 	b int,
 	c text not null default 'stuff',
 	d text,
@@ -55,6 +55,16 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001	22	32" --- missing data for column "d"
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e"
+
+   a   | b  | c  | d  |          e           
+-------+----+----+----+----------------------
+ 70005 | 27 | 36 | 46 | before trigger fired
+(1 row)
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +112,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +146,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +177,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +208,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..64b9a51947 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -1,5 +1,5 @@
 CREATE TEMP TABLE x (
-	a serial,
+	a serial UNIQUE,
 	b int,
 	c text not null default 'stuff',
 	d text,
@@ -110,6 +110,15 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+70005	27	36	46	56
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);

#47

a.kondratov@postgrespro.ru

almost 6 years ago

In reply to: Surafel Temesgen (#46)

Re: Conflict handling for COPY FROM

On 09.03.2020 15:34, Surafel Temesgen wrote:

okay attached is a rebased patch with it

+    Portal        portal = NULL;
...
+        portal = GetPortalByName("");
+        SetRemoteDestReceiverParams(dest, portal);

I think that you do not need this, since you are using an already
prepared DestReceiver. The whole idea of passing the DestReceiver down to
CopyFrom was to avoid that code. This unnamed portal is created in
exec_simple_query [1] and has already been set on the DestReceiver there [2].

[1] https://github.com/postgres/postgres/blob/0a42a2e9/src/backend/tcop/postgres.c#L1178
[2] https://github.com/postgres/postgres/blob/0a42a2e9/src/backend/tcop/postgres.c#L1226

Maybe I am missing something, but I have just removed this code and
everything works just fine.
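
The point about reusing the caller's receiver can be illustrated with a toy model. The sketch below is not PostgreSQL code: `ToyRow`, `ToyReceiver`, `toy_make_receiver`, and `toy_copy_from` are hypothetical names, and the struct only imitates the general shape of a callback-based receiver. It assumes the caller has fully prepared the receiver, so the copy routine merely starts it and hands over each conflicting row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a table row; only the key column matters here. */
typedef struct ToyRow
{
	int			a;				/* unique key */
	int			b;
} ToyRow;

/* Hypothetical callback-based receiver, loosely shaped like a DestReceiver. */
typedef struct ToyReceiver ToyReceiver;
struct ToyReceiver
{
	void		(*startup) (ToyReceiver *self);
	void		(*receive) (ToyReceiver *self, const ToyRow *row);
	int			started;		/* set by the startup callback */
	int			rows_received;	/* rows handed back to the caller */
};

static void
toy_startup(ToyReceiver *self)
{
	self->started = 1;
}

static void
toy_receive(ToyReceiver *self, const ToyRow *row)
{
	(void) row;					/* a real receiver would send the row out */
	self->rows_received++;
}

/* Caller side: build the receiver once, before the copy routine runs. */
static ToyReceiver
toy_make_receiver(void)
{
	ToyReceiver r = {toy_startup, toy_receive, 0, 0};

	return r;
}

/*
 * Callee side: uses the receiver it was given and performs no setup of its
 * own. Rows whose key repeats an earlier key are returned, not inserted.
 * Returns the number of rows counted as inserted.
 */
static int
toy_copy_from(const ToyRow *rows, int nrows, ToyReceiver *dest)
{
	int			inserted = 0;

	assert(rows != NULL && dest != NULL);
	dest->startup(dest);

	for (int i = 0; i < nrows; i++)
	{
		int			conflict = 0;

		for (int j = 0; j < i; j++)
			if (rows[j].a == rows[i].a)
				conflict = 1;

		if (conflict)
			dest->receive(dest, &rows[i]);
		else
			inserted++;
	}
	return inserted;
}
```

With the key values 70004, 70005, 70005 used in the regression test, `toy_copy_from` counts two inserted rows and passes one row to the receiver, without touching any portal-like state itself.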

Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#48

asaba.takanori@fujitsu.com

almost 6 years ago

In reply to: Surafel Temesgen (#46)

RE: Conflict handling for COPY FROM

Hello Surafel,

From: Surafel Temesgen <surafel3000@gmail.com>

On Fri, Mar 6, 2020 at 11:30 AM asaba.takanori@fujitsu.com wrote:
I think we need a regression test that checks that a constraint-violating row is returned to the caller.
How about this?

Okay, attached is a rebased patch that includes it.

Thank you very much.
Although it is a small point, it may be better like this:
+70005 27 36 46 56 -> 70005 27 37 47 57

I would like to discuss COPY from a binary file.
It seems that this patch tries to skip the error raised when the number of fields differs from what is expected.

+               {
+                       if (cstate->error_limit > 0 || cstate->ignore_all_error)
+                       {
+                               ereport(WARNING,
+                                               (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                                                errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+                                                               cstate->line_buf.data, (int) fld_count, attr_count)));
+                               cstate->error_limit--;
+                               goto next_line;
+                       }
+                       else
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                                                errmsg("row field count is %d, expected %d",
+                                                               (int) fld_count, attr_count)));
+
+               }

I checked like this:

postgres=# CREATE TABLE x (
postgres(# a serial UNIQUE,
postgres(# b int,
postgres(# c text not null default 'stuff',
postgres(# d text,
postgres(# e text
postgres(# );
CREATE TABLE
postgres=# COPY x from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

70004 25 35 45 55
70005 26 36 46 56
\.

COPY 2
postgres=# SELECT * FROM x;
a | b | c | d | e
-------+----+----+----+----
70004 | 25 | 35 | 45 | 55
70005 | 26 | 36 | 46 | 56
(2 rows)

postgres=# COPY x TO '/tmp/copyout' (FORMAT binary);
COPY 2
postgres=# CREATE TABLE y (
postgres(# a serial UNIQUE,
postgres(# b int,
postgres(# c text not null default 'stuff',
postgres(# d text
postgres(# );
CREATE TABLE
postgres=# COPY y FROM '/tmp/copyout' WITH (FORMAT binary,ERROR_LIMIT -1);
2020-03-12 16:55:55.457 JST [2319] WARNING: skipping "" --- row field count is 5, expected 4
2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 1
2020-03-12 16:55:55.457 JST [2319] WARNING: skipping "" --- row field count is 0, expected 4
2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 2
2020-03-12 16:55:55.457 JST [2319] ERROR: unexpected EOF in COPY data
2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 3, column a
2020-03-12 16:55:55.457 JST [2319] STATEMENT: COPY y FROM '/tmp/copyout' WITH (FORMAT binary,ERROR_LIMIT -1);
WARNING: skipping "" --- row field count is 5, expected 4
WARNING: skipping "" --- row field count is 0, expected 4
ERROR: unexpected EOF in COPY data
CONTEXT: COPY y, line 3, column a

It seems that the error isn't handled correctly.
'WARNING: skipping "" --- row field count is 5, expected 4' is correct,
but 'WARNING: skipping "" --- row field count is 0, expected 4' is not.

Also, is it necessary to skip errors that occur when the input is a binary file?
Does it actually happen that rows have different numbers of fields and only specific rows should be copied?
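
One plausible reading of the second warning can be sketched in a few lines of C. The sketch assumes the documented binary COPY tuple layout (a 16-bit field count, then for each field a 32-bit length word followed by the data, all in network byte order) and assumes that the retry re-reads a field count without consuming the fields of the rejected tuple. The helper names (`put_be16`, `put_be32`, `read_be16`, `toy_build_tuple_prefix`, `toy_count_on_retry`) are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append a big-endian 16-bit value; returns the new write position. */
static size_t
put_be16(uint8_t *buf, size_t pos, uint16_t v)
{
	buf[pos] = (uint8_t) (v >> 8);
	buf[pos + 1] = (uint8_t) (v & 0xFF);
	return pos + 2;
}

/* Append a big-endian 32-bit value; returns the new write position. */
static size_t
put_be32(uint8_t *buf, size_t pos, uint32_t v)
{
	buf[pos] = (uint8_t) (v >> 24);
	buf[pos + 1] = (uint8_t) ((v >> 16) & 0xFF);
	buf[pos + 2] = (uint8_t) ((v >> 8) & 0xFF);
	buf[pos + 3] = (uint8_t) (v & 0xFF);
	return pos + 4;
}

/* Read a big-endian 16-bit value and advance the cursor. */
static int
read_be16(const uint8_t *buf, size_t *pos)
{
	int			v = (buf[*pos] << 8) | buf[*pos + 1];

	*pos += 2;
	return v;
}

/*
 * Write the start of a five-field tuple whose first field is the int4
 * value 70004: field count, length word, value. Needs 10 bytes of space.
 */
static size_t
toy_build_tuple_prefix(uint8_t *buf)
{
	size_t		pos = 0;

	pos = put_be16(buf, pos, 5);		/* field count */
	pos = put_be32(buf, pos, 4);		/* length of column a */
	pos = put_be32(buf, pos, 70004);	/* value of column a */
	return pos;
}

/*
 * Return what a parser sees on its n-th (0-based) consecutive field-count
 * read if it never consumes any field data between reads.
 */
static int
toy_count_on_retry(const uint8_t *buf, size_t len, int n)
{
	size_t		pos = 0;
	int			count = -1;

	for (int i = 0; i <= n; i++)
	{
		assert(pos + 2 <= len);
		count = read_be16(buf, &pos);
	}
	return count;
}
```

For this prefix the first read yields 5 and the second yields 0, because the retry lands on the high half of the length word. Under the same assumptions the third read yields 4, after which a parser would take the bytes of 70004 as a length word, which would be consistent with the final "unexpected EOF" error at line 3, column a.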

Regards,

--
Takanori Asaba

#49

surafel3000@gmail.com

almost 6 years ago

In reply to: asaba.takanori@fujitsu.com (#48)

1 attachment(s)

Re: Conflict handling for COPY FROM

Hi Takanori Asaba,

Although it is a small point, it may be better like this:
+70005 27 36 46 56 -> 70005 27 37 47 57

done

I would like to discuss COPY from a binary file.
It seems that this patch tries to skip the error raised when the number of fields differs from what is expected.
+               {
+                       if (cstate->error_limit > 0 || cstate->ignore_all_error)
+                       {
+                               ereport(WARNING,
+                                               (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                                                errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+                                                               cstate->line_buf.data, (int) fld_count, attr_count)));
+                               cstate->error_limit--;
+                               goto next_line;
+                       }
+                       else
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                                                errmsg("row field count is %d, expected %d",
+                                                               (int) fld_count, attr_count)));
+
+               }
I checked like this:

postgres=# CREATE TABLE x (
postgres(# a serial UNIQUE,
postgres(# b int,
postgres(# c text not null default 'stuff',
postgres(# d text,
postgres(# e text
postgres(# );
CREATE TABLE
postgres=# COPY x from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

70004 25 35 45 55
70005 26 36 46 56
\.

COPY 2
postgres=# SELECT * FROM x;
a | b | c | d | e
-------+----+----+----+----
70004 | 25 | 35 | 45 | 55
70005 | 26 | 36 | 46 | 56
(2 rows)

postgres=# COPY x TO '/tmp/copyout' (FORMAT binary);
COPY 2
postgres=# CREATE TABLE y (
postgres(# a serial UNIQUE,
postgres(# b int,
postgres(# c text not null default 'stuff',
postgres(# d text
postgres(# );
CREATE TABLE
postgres=# COPY y FROM '/tmp/copyout' WITH (FORMAT binary,ERROR_LIMIT -1);
2020-03-12 16:55:55.457 JST [2319] WARNING: skipping "" --- row field count is 5, expected 4
2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 1
2020-03-12 16:55:55.457 JST [2319] WARNING: skipping "" --- row field count is 0, expected 4
2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 2
2020-03-12 16:55:55.457 JST [2319] ERROR: unexpected EOF in COPY data
2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 3, column a
2020-03-12 16:55:55.457 JST [2319] STATEMENT: COPY y FROM '/tmp/copyout' WITH (FORMAT binary,ERROR_LIMIT -1);
WARNING: skipping "" --- row field count is 5, expected 4
WARNING: skipping "" --- row field count is 0, expected 4
ERROR: unexpected EOF in COPY data
CONTEXT: COPY y, line 3, column a

It seems that the error isn't handled correctly.
'WARNING: skipping "" --- row field count is 5, expected 4' is correct,
but 'WARNING: skipping "" --- row field count is 0, expected 4' is not.

Thank you for the detailed example.

Also, is it necessary to skip errors that occur when the input is a binary file?
Does it actually happen that rows have different numbers of fields and only specific rows should be copied?

Any error that can surely be handled without a transaction rollback can
be included in the error handling, but I would like to proceed without
handling binary-file errors for the time being.
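
The skip-or-fail decision that the patch repeats at each handled error site (the test `cstate->error_limit > 0 || cstate->ignore_all_error`, followed by a decrement) can be modelled in isolation. The following is a minimal sketch with hypothetical names (`ToyErrorBudget`, `toy_budget_init`, `toy_budget_try_skip`), not code from the patch. Unlike the patch, it leaves the counter untouched when the limit is -1; that is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error budget mirroring error_limit and ignore_all_error. */
typedef struct ToyErrorBudget
{
	int			error_limit;	/* remaining row errors that may be skipped */
	bool		ignore_all;		/* true when the limit was given as -1 */
} ToyErrorBudget;

/* Build a budget from the option value; -1 means "skip every row error". */
static ToyErrorBudget
toy_budget_init(int limit)
{
	ToyErrorBudget b;

	b.error_limit = limit;
	b.ignore_all = (limit == -1);
	return b;
}

/*
 * Decide whether one more row error may be skipped. Returns true and
 * consumes one unit of budget if so; returns false when the caller should
 * raise the error instead.
 */
static bool
toy_budget_try_skip(ToyErrorBudget *b)
{
	assert(b != NULL);

	if (b->ignore_all)
		return true;			/* unlimited: nothing to consume */

	if (b->error_limit > 0)
	{
		b->error_limit--;
		return true;
	}
	return false;
}
```

With a limit of 5, the four bad rows in the regression test (three malformed, one duplicate key) are all skipped and one unit of budget remains; a sixth bad row would be reported as an error.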

regards
Surafel

Attachments:

conflict-handling-copy-from-v16.patch (application/octet-stream)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index a99f8155e4..845902b824 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    ERROR_LIMIT <replaceable class="parameter">limit_number</replaceable>
 </synopsis>
  </refsynopsisdiv>
 
@@ -355,6 +356,28 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>ERROR_LIMIT</literal></term>
+    <listitem>
+     <para>
+      Enables ignoring of errored out rows up to <replaceable
+      class="parameter">limit_number</replaceable>. If <replaceable
+      class="parameter">limit_number</replaceable> is set
+      to -1, then all errors will be ignored.
+     </para>
+
+     <para>
+      Currently, only unique or exclusion constraint violation
+      and rows formatting errors are ignored. Malformed
+      rows will rise warnings, while constraint violating rows
+      will be returned back to the caller unless any error is raised;
+      i.e. if any error is raised due to error_limit exceeds, no rows
+      will be returned back to the caller.
+     </para>
+
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WHERE</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e79ede4cb8..f847e2fe79 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -20,6 +20,7 @@
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/printtup.h"
 #include "access/sysattr.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -48,6 +49,8 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
+#include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -153,6 +156,7 @@ typedef struct CopyStateData
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
+	int			error_limit;	/* total number of error to ignore */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -182,6 +186,9 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 	ExprState  *qualexpr;
+	bool		ignore_error;	/* is ignore error specified? */
+	bool		ignore_all_error;	/* is error_limit -1 (ignore all error)
+									 * specified? */
 
 	TransitionCaptureState *transition_capture;
 
@@ -836,7 +843,7 @@ CopyLoadRawBuf(CopyState cstate)
 void
 DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	   int stmt_location, int stmt_len,
-	   uint64 *processed)
+	   uint64 *processed, DestReceiver *dest)
 {
 	CopyState	cstate;
 	bool		is_from = stmt->is_from;
@@ -1068,7 +1075,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 							   NULL, stmt->attlist, stmt->options);
 		cstate->whereClause = whereClause;
-		*processed = CopyFrom(cstate);	/* copy from file to database */
+		*processed = CopyFrom(cstate, dest);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
 	else
@@ -1290,6 +1297,18 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "error_limit") == 0)
+		{
+			if (cstate->ignore_error)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options"),
+						 parser_errposition(pstate, defel->location)));
+			cstate->error_limit = defGetInt64(defel);
+			cstate->ignore_error = true;
+			if (cstate->error_limit == -1)
+				cstate->ignore_all_error = true;
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1440,6 +1459,10 @@ ProcessCopyOptions(ParseState *pstate,
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
+	if (cstate->ignore_error && !cstate->is_copy_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("ERROR LIMIT only available using COPY FROM")));
 }
 
 /*
@@ -2653,7 +2676,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
  * Copy FROM file to relation.
  */
 uint64
-CopyFrom(CopyState cstate)
+CopyFrom(CopyState cstate, DestReceiver *dest)
 {
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *target_resultRelInfo;
@@ -2838,7 +2861,16 @@ CopyFrom(CopyState cstate)
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
-	ExecOpenIndices(resultRelInfo, false);
+	if (cstate->ignore_error)
+	{
+		TupleDesc	tupDesc;
+
+		ExecOpenIndices(resultRelInfo, true);
+		tupDesc = RelationGetDescr(cstate->rel);
+		dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
+	}
+	else
+		ExecOpenIndices(resultRelInfo, false);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -2943,6 +2975,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_error)
+	{
+		/*
+		 * Can't support speculative insertion in multi-inserts.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -3286,6 +3325,63 @@ CopyFrom(CopyState cstate)
 						 */
 						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
+					else if ((cstate->error_limit > 0 || cstate->ignore_all_error) && resultRelInfo->ri_NumIndices > 0)
+					{
+						/* Perform a speculative insertion. */
+						uint32		specToken;
+						ItemPointerData conflictTid;
+						bool		specConflict;
+
+						/*
+						 * Do a non-conclusive check for conflicts first.
+						 */
+						specConflict = false;
+
+						if (!ExecCheckIndexConstraints(myslot, estate, &conflictTid,
+													   NIL))
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+
+						/*
+						 * Acquire our speculative insertion lock.
+						 */
+						specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+
+						/* insert the tuple, with the speculative token */
+						table_tuple_insert_speculative(resultRelInfo->ri_RelationDesc, myslot,
+													   estate->es_output_cid,
+													   0,
+													   NULL,
+													   specToken);
+
+						/* insert index entries for tuple */
+						recheckIndexes = ExecInsertIndexTuples(myslot, estate, true,
+															   &specConflict,
+															   NIL);
+
+						/* adjust the tuple's state accordingly */
+						table_tuple_complete_speculative(resultRelInfo->ri_RelationDesc, myslot,
+														 specToken, !specConflict);
+
+						/*
+						 * Wake up anyone waiting for our decision.
+						 */
+						SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+
+						/*
+						 * If there was a conflict, return it and proceed to
+						 * the next record, if there is one.
+						 */
+						if (specConflict)
+						{
+							(void) dest->receiveSlot(myslot, dest);
+							cstate->error_limit--;
+							continue;
+						}
+					}
 					else
 					{
 						/* OK, store the tuple and create index entries for it */
@@ -3703,7 +3799,7 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-
+next_line:
 	if (!cstate->binary)
 	{
 		char	  **field_strings;
@@ -3718,9 +3814,21 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
 		/* check for overflowing fields */
 		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
+		{
+			if (cstate->error_limit > 0 || cstate->ignore_all_error)
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("skipping \"%s\" --- extra data after last expected column",
+								cstate->line_buf.data)));
+				cstate->error_limit--;
+				goto next_line;
+			}
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("extra data after last expected column")));
+		}
 
 		fieldno = 0;
 
@@ -3732,10 +3840,22 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
 			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
+			{
+				if (cstate->error_limit > 0 || cstate->ignore_all_error)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("skipping \"%s\" --- missing data for column \"%s\"",
+									cstate->line_buf.data, NameStr(att->attname))));
+					cstate->error_limit--;
+					goto next_line;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+							 errmsg("missing data for column \"%s\"",
+									NameStr(att->attname))));
+			}
 			string = field_strings[fieldno++];
 
 			if (cstate->convert_select_flags &&
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index f8183cd488..817d0af002 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -784,7 +784,7 @@ copy_table(Relation rel)
 	cstate = BeginCopyFrom(pstate, rel, NULL, false, copy_read_data, attnamelist, NIL);
 
 	/* Do the copy */
-	(void) CopyFrom(cstate);
+	(void) CopyFrom(cstate, NULL);
 
 	logicalrep_rel_close(relmapentry, NoLock);
 }
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1f7f6e2d0..59d7fed099 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -720,7 +720,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 
 				DoCopy(pstate, (CopyStmt *) parsetree,
 					   pstmt->stmt_location, pstmt->stmt_len,
-					   &processed);
+					   &processed, dest);
 				if (qc)
 					SetQueryCompletion(qc, CMDTAG_COPY, processed);
 			}
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 67df0cd2c7..34869aaec6 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -892,6 +892,7 @@ ProcessResult(PGresult **results)
 {
 	bool		success = true;
 	bool		first_cycle = true;
+	bool		is_copy_in = false;
 
 	for (;;)
 	{
@@ -1015,6 +1016,7 @@ ProcessResult(PGresult **results)
 									   copystream,
 									   PQbinaryTuples(*results),
 									   &copy_result) && success;
+				is_copy_in = true;
 			}
 			ResetCancelConn();
 
@@ -1045,6 +1047,11 @@ ProcessResult(PGresult **results)
 		first_cycle = false;
 	}
 
+	/* Print returned result  for COPY FROM with error_limit. */
+	if (is_copy_in && !success && PQresultStatus(*results) !=
+		PGRES_FATAL_ERROR)
+		(void) PrintQueryTuples(*results);
+
 	SetResultVariables(*results, success);
 
 	/* may need this to recover from conn loss during COPY */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..addd8054d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -25,7 +25,7 @@ typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
-				   uint64 *processed);
+				   uint64 *processed, DestReceiver *dest);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyState cstate, bool is_from, List *options);
 extern CopyState BeginCopyFrom(ParseState *pstate, Relation rel, const char *filename,
@@ -37,7 +37,7 @@ extern bool NextCopyFromRawFields(CopyState cstate,
 								  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 
-extern uint64 CopyFrom(CopyState cstate);
+extern uint64 CopyFrom(CopyState cstate, DestReceiver *dest);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d25a..fbffe1d1ea 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -1,5 +1,5 @@
 CREATE TEMP TABLE x (
-	a serial,
+	a serial UNIQUE,
 	b int,
 	c text not null default 'stuff',
 	d text,
@@ -55,6 +55,16 @@ LINE 1: COPY x TO stdout WHERE a = 1;
                          ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001	22	32" --- missing data for column "d"
+WARNING:  skipping "70002	23	33	43	53	54" --- extra data after last expected column
+WARNING:  skipping "70003	24	34	44" --- missing data for column "e"
+
+   a   | b  | c  | d  |          e           
+-------+----+----+----+----------------------
+ 70005 | 27 | 37 | 47 | before trigger fired
+(1 row)
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist
 LINE 1: COPY x from stdin WHERE f > 60003;
@@ -102,12 +112,14 @@ SELECT * FROM x;
  50004 | 25 | 35         | 45     | before trigger fired
  60004 | 25 | 35         | 45     | before trigger fired
  60005 | 26 | 36         | 46     | before trigger fired
+ 70004 | 25 | 35         | 45     | before trigger fired
+ 70005 | 26 | 36         | 46     | before trigger fired
      1 |  1 | stuff      | test_1 | after trigger fired
      2 |  2 | stuff      | test_2 | after trigger fired
      3 |  3 | stuff      | test_3 | after trigger fired
      4 |  4 | stuff      | test_4 | after trigger fired
      5 |  5 | stuff      | test_5 | after trigger fired
-(28 rows)
+(30 rows)
 
 -- check copy out
 COPY x TO stdout;
@@ -134,6 +146,8 @@ COPY x TO stdout;
 50004	25	35	45	before trigger fired
 60004	25	35	45	before trigger fired
 60005	26	36	46	before trigger fired
+70004	25	35	45	before trigger fired
+70005	26	36	46	before trigger fired
 1	1	stuff	test_1	after trigger fired
 2	2	stuff	test_2	after trigger fired
 3	3	stuff	test_3	after trigger fired
@@ -163,6 +177,8 @@ Delimiter	before trigger fired
 35	before trigger fired
 35	before trigger fired
 36	before trigger fired
+35	before trigger fired
+36	before trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
 stuff	after trigger fired
@@ -192,6 +208,8 @@ I'm null	before trigger fired
 25	before trigger fired
 25	before trigger fired
 26	before trigger fired
+25	before trigger fired
+26	before trigger fired
 1	after trigger fired
 2	after trigger fired
 3	after trigger fired
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fac19..e9b8855d87 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -1,5 +1,5 @@
 CREATE TEMP TABLE x (
-	a serial,
+	a serial UNIQUE,
 	b int,
 	c text not null default 'stuff',
 	d text,
@@ -110,6 +110,15 @@ COPY x from stdin WHERE a > 60003;
 60005	26	36	46	56
 \.
 
+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001	22	32
+70002	23	33	43	53	54
+70003	24	34	44
+70004	25	35	45	55
+70005	26	36	46	56
+70005	27	37	47	57
+\.
+
 COPY x from stdin WHERE f > 60003;
 
 COPY x from stdin WHERE a = max(x.b);

#50

almost 6 years ago

In reply to: Surafel Temesgen (#49)

RE: Conflict handling for COPY FROM

Hello Surafel,

From: Surafel Temesgen <surafel3000@gmail.com>

An error that can surely be handled without transaction rollback can
be included in error handling, but I would like to proceed without binary file
error handling for the time being

Thank you.

Also, it seems that you have applied Alexey's comment.
So I'll mark this patch as ready for committer.

Regards,

--
Takanori Asaba

#51

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Surafel Temesgen (#49)

Re: Conflict handling for COPY FROM

Surafel Temesgen <surafel3000@gmail.com> writes:

[ conflict-handling-copy-from-v16.patch ]

I took a quick look at this patch, since it was marked "ready for
committer", but I don't see how it can possibly be considered committable.

1. Covering only the errors that are thrown in DoCopy itself doesn't
seem to me to pass the smell test. Yes, I'm sure there's some set of
use-cases for which that'd be helpful, but I think most people would
expect a "skip errors" option to be able to handle cases like malformed
numbers or bad encoding. I understand the implementation reasons that
make it impractical to cover other errors, but do we really want a
feature that most people will see as much less than half-baked? I fear
it'll be an embarrassment.

2. If I'm reading the patch correctly, (some) rejected rows are actually
sent back to the client. This is a wire protocol break of the first
magnitude, and can NOT be accepted. At least not without some provisions
for not doing it with a client that isn't prepared for it. I also am
fairly worried about the possibilities for deadlock (ie, both ends stuck
waiting for the other one to absorb data) if the return traffic volume is
high enough.

3. I don't think enough thought has been put into the reporting, either.

+                ereport(WARNING,
+                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                         errmsg("skipping \"%s\" --- extra data after last expected column",
+                                cstate->line_buf.data)));

That's not going to be terribly helpful if the input line runs to many
megabytes. Or even if no individual line is very long, but you get
millions of such warnings. It's pretty much at variance with our
message style guidelines (among other things, those caution you to keep
the primary error message short); and it's randomly different from
COPY's existing practice, which is to show the faulty line as CONTEXT.
Plus it seems plenty weird that some errors are reported this way while
others are reported by sending back the bad tuple (with, it looks like,
no mention of what the specific problem is ... what if you have a lot of
unique indexes?).
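
To make the message-length concern concrete, here is a minimal sketch of bounding how much of a faulty line gets reported. The helper name line_excerpt and its signature are hypothetical; they are not part of the patch or of PostgreSQL. It cuts on a byte count only, so a real implementation would also have to clip on a character boundary in the database encoding.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not patch code):
 * copy at most max_bytes bytes of an input line into out, appending "..."
 * when the line had to be cut short. out must have room for
 * max_bytes + 4 bytes (excerpt, three dots, terminating NUL).
 * Returns 1 if the line was truncated, 0 otherwise.
 */
static int
line_excerpt(const char *line, size_t max_bytes, char *out, size_t out_size)
{
	size_t		len = strlen(line);

	assert(out_size >= max_bytes + 4);

	if (len <= max_bytes)
	{
		/* Short enough: copy the whole line including its NUL. */
		memcpy(out, line, len + 1);
		return 0;
	}

	/* Too long: keep the first max_bytes bytes and mark the cut. */
	memcpy(out, line, max_bytes);
	memcpy(out + max_bytes, "...", 4);
	return 1;
}
```

With something along these lines, the primary message could stay short and fixed, and only the bounded excerpt would be attached as context. That addresses the per-message size but not the case of millions of warnings.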

BTW, while I don't know much about the ON CONFLICT (speculative
insertion) infrastructure, I wonder how well it really works to
not specify an arbiter index. I see that you're getting away with
it in a trivial test case that has exactly one index, but that's
not stressing the point very hard.

On the whole, I feel like adding this sort of functionality to COPY
itself is a dead end. COPY is meant for fast bulk transfer and not
much else; trying to load more functionality onto it can only end
in serving multiple masters poorly. What we normally recommend
if you have data that needs to be cleaned is to import it into a
permissively-defined staging table (eg, with all columns declared
as text) and then transfer cleaned data to your tables-of-record.
Looking at this patch in terms of whether the functionality is
available in that approach, it seems like you might want two parts
of it:

1. A COPY option to be flexible about the number of columns in the
input, say by filling omitted columns with NULLs.

2. INSERT ... ON CONFLICT can be used to transfer data to permanent
tables with rejection of duplicate keys, but it doesn't have much
flexibility about just what to do with duplicates. Maybe we could
add more ON CONFLICT actions? Sending conflicted rows to some other
table, or updating the source table to show which rows were copied
and which not, might be useful things to think about.
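
The first of these two parts can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not PostgreSQL code: pad_missing_fields is a hypothetical name, and the parameter names fldct and attr_count are borrowed from the patch's NextCopyFrom hunk only for readability. Omitted trailing columns become NULL instead of raising "missing data for column", while extra fields are still treated as an error.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch (an assumption for illustration): map the fldct
 * fields read from one input line onto attr_count target columns.
 * Columns with no corresponding field are set to NULL. Returns the
 * number of columns that were padded, or -1 if the line has more
 * fields than columns.
 */
static int
pad_missing_fields(char **fields, int fldct, int attr_count, char **values)
{
	int			padded = 0;
	int			i;

	assert(fldct >= 0 && attr_count >= 0);

	/* Extra data after the last expected column is still an error. */
	if (fldct > attr_count)
		return -1;

	for (i = 0; i < attr_count; i++)
	{
		if (i < fldct)
			values[i] = fields[i];	/* field present in the input */
		else
		{
			values[i] = NULL;		/* omitted column: fill with NULL */
			padded++;
		}
	}
	return padded;
}
```

For the regression-test line "70001	22	32" loaded into the five-column table x, this would return 2 and leave the values for d and e as NULL, so the row could land in a staging table and be checked there.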

regards, tom lane

#52