[POC] Fast COPY FROM command for the table with foreign partitions

Started by Andrey Lepikhov over 5 years ago, 132 messages
#1Andrey Lepikhov
a.lepikhov@postgrespro.ru
1 attachment(s)

Hi, hackers!

Currently, COPY FROM into a partitioned table with foreign partitions is
not optimal: even if the table constraints allow a multi-insert COPY, we
flush the buffers and prepare a new INSERT query for each tuple routed
into a foreign partition.
To solve this problem, I tried to use the multi-insert buffers for
foreign tuples too. These buffers are flushed using the
'COPY .. FROM STDIN' machinery, in the same way as the psql '\copy'
command does it.
The attached patch is based on a private prototype developed by
Arseny Sher a couple of years ago.
Benchmarks show that it speeds up the COPY FROM operation: the command
"COPY pgbench_accounts FROM ..." (the test file contains 1e7 tuples,
copied into three partitions) runs on my laptop in 14 minutes without
the patch and in 1.5 minutes with it. The theoretical minimum here
(with an infinite buffer size) is 40 seconds.
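For scale, here is the arithmetic implied by the timings above (14 min = 840 s, 1.5 min = 90 s, 1e7 tuples); nothing beyond the quoted numbers is assumed:

```latex
\begin{aligned}
\text{speedup} &= \frac{840\ \text{s}}{90\ \text{s}} \approx 9.3\\
\text{cost per tuple, without patch} &= \frac{840\ \text{s}}{10^{7}} = 84\ \mu\text{s}\\
\text{cost per tuple, with patch} &= \frac{90\ \text{s}}{10^{7}} = 9\ \mu\text{s}\\
\text{cost per tuple, theoretical minimum} &= \frac{40\ \text{s}}{10^{7}} = 4\ \mu\text{s}\\
\text{remaining gap} &= \frac{90\ \text{s}}{40\ \text{s}} = 2.25
\end{aligned}
```

In other words, the patch cuts the average cost per routed tuple by roughly an order of magnitude, while leaving about a 2x gap to the stated theoretical minimum.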

A couple of questions:
1. Would this feature be of interest for PostgreSQL core?
2. If so, is the chosen approach the right one?

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachments:

0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=UTF-8)
From 6cc545a00f6f6b18926602880819eeed9313550b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 29 May 2020 10:39:57 +0500
Subject: [PATCH] Fast COPY FROM into the foreign (or sharded) table.

---
 contrib/postgres_fdw/deparse.c                |  25 ++
 .../postgres_fdw/expected/postgres_fdw.out    |   5 +-
 contrib/postgres_fdw/postgres_fdw.c           |  95 ++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 src/backend/commands/copy.c                   | 216 ++++++++++++------
 src/include/commands/copy.h                   |   5 +
 src/include/foreign/fdwapi.h                  |   9 +
 7 files changed, 289 insertions(+), 67 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..427402c8eb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1758,6 +1758,31 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	int attnum;
+
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	appendStringInfoString(buf, " ( ");
+
+	for(attnum = 0; attnum < rel->rd_att->natts; attnum++)
+	{
+		appendStringInfoString(buf, NameStr(rel->rd_att->attrs[attnum].attname));
+
+		if (attnum != rel->rd_att->natts-1)
+			appendStringInfoString(buf, ", ");
+	}
+
+	appendStringInfoString(buf, " ) ");
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..5ae24fef7c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2 ( f1, f2 )  FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..bd2a8f596f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState fdwcstate;
 } PgFdwModifyState;
 
 /*
@@ -350,12 +352,16 @@ static TupleTableSlot *postgresExecForeignDelete(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+												 TupleTableSlot *slot);
 static void postgresEndForeignModify(EState *estate,
 									 ResultRelInfo *resultRelInfo);
 static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(ResultRelInfo *resultRelInfo, bool status);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -530,9 +536,12 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->ExecForeignInsert = postgresExecForeignInsert;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -1890,6 +1899,27 @@ postgresExecForeignDelete(EState *estate,
 								  slot, planSlot);
 }
 
+/*
+ * postgresExecForeignCopy
+ *		Copy one row into a foreign table
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo, TupleTableSlot *slot)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	char *buf;
+
+	buf = NextForeignCopyRow(fmstate->fdwcstate, slot);
+
+	if (PQputCopyData(fmstate->conn, buf, strlen(buf)) <= 0)
+	{
+		PGresult *res;
+
+		res = PQgetResult(fmstate->conn);
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+			pgfdw_report_error(ERROR, res, fmstate->conn, false, fmstate->query);
+	}
+}
 /*
  * postgresEndForeignModify
  *		Finish an insert/update/delete operation on a foreign table
@@ -2051,6 +2081,71 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ResultRelInfo *resultRelInfo)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) (resultRelInfo->ri_FdwState);
+	StringInfoData sql;
+	PGresult *res;
+
+	Assert(resultRelInfo->ri_FdwRoutine != NULL);
+
+	fmstate->target_attrs = NULL;
+	fmstate->has_returning = false;
+	fmstate->retrieved_attrs = NULL;
+
+	if (fmstate->fdwcstate == NULL)
+		fmstate->fdwcstate = BeginForeignCopyTo(rel);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+	fmstate->query = sql.data;
+
+	res = PQexec(fmstate->conn, fmstate->query);
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(ResultRelInfo *resultRelInfo, bool status)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	PGresult *res;
+
+	Assert(fmstate != NULL);
+
+	if (!status)
+	{
+		PQputCopyEnd(fmstate->conn, (PQprotocolVersion(fmstate->conn) < 3) ?
+					NULL :
+					_("aborted foreign copy"));
+		EndForeignCopyTo(fmstate->fdwcstate);
+		pfree(fmstate->fdwcstate);
+		fmstate->fdwcstate = NULL;
+		return;
+	}
+
+	while (res = PQgetResult(fmstate->conn), PQresultStatus(res) == PGRES_COPY_IN)
+	{
+		/* We can't send an error message if we're using protocol version 2 */
+		PQputCopyEnd(fmstate->conn, (status || PQprotocolVersion(fmstate->conn) < 3) ? NULL :
+					 _("aborted foreign copy"));
+		PQclear(res);
+	}
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, false, fmstate->query);
+
+	while (PQgetResult(fmstate->conn) != NULL);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc463c..9459e031c7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -133,6 +133,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -358,8 +359,11 @@ static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
 static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
 static void EndCopyTo(CopyState cstate);
+static void CopyToStart(CopyState cstate);
+static void CopyToFinish(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
 static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
@@ -587,7 +591,9 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			CopySendChar(cstate, '\0');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1075,7 +1081,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	else
 	{
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1815,6 +1821,32 @@ EndCopy(CopyState cstate)
 	pfree(cstate);
 }
 
+static char *buf = NULL;
+static void
+data_dest_cb(void *outbuf, int len)
+{
+	buf = (char *) palloc(len);
+	memcpy(buf, (char *) outbuf, len);
+}
+
+CopyState
+BeginForeignCopyTo(Relation rel)
+{
+	CopyState cstate;
+
+	cstate = BeginCopy(NULL, false, rel, NULL, InvalidOid, NIL, NIL);
+	cstate->copy_dest = COPY_CALLBACK;
+	cstate->data_dest_cb = data_dest_cb;
+	CopyToStart(cstate);
+	return cstate;
+}
+
+void
+EndForeignCopyTo(CopyState cstate)
+{
+	CopyToFinish(cstate);
+}
+
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
@@ -1825,6 +1857,7 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
@@ -1880,6 +1913,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -1950,6 +1988,13 @@ BeginCopyTo(ParseState *pstate,
 	return cstate;
 }
 
+char *
+NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot)
+{
+	CopyOneRowTo(cstate, slot);
+	return buf;
+}
+
 /*
  * This intermediate routine exists mainly to localize the effects of setjmp
  * so we don't need to plaster a lot of variables with "volatile".
@@ -1966,7 +2011,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2005,16 +2052,12 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
- */
-static uint64
-CopyTo(CopyState cstate)
+static void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -2104,6 +2147,29 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+static void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2135,17 +2201,6 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
@@ -2449,53 +2504,85 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 	{
-		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
-		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		/* Flush into foreign table or partition */
+		int i;
+		bool status = false;
+		CopyState fcstate = NULL;
+
+		Assert(resultRelInfo->ri_FdwRoutine != NULL &&
+			   resultRelInfo->ri_FdwState != NULL);
+
+		PG_TRY();
 		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
+			fcstate = resultRelInfo->ri_FdwRoutine->BeginForeignCopy(resultRelInfo);
+			for (i = 0; i < nused; i++)
+				resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+															  fcstate,
+															  slots[i]);
+			status = true;
 		}
-
+		PG_FINALLY();
+		{
+			Assert(fcstate != NULL);
+			resultRelInfo->ri_FdwRoutine->EndForeignCopy(
+														buffer->resultRelInfo,
+														status);
+		}
+		PG_END_TRY();
+	}
+	else
+	{
 		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2868,8 +2955,7 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs)
 	{
 		/*
 		 * Can't support multi-inserts to foreign tables or if there are any
@@ -3037,8 +3123,7 @@ CopyFrom(CopyState cstate)
 				 */
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					!has_instead_insert_row_trig;
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -3048,7 +3133,8 @@ CopyFrom(CopyState cstate)
 													   resultRelInfo);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo) &&
+						 resultRelInfo->ri_FdwRoutine == NULL)
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..ef119a761a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -41,4 +42,8 @@ extern uint64 CopyFrom(CopyState cstate);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
+extern CopyState BeginForeignCopyTo(Relation rel);
+extern char *NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
+extern void EndForeignCopyTo(CopyState cstate);
+
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..197301c5a5 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -94,6 +94,8 @@ typedef TupleTableSlot *(*ExecForeignDelete_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot *slot);
 
 typedef void (*EndForeignModify_function) (EState *estate,
 										   ResultRelInfo *rinfo);
@@ -104,6 +106,10 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (ResultRelInfo *rinfo, bool status);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -211,9 +217,12 @@ typedef struct FdwRoutine
 	ExecForeignInsert_function ExecForeignInsert;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
+	ExecForeignCopy_function ExecForeignCopy;
 	EndForeignModify_function EndForeignModify;
 	BeginForeignInsert_function BeginForeignInsert;
 	EndForeignInsert_function EndForeignInsert;
+	BeginForeignCopy_function BeginForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
 	IsForeignRelUpdatable_function IsForeignRelUpdatable;
 	PlanDirectModify_function PlanDirectModify;
 	BeginDirectModify_function BeginDirectModify;
-- 
2.17.1
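To make the data path more concrete: with the patch, each buffered tuple is turned into one line of COPY text format by the COPY TO code (CopyOneRowTo() writing through the new COPY_CALLBACK destination), and that line is handed to PQputCopyData(). The standalone C sketch below only illustrates what such a line looks like; it is not code from the patch. It assumes the column values are already rendered as C strings (a NULL pointer meaning SQL NULL), the helper name copy_text_row() is hypothetical, and it implements only part of PostgreSQL's escaping (backslash, tab, newline and carriage return; the server also escapes a few other control characters). Like the patch's COPY_CALLBACK branch, it ends the line with '\n' and a NUL byte so the caller can use strlen().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: serialize one row into (simplified) COPY text format.
 * cols[i] == NULL means SQL NULL. Returns a malloc'd, NUL-terminated line
 * ending in '\n' (caller frees it), or NULL if allocation fails.
 */
static char *
copy_text_row(const char *const *cols, int ncols)
{
	size_t		cap = 2;		/* row terminator '\n' plus the final NUL */
	char	   *out;
	char	   *p;

	/* Worst case per column: every byte escaped, or "\N", plus a separator. */
	for (int i = 0; i < ncols; i++)
		cap += (cols[i] != NULL ? 2 * strlen(cols[i]) : 2) + 1;

	out = malloc(cap);
	if (out == NULL)
		return NULL;
	p = out;

	for (int i = 0; i < ncols; i++)
	{
		if (i > 0)
			*p++ = '\t';		/* default column delimiter */
		if (cols[i] == NULL)
		{
			*p++ = '\\';		/* default NULL marker is \N */
			*p++ = 'N';
			continue;
		}
		for (const char *s = cols[i]; *s != '\0'; s++)
		{
			switch (*s)
			{
				case '\\': *p++ = '\\'; *p++ = '\\'; break;
				case '\t': *p++ = '\\'; *p++ = 't'; break;
				case '\n': *p++ = '\\'; *p++ = 'n'; break;
				case '\r': *p++ = '\\'; *p++ = 'r'; break;
				default: *p++ = *s; break;
			}
		}
	}
	*p++ = '\n';				/* row terminator */
	*p = '\0';					/* lets the caller use strlen() */
	assert((size_t) (p - out) < cap);
	return out;
}
```

For example, the row (-1, xyzzy) from the regression test in this patch serializes to "-1", a tab, "xyzzy" and a newline, which is the line shown in the expected output's CONTEXT message.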

#2Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#1)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

On Mon, Jun 1, 2020 at 6:29 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

> Currently, COPY FROM into a partitioned table with foreign partitions is
> not optimal: even if the table constraints allow a multi-insert COPY, we
> flush the buffers and prepare a new INSERT query for each tuple routed
> into a foreign partition.
> To solve this problem, I tried to use the multi-insert buffers for
> foreign tuples too. These buffers are flushed using the
> 'COPY .. FROM STDIN' machinery, in the same way as the psql '\copy'
> command does it.
> The attached patch is based on a private prototype developed by
> Arseny Sher a couple of years ago.
> Benchmarks show that it speeds up the COPY FROM operation: the command
> "COPY pgbench_accounts FROM ..." (the test file contains 1e7 tuples,
> copied into three partitions) runs on my laptop in 14 minutes without
> the patch and in 1.5 minutes with it. The theoretical minimum here
> (with an infinite buffer size) is 40 seconds.

Great!

> A couple of questions:
> 1. Would this feature be of interest for PostgreSQL core?

Yeah, I think this is especially useful for sharding.

> 2. If so, is the chosen approach the right one?

I had also thought about something similar to this before [1]. I will take a look.

Thanks!

Best regards,
Etsuro Fujita

[1]: /messages/by-id/23990375-45a6-5823-b0aa-a6a7a6a957f0@lab.ntt.co.jp

#3Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#2)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Thank you for the answer,

02.06.2020 05:02, Etsuro Fujita wrote:

> I had also thought about something similar to this before [1]. I will take a look.
>
> [1] /messages/by-id/23990375-45a6-5823-b0aa-a6a7a6a957f0@lab.ntt.co.jp

I have looked into that thread.
My first version of the patch was similar to your idea. But while
developing the "COPY FROM" code, I ran into the following issues:
1. Two or more partitions can be placed on the same node. We need to
finish the COPY into one partition before starting a COPY into another
partition on the same node.
2. On any error we need to send EOF to all started "COPY .. FROM STDIN"
operations. Otherwise the FDW can't cancel the operation.
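The two constraints above can be modelled in a few lines of C. This is a hypothetical sketch, not the patch's code: RemoteConn, begin_copy(), end_copy() and abort_all_copies() are invented names, and the libpq calls they stand in for (PQexec() of a COPY ... FROM STDIN, PQputCopyEnd()) appear only in comments. It allows at most one active COPY per connection, closes it before switching to another partition on the same node, and on error terminates every COPY that is still open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one remote connection (one per foreign server). */
typedef struct RemoteConn
{
	int			active_rel;		/* relation currently in COPY IN, or -1 */
	int			copies_ended;	/* COPYs closed normally */
	int			copies_aborted; /* COPYs closed with an error */
} RemoteConn;

/* Stands in for PQputCopyEnd(conn, NULL) or PQputCopyEnd(conn, "message"). */
static void
end_copy(RemoteConn *conn, bool ok)
{
	assert(conn->active_rel >= 0);
	if (ok)
		conn->copies_ended++;
	else
		conn->copies_aborted++;
	conn->active_rel = -1;
}

/*
 * Issue 1: a connection runs only one COPY FROM STDIN at a time, so finish
 * the COPY into the previous partition before starting the next one.
 */
static void
begin_copy(RemoteConn *conn, int rel)
{
	if (conn->active_rel == rel)
		return;					/* already streaming into this partition */
	if (conn->active_rel >= 0)
		end_copy(conn, true);
	conn->active_rel = rel;		/* stands in for PQexec("COPY ... FROM STDIN") */
}

/* Issue 2: on error, terminate every COPY that has been started. */
static void
abort_all_copies(RemoteConn *conns, int nconns)
{
	for (int i = 0; i < nconns; i++)
		if (conns[i].active_rel >= 0)
			end_copy(&conns[i], false);
}
```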

Hiding the COPY code under the buffer management machinery allows us to
generalize that machinery, execute one COPY operation per buffer, and
simplify error handling.
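A rough sketch of "one COPY operation per buffer" with error handling reduced to a single status flag, in the spirit of the PG_TRY()/PG_FINALLY() block the patch adds to CopyMultiInsertBufferFlush(). Assumptions: plain return codes replace PostgreSQL's ereport()/longjmp error handling, and CopyCallbacks, flush_buffer() and the toy_* callbacks are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callbacks mirroring BeginForeignCopy/ExecForeignCopy/EndForeignCopy. */
typedef struct CopyCallbacks
{
	void		(*begin) (void *state);
	bool		(*exec) (void *state, const char *row);	/* false = error */
	void		(*end) (void *state, bool status);
} CopyCallbacks;

/* Flush one buffer through a single remote COPY; end() always runs once. */
static bool
flush_buffer(const CopyCallbacks *cb, void *state,
			 const char *const *rows, size_t nrows)
{
	size_t		i;
	bool		status;

	assert(cb->begin != NULL && cb->exec != NULL && cb->end != NULL);

	cb->begin(state);			/* one COPY ... FROM STDIN per buffer */
	for (i = 0; i < nrows; i++)
		if (!cb->exec(state, rows[i]))
			break;				/* stands in for an ERROR inside PG_TRY() */
	status = (i == nrows);
	cb->end(state, status);		/* stands in for PG_FINALLY(): always runs */
	return status;
}

/* Toy callbacks: count calls and fail on the row "bad". */
typedef struct ToyState
{
	int			begins;
	int			rows_sent;
	int			ends_ok;
	int			ends_aborted;
} ToyState;

static void
toy_begin(void *s)
{
	((ToyState *) s)->begins++;
}

static bool
toy_exec(void *s, const char *row)
{
	if (strcmp(row, "bad") == 0)
		return false;			/* simulate a failed send */
	((ToyState *) s)->rows_sent++;
	return true;
}

static void
toy_end(void *s, bool status)
{
	if (status)
		((ToyState *) s)->ends_ok++;		/* finish the COPY normally */
	else
		((ToyState *) s)->ends_aborted++;	/* abort it with an error message */
}
```

The design point is that the end callback runs exactly once per buffer and receives the status, so the FDW can decide between finishing the remote COPY normally and aborting it.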

As I understand it, the main idea of the thread you mentioned is to add
"COPY FROM" support without changes to the FDW API.
It is possible to remove BeginForeignCopy() and EndForeignCopy() from
the patch, but it is not trivial to adapt ExecForeignInsert() for COPY
purposes.
All I can offer here for now is to introduce a single new routine,
ExecForeignBulkInsert(buf), that executes one "COPY FROM STDIN"
operation, sends the tuples and closes the operation. We can fall back
to calling ExecForeignInsert() for each buffered tuple if
ExecForeignBulkInsert() is not supported.

One of the main questions here is whether to use the COPY TO machinery
for serializing a tuple. As you can see in the patch, this requires
transforming the CopyTo() routine into an iterative form:
start/next/finish. Would that be acceptable?
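The start/next/finish split can be illustrated independently of PostgreSQL. In this hypothetical sketch (ToyCopyState and the toy_copy_* functions are invented; the header and trailer are toy strings), the one-time setup and teardown that used to surround a single loop move into separate functions. A caller that receives rows one at a time, such as an FDW flushing a buffer, can then drive the serializer row by row, while the old all-at-once entry point becomes a thin wrapper.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical serializer state shared by the three phases. */
typedef struct ToyCopyState
{
	char		out[256];		/* accumulated output (fixed size for the sketch) */
	size_t		len;
	int			rows;
	int			started;
	int			finished;
} ToyCopyState;

static void
emit(ToyCopyState *cs, const char *s)
{
	size_t		n = strlen(s);

	assert(cs->len + n < sizeof(cs->out));
	memcpy(cs->out + cs->len, s, n + 1);	/* copies the NUL terminator too */
	cs->len += n;
}

/* Start: one-time setup such as a header (cf. CopyToStart in the patch). */
static void
toy_copy_start(ToyCopyState *cs, const char *header)
{
	cs->len = 0;
	cs->out[0] = '\0';
	cs->rows = 0;
	cs->started = 1;
	cs->finished = 0;
	if (header != NULL)
		emit(cs, header);
}

/* Next: serialize exactly one row; callable whenever a row arrives. */
static void
toy_copy_next(ToyCopyState *cs, const char *row)
{
	assert(cs->started && !cs->finished);
	emit(cs, row);
	emit(cs, "\n");
	cs->rows++;
}

/* Finish: trailer and cleanup (cf. CopyToFinish in the patch). */
static void
toy_copy_finish(ToyCopyState *cs, const char *trailer)
{
	assert(cs->started && !cs->finished);
	if (trailer != NULL)
		emit(cs, trailer);
	cs->finished = 1;
}

/* The former monolithic routine is now a wrapper over the three phases. */
static int
toy_copy_all(ToyCopyState *cs, const char *const *rows, int nrows)
{
	toy_copy_start(cs, NULL);
	for (int i = 0; i < nrows; i++)
		toy_copy_next(cs, rows[i]);
	toy_copy_finish(cs, NULL);
	return cs->rows;
}
```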

The attached patch corrects a silly error.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachments:

0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=UTF-8)
From 92a86ca52e33444ef2f559a1da9c8632398892a4 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 29 May 2020 10:39:57 +0500
Subject: [PATCH] Fast COPY FROM into the foreign (or sharded) table.

---
 contrib/postgres_fdw/deparse.c                |  25 ++
 .../postgres_fdw/expected/postgres_fdw.out    |   5 +-
 contrib/postgres_fdw/postgres_fdw.c           |  95 ++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 src/backend/commands/copy.c                   | 213 ++++++++++++------
 src/include/commands/copy.h                   |   5 +
 src/include/foreign/fdwapi.h                  |   9 +
 7 files changed, 286 insertions(+), 67 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..427402c8eb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1758,6 +1758,31 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	int attnum;
+
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	appendStringInfoString(buf, " ( ");
+
+	for(attnum = 0; attnum < rel->rd_att->natts; attnum++)
+	{
+		appendStringInfoString(buf, NameStr(rel->rd_att->attrs[attnum].attname));
+
+		if (attnum != rel->rd_att->natts-1)
+			appendStringInfoString(buf, ", ");
+	}
+
+	appendStringInfoString(buf, " ) ");
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..5ae24fef7c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2 ( f1, f2 )  FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..bd2a8f596f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState fdwcstate;
 } PgFdwModifyState;
 
 /*
@@ -350,12 +352,16 @@ static TupleTableSlot *postgresExecForeignDelete(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+												 TupleTableSlot *slot);
 static void postgresEndForeignModify(EState *estate,
 									 ResultRelInfo *resultRelInfo);
 static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(ResultRelInfo *resultRelInfo, bool status);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -530,9 +536,12 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->ExecForeignInsert = postgresExecForeignInsert;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -1890,6 +1899,27 @@ postgresExecForeignDelete(EState *estate,
 								  slot, planSlot);
 }
 
+/*
+ * postgresExecForeignCopy
+ *		Copy one row into a foreign table
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo, TupleTableSlot *slot)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	char *buf;
+
+	buf = NextForeignCopyRow(fmstate->fdwcstate, slot);
+
+	if (PQputCopyData(fmstate->conn, buf, strlen(buf)) <= 0)
+	{
+		PGresult *res;
+
+		res = PQgetResult(fmstate->conn);
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+			pgfdw_report_error(ERROR, res, fmstate->conn, false, fmstate->query);
+	}
+}
 /*
  * postgresEndForeignModify
  *		Finish an insert/update/delete operation on a foreign table
@@ -2051,6 +2081,71 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ResultRelInfo *resultRelInfo)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) (resultRelInfo->ri_FdwState);
+	StringInfoData sql;
+	PGresult *res;
+
+	Assert(resultRelInfo->ri_FdwRoutine != NULL);
+
+	fmstate->target_attrs = NULL;
+	fmstate->has_returning = false;
+	fmstate->retrieved_attrs = NULL;
+
+	if (fmstate->fdwcstate == NULL)
+		fmstate->fdwcstate = BeginForeignCopyTo(rel);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+	fmstate->query = sql.data;
+
+	res = PQexec(fmstate->conn, fmstate->query);
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(ResultRelInfo *resultRelInfo, bool status)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	PGresult *res;
+
+	Assert(fmstate != NULL);
+
+	if (!status)
+	{
+		PQputCopyEnd(fmstate->conn, (PQprotocolVersion(fmstate->conn) < 3) ?
+					NULL :
+					_("aborted foreign copy"));
+		EndForeignCopyTo(fmstate->fdwcstate);
+		pfree(fmstate->fdwcstate);
+		fmstate->fdwcstate = NULL;
+		return;
+	}
+
+	while (res = PQgetResult(fmstate->conn), PQresultStatus(res) == PGRES_COPY_IN)
+	{
+		/* We can't send an error message if we're using protocol version 2 */
+		PQputCopyEnd(fmstate->conn, (status || PQprotocolVersion(fmstate->conn) < 3) ? NULL :
+					 _("aborted foreign copy"));
+		PQclear(res);
+	}
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, false, fmstate->query);
+
+	while (PQgetResult(fmstate->conn) != NULL);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc463c..87e0f46846 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -133,6 +133,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -358,8 +359,11 @@ static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
 static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
 static void EndCopyTo(CopyState cstate);
+static void CopyToStart(CopyState cstate);
+static void CopyToFinish(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
 static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
@@ -587,7 +591,9 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			CopySendChar(cstate, '\0');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1075,7 +1081,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	else
 	{
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1815,6 +1821,32 @@ EndCopy(CopyState cstate)
 	pfree(cstate);
 }
 
+static char *buf = NULL;
+static void
+data_dest_cb(void *outbuf, int len)
+{
+	buf = (char *) palloc(len);
+	memcpy(buf, (char *) outbuf, len);
+}
+
+CopyState
+BeginForeignCopyTo(Relation rel)
+{
+	CopyState cstate;
+
+	cstate = BeginCopy(NULL, false, rel, NULL, InvalidOid, NIL, NIL);
+	cstate->copy_dest = COPY_CALLBACK;
+	cstate->data_dest_cb = data_dest_cb;
+	CopyToStart(cstate);
+	return cstate;
+}
+
+void
+EndForeignCopyTo(CopyState cstate)
+{
+	CopyToFinish(cstate);
+}
+
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
@@ -1825,6 +1857,7 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
@@ -1880,6 +1913,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -1950,6 +1988,13 @@ BeginCopyTo(ParseState *pstate,
 	return cstate;
 }
 
+char *
+NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot)
+{
+	CopyOneRowTo(cstate, slot);
+	return buf;
+}
+
 /*
  * This intermediate routine exists mainly to localize the effects of setjmp
  * so we don't need to plaster a lot of variables with "volatile".
@@ -1966,7 +2011,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2005,16 +2052,12 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
- */
-static uint64
-CopyTo(CopyState cstate)
+static void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -2104,6 +2147,29 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+static void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2135,17 +2201,6 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
@@ -2449,53 +2504,82 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 	{
-		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
-		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		/* Flush into foreign table or partition */
+		int i;
+		bool status = false;
+
+		Assert(resultRelInfo->ri_FdwRoutine != NULL &&
+			   resultRelInfo->ri_FdwState != NULL);
+
+		PG_TRY();
 		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
+			resultRelInfo->ri_FdwRoutine->BeginForeignCopy(resultRelInfo);
+			for (i = 0; i < nused; i++)
+				resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+															  slots[i]);
+			status = true;
 		}
-
+		PG_FINALLY();
+		{
+			resultRelInfo->ri_FdwRoutine->EndForeignCopy(
+														buffer->resultRelInfo,
+														status);
+		}
+		PG_END_TRY();
+	}
+	else
+	{
 		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2868,8 +2952,7 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs)
 	{
 		/*
 		 * Can't support multi-inserts to foreign tables or if there are any
@@ -3037,8 +3120,7 @@ CopyFrom(CopyState cstate)
 				 */
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					!has_instead_insert_row_trig;
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -3048,7 +3130,8 @@ CopyFrom(CopyState cstate)
 													   resultRelInfo);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo) &&
+						 resultRelInfo->ri_FdwRoutine == NULL)
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..ef119a761a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -41,4 +42,8 @@ extern uint64 CopyFrom(CopyState cstate);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
+extern CopyState BeginForeignCopyTo(Relation rel);
+extern char *NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
+extern void EndForeignCopyTo(CopyState cstate);
+
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..197301c5a5 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -94,6 +94,8 @@ typedef TupleTableSlot *(*ExecForeignDelete_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot *slot);
 
 typedef void (*EndForeignModify_function) (EState *estate,
 										   ResultRelInfo *rinfo);
@@ -104,6 +106,10 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (ResultRelInfo *rinfo, bool status);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -211,9 +217,12 @@ typedef struct FdwRoutine
 	ExecForeignInsert_function ExecForeignInsert;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
+	ExecForeignCopy_function ExecForeignCopy;
 	EndForeignModify_function EndForeignModify;
 	BeginForeignInsert_function BeginForeignInsert;
 	EndForeignInsert_function EndForeignInsert;
+	BeginForeignCopy_function BeginForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
 	IsForeignRelUpdatable_function IsForeignRelUpdatable;
 	PlanDirectModify_function PlanDirectModify;
 	BeginDirectModify_function BeginDirectModify;
-- 
2.17.1

#4Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andrey Lepikhov (#3)
2 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Thanks, Andrey, for the patch. I am glad that it already takes care
of some corner cases, but there are still more.

The constructed COPY command doesn't take dropped columns into account.
There is code in deparseAnalyzeSql() that constructs the list of
columns for a given foreign relation. The 0002 patch attached here
moves that code to a separate function and reuses it for COPY. If you
find that change useful, please include it in the main patch.

While working on that, I found two issues:
1. The constructed COPY command had an empty column list when there
were no non-dropped columns in the relation. This caused a syntax
error. Fixed that in 0002.
2. In the same case, if the foreign table declared locally doesn't
have any non-dropped columns but the relation it refers to on the
foreign server does have some, the COPY command fails. I added a test
case for this in 0002 but haven't fixed it.
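A minimal sketch of the column-list logic discussed above, written in plain C rather than against the real TupleDesc API. `Column`, `append_quoted()` and `build_copy_sql()` are hypothetical names, and the relation name is assumed to be already schema-qualified and quoted. Dropped columns are skipped, and when none remain the parenthesized list is omitted rather than emitted empty, which avoids the syntax error in issue 1. Omitting the list makes the remote COPY expect every remote column, which is exactly the situation in issue 2, so this sketch does not address that one. It also always quotes identifiers, whereas postgres_fdw uses quote_identifier(), which quotes only when needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for one tuple-descriptor attribute. */
typedef struct
{
	const char *name;
	bool		dropped;
} Column;

/* Append one character to out (capacity cap), keeping it NUL-terminated. */
static void
append_char(char *out, size_t cap, char c)
{
	size_t		len = strlen(out);

	assert(len + 1 < cap);		/* room for c plus the terminator */
	out[len] = c;
	out[len + 1] = '\0';
}

/* Append s as a double-quoted SQL identifier, doubling embedded quotes. */
static void
append_quoted(char *out, size_t cap, const char *s)
{
	append_char(out, cap, '"');
	for (; *s != '\0'; s++)
	{
		if (*s == '"')
			append_char(out, cap, '"');
		append_char(out, cap, *s);
	}
	append_char(out, cap, '"');
}

/* Append a plain string, asserting that it fits. */
static void
append_str(char *out, size_t cap, const char *s)
{
	assert(strlen(out) + strlen(s) < cap);
	strcat(out, s);
}

/*
 * Build "COPY rel (cols) FROM STDIN", skipping dropped columns and
 * omitting the column list entirely when no live column remains.
 */
static void
build_copy_sql(char *out, size_t cap, const char *relname,
			   const Column *cols, int ncols)
{
	bool		first = true;

	assert(cap > 0);
	out[0] = '\0';
	append_str(out, cap, "COPY ");
	append_str(out, cap, relname);
	for (int i = 0; i < ncols; i++)
	{
		if (cols[i].dropped)
			continue;			/* dropped columns must not be listed */
		append_str(out, cap, first ? " (" : ", ");
		append_quoted(out, cap, cols[i].name);
		first = false;
	}
	if (!first)
		append_str(out, cap, ")");
	append_str(out, cap, " FROM STDIN");
}
```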

I think this work is useful. Please add it to the next commitfest so
that it's tracked.

On Tue, Jun 2, 2020 at 11:21 AM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Thank you for the answer,

On 02.06.2020 05:02, Etsuro Fujita wrote:

I think I had something similar in mind before [1]. I will take a look.

[1] /messages/by-id/23990375-45a6-5823-b0aa-a6a7a6a957f0@lab.ntt.co.jp

I have looked into the thread.
The first version of my patch was similar to your idea. But while
developing the "COPY FROM" code, I ran into the following issues:
1. Two or more partitions can be placed on the same node. Since
postgres_fdw normally uses one connection per foreign server and user
mapping, and a connection in the COPY IN state cannot run another
command, we need to finish the COPY into one partition before starting
a COPY into another partition on the same node.
2. On any error we need to send EOF to every "COPY ... FROM STDIN"
operation that has been started. Otherwise the FDW can't cancel the
operation.
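The two constraints in the list above can be modelled with a toy per-connection state machine. This is a hypothetical sketch, not libpq code: `Conn`, `start_copy()`, `end_copy()` and `abort_all()` are illustrative names, and "ending a COPY with an error" stands in for calling PQputCopyEnd() with a non-NULL error message, as the patch does in postgresEndForeignCopy().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one remote connection shared by all partitions on a node. */
typedef struct
{
	int			active_part;	/* partition currently in COPY IN state, or -1 */
	int			copies_ended;	/* COPY operations finished normally */
	int			copies_aborted;	/* COPY operations terminated with an error */
} Conn;

/* End the COPY currently open on this connection, if there is one. */
static void
end_copy(Conn *conn, bool ok)
{
	if (conn->active_part < 0)
		return;
	if (ok)
		conn->copies_ended++;
	else
		conn->copies_aborted++;
	conn->active_part = -1;
}

/*
 * Start a COPY into partition `part` over `conn`.  A connection in the
 * COPY IN state cannot run another command, so a COPY into a different
 * partition on the same node must be finished first (point 1).
 */
static void
start_copy(Conn *conn, int part)
{
	assert(part >= 0);
	if (conn->active_part == part)
		return;					/* already streaming into this partition */
	end_copy(conn, true);
	conn->active_part = part;
}

/* On any error, terminate every COPY that is still open (point 2). */
static void
abort_all(Conn *conns, int nconns)
{
	for (int i = 0; i < nconns; i++)
		end_copy(&conns[i], false);
}
```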

Hiding the COPY code behind the buffer-management machinery lets us
generalize that machinery, execute one COPY operation per buffer, and
simplify error handling.
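The per-buffer lifecycle described here (begin, send each buffered row, then always end) can be sketched without PG_TRY/PG_FINALLY by threading a status flag through, roughly mirroring the foreign-table branch of CopyMultiInsertBufferFlush() in the patch. The patch reports errors via ereport() and longjmp, while this sketch uses return codes. `CopyRoutine`, `flush_buffer()` and the `Mock` callbacks are hypothetical, and the mock rejecting negative values is only a stand-in for a failing row.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callbacks mirroring BeginForeignCopy/ExecForeignCopy/EndForeignCopy. */
typedef struct
{
	void		(*begin) (void *state);
	bool		(*exec) (void *state, int row);	/* false means failure */
	void		(*end) (void *state, bool ok);
} CopyRoutine;

/*
 * Flush one multi-insert buffer as a single remote COPY.  `end` runs on
 * every path, like the PG_FINALLY block in the patch, so a failed COPY is
 * always terminated and the connection can be reused.
 */
static bool
flush_buffer(const CopyRoutine *r, void *state, const int *rows, int nrows)
{
	bool		ok = true;

	r->begin(state);
	for (int i = 0; i < nrows && ok; i++)
		ok = r->exec(state, rows[i]);
	r->end(state, ok);
	return ok;
}

/* Toy backend that counts calls and rejects negative values. */
typedef struct
{
	int			began, sent, ended_ok, ended_err;
} Mock;

static void
mock_begin(void *s)
{
	((Mock *) s)->began++;
}

static bool
mock_exec(void *s, int row)
{
	if (row < 0)
		return false;
	((Mock *) s)->sent++;
	return true;
}

static void
mock_end(void *s, bool ok)
{
	Mock	   *m = s;

	if (ok)
		m->ended_ok++;
	else
		m->ended_err++;
}

static const CopyRoutine mock_routine = {mock_begin, mock_exec, mock_end};
```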

As I understand it, the main idea of the thread you mentioned is to add
"COPY FROM" support without changing the FDW API.
It is possible to remove BeginForeignCopy() and EndForeignCopy() from
the patch, but it is not trivial to adapt ExecForeignInsert() for
COPY.
All I can offer here for now is a single new ExecForeignBulkInsert(buf)
routine that would start one "COPY FROM STDIN" operation, send the
tuples and close the operation. If an FDW does not support
ExecForeignBulkInsert(), we can fall back to calling
ExecForeignInsert() for each tuple in the buffer.
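A hedged sketch of the proposed fallback: if the FDW provides a bulk routine, the whole buffer goes through one call (one remote COPY); otherwise each buffered tuple goes through the per-row insert routine. `FdwRoutineSketch`, `bulk_insert`, `insert` and `flush_rows()` are hypothetical stand-ins for the proposed ExecForeignBulkInsert() and the existing ExecForeignInsert(); the real callbacks take ResultRelInfo and TupleTableSlot arguments rather than ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of an FDW routine table. */
typedef struct
{
	/* Optional: send a whole buffer in one remote COPY ... FROM STDIN. */
	void		(*bulk_insert) (void *state, const int *rows, int nrows);
	/* Required: insert a single row. */
	void		(*insert) (void *state, int row);
} FdwRoutineSketch;

/* Flush a buffer, preferring the bulk routine when the FDW provides one. */
static void
flush_rows(const FdwRoutineSketch *r, void *state, const int *rows, int nrows)
{
	assert(r->insert != NULL);	/* the per-row routine is mandatory */
	if (r->bulk_insert != NULL)
		r->bulk_insert(state, rows, nrows);	/* one remote COPY */
	else
		for (int i = 0; i < nrows; i++)
			r->insert(state, rows[i]);	/* per-row fallback */
}

/* Toy callbacks that record how the rows were delivered. */
typedef struct
{
	int			bulk_calls, single_calls, rows;
} Counter;

static void
count_bulk(void *s, const int *rows, int n)
{
	Counter    *c = s;

	(void) rows;
	c->bulk_calls++;
	c->rows += n;
}

static void
count_single(void *s, int row)
{
	Counter    *c = s;

	(void) row;
	c->single_calls++;
	c->rows++;
}
```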

One of the main questions here is whether to use the COPY TO machinery
to serialize a tuple. As the patch shows, this requires turning the
CopyTo() routine into an iterative start/next/finish interface. Would
that be acceptable?

The attached patch fixes a stupid error.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

--
Best Wishes,
Ashutosh Bapat

Attachments:

0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=US-ASCII)
From 9c4e09bd03cb98b1f84c42c34ce7b76e0a87011c Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 29 May 2020 10:39:57 +0500
Subject: [PATCH 1/2] Fast COPY FROM into the foreign (or sharded) table.

---
 contrib/postgres_fdw/deparse.c                |  25 ++
 .../postgres_fdw/expected/postgres_fdw.out    |   5 +-
 contrib/postgres_fdw/postgres_fdw.c           |  95 ++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 src/backend/commands/copy.c                   | 213 ++++++++++++------
 src/include/commands/copy.h                   |   5 +
 src/include/foreign/fdwapi.h                  |   9 +
 7 files changed, 286 insertions(+), 67 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..427402c8eb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1758,6 +1758,31 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse a remote COPY ... FROM STDIN statement into the given buf.
+ * The column list is always written out explicitly.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	int attnum;
+
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	appendStringInfoString(buf, " ( ");
+
+	for(attnum = 0; attnum < rel->rd_att->natts; attnum++)
+	{
+		appendStringInfoString(buf, NameStr(rel->rd_att->attrs[attnum].attname));
+
+		if (attnum != rel->rd_att->natts-1)
+			appendStringInfoString(buf, ", ");
+	}
+
+	appendStringInfoString(buf, " ) ");
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 82fc1290ef..922c08d2dc 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2 ( f1, f2 )  FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..bd2a8f596f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState fdwcstate;
 } PgFdwModifyState;
 
 /*
@@ -350,12 +352,16 @@ static TupleTableSlot *postgresExecForeignDelete(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+												 TupleTableSlot *slot);
 static void postgresEndForeignModify(EState *estate,
 									 ResultRelInfo *resultRelInfo);
 static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(ResultRelInfo *resultRelInfo, bool status);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -530,9 +536,12 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->ExecForeignInsert = postgresExecForeignInsert;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -1890,6 +1899,27 @@ postgresExecForeignDelete(EState *estate,
 								  slot, planSlot);
 }
 
+/*
+ * postgresExecForeignCopy
+ *		Copy one row into a foreign table
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo, TupleTableSlot *slot)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	char *buf;
+
+	buf = NextForeignCopyRow(fmstate->fdwcstate, slot);
+
+	if (PQputCopyData(fmstate->conn, buf, strlen(buf)) <= 0)
+	{
+		PGresult *res;
+
+		res = PQgetResult(fmstate->conn);
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+			pgfdw_report_error(ERROR, res, fmstate->conn, false, fmstate->query);
+	}
+}
 /*
  * postgresEndForeignModify
  *		Finish an insert/update/delete operation on a foreign table
@@ -2051,6 +2081,71 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ResultRelInfo *resultRelInfo)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) (resultRelInfo->ri_FdwState);
+	StringInfoData sql;
+	PGresult *res;
+
+	Assert(resultRelInfo->ri_FdwRoutine != NULL);
+
+	fmstate->target_attrs = NULL;
+	fmstate->has_returning = false;
+	fmstate->retrieved_attrs = NULL;
+
+	if (fmstate->fdwcstate == NULL)
+		fmstate->fdwcstate = BeginForeignCopyTo(rel);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+	fmstate->query = sql.data;
+
+	res = PQexec(fmstate->conn, fmstate->query);
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(ResultRelInfo *resultRelInfo, bool status)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	PGresult *res;
+
+	Assert(fmstate != NULL);
+
+	if (!status)
+	{
+		PQputCopyEnd(fmstate->conn, (PQprotocolVersion(fmstate->conn) < 3) ?
+					NULL :
+					_("aborted foreign copy"));
+		EndForeignCopyTo(fmstate->fdwcstate);
+		pfree(fmstate->fdwcstate);
+		fmstate->fdwcstate = NULL;
+		return;
+	}
+
+	while (res = PQgetResult(fmstate->conn), PQresultStatus(res) == PGRES_COPY_IN)
+	{
+		/* We can't send an error message if we're using protocol version 2 */
+		PQputCopyEnd(fmstate->conn, (status || PQprotocolVersion(fmstate->conn) < 3) ? NULL :
+					 _("aborted foreign copy"));
+		PQclear(res);
+	}
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, false, fmstate->query);
+
+	while (PQgetResult(fmstate->conn) != NULL);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc463c..87e0f46846 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -133,6 +133,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -358,8 +359,11 @@ static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
 static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
 static void EndCopyTo(CopyState cstate);
+static void CopyToStart(CopyState cstate);
+static void CopyToFinish(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
 static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
@@ -587,7 +591,9 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			CopySendChar(cstate, '\0');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1075,7 +1081,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	else
 	{
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1815,6 +1821,32 @@ EndCopy(CopyState cstate)
 	pfree(cstate);
 }
 
+static char *buf = NULL;
+static void
+data_dest_cb(void *outbuf, int len)
+{
+	buf = (char *) palloc(len);
+	memcpy(buf, (char *) outbuf, len);
+}
+
+CopyState
+BeginForeignCopyTo(Relation rel)
+{
+	CopyState cstate;
+
+	cstate = BeginCopy(NULL, false, rel, NULL, InvalidOid, NIL, NIL);
+	cstate->copy_dest = COPY_CALLBACK;
+	cstate->data_dest_cb = data_dest_cb;
+	CopyToStart(cstate);
+	return cstate;
+}
+
+void
+EndForeignCopyTo(CopyState cstate)
+{
+	CopyToFinish(cstate);
+}
+
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
@@ -1825,6 +1857,7 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
@@ -1880,6 +1913,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -1950,6 +1988,13 @@ BeginCopyTo(ParseState *pstate,
 	return cstate;
 }
 
+char *
+NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot)
+{
+	CopyOneRowTo(cstate, slot);
+	return buf;
+}
+
 /*
  * This intermediate routine exists mainly to localize the effects of setjmp
  * so we don't need to plaster a lot of variables with "volatile".
@@ -1966,7 +2011,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2005,16 +2052,12 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
- */
-static uint64
-CopyTo(CopyState cstate)
+static void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -2104,6 +2147,29 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+static void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2135,17 +2201,6 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
@@ -2449,53 +2504,82 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 	{
-		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
-		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		/* Flush into foreign table or partition */
+		int i;
+		bool status = false;
+
+		Assert(resultRelInfo->ri_FdwRoutine != NULL &&
+			   resultRelInfo->ri_FdwState != NULL);
+
+		PG_TRY();
 		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
+			resultRelInfo->ri_FdwRoutine->BeginForeignCopy(resultRelInfo);
+			for (i = 0; i < nused; i++)
+				resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+															  slots[i]);
+			status = true;
 		}
-
+		PG_FINALLY();
+		{
+			resultRelInfo->ri_FdwRoutine->EndForeignCopy(
+														buffer->resultRelInfo,
+														status);
+		}
+		PG_END_TRY();
+	}
+	else
+	{
 		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2868,8 +2952,7 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs)
 	{
 		/*
 		 * Can't support multi-inserts to foreign tables or if there are any
@@ -3037,8 +3120,7 @@ CopyFrom(CopyState cstate)
 				 */
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					!has_instead_insert_row_trig;
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -3048,7 +3130,8 @@ CopyFrom(CopyState cstate)
 													   resultRelInfo);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo) &&
+						 resultRelInfo->ri_FdwRoutine == NULL)
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..ef119a761a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -41,4 +42,8 @@ extern uint64 CopyFrom(CopyState cstate);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
+extern CopyState BeginForeignCopyTo(Relation rel);
+extern char *NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
+extern void EndForeignCopyTo(CopyState cstate);
+
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..197301c5a5 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -94,6 +94,8 @@ typedef TupleTableSlot *(*ExecForeignDelete_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot *slot);
 
 typedef void (*EndForeignModify_function) (EState *estate,
 										   ResultRelInfo *rinfo);
@@ -104,6 +106,10 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (ResultRelInfo *rinfo, bool status);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -211,9 +217,12 @@ typedef struct FdwRoutine
 	ExecForeignInsert_function ExecForeignInsert;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
+	ExecForeignCopy_function ExecForeignCopy;
 	EndForeignModify_function EndForeignModify;
 	BeginForeignInsert_function BeginForeignInsert;
 	EndForeignInsert_function EndForeignInsert;
+	BeginForeignCopy_function BeginForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
 	IsForeignRelUpdatable_function IsForeignRelUpdatable;
 	PlanDirectModify_function PlanDirectModify;
 	BeginDirectModify_function BeginDirectModify;
-- 
2.17.1

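The foreign-table branch of CopyMultiInsertBufferFlush in the hunks above follows a begin, per-row exec, end(status) protocol: EndForeignCopy runs on both the success path and the error path via PG_FINALLY, and status is true only if every buffered slot was sent. Below is a minimal standalone C sketch of that control flow. It uses hypothetical names (Row, ForeignCopyRoutine, flush_foreign_buffer, the demo_* callbacks) and uses setjmp/longjmp as a stand-in for PG_TRY/PG_FINALLY. Unlike PG_FINALLY, it reports the error by returning false instead of re-throwing it. It calls begin() outside the protected region, as the v2 patch later in this thread does; the first version above calls BeginForeignCopy inside PG_TRY.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TupleTableSlot. */
typedef struct Row
{
	int			id;
} Row;

/* Hypothetical mirror of BeginForeignCopy / ExecForeignCopy / EndForeignCopy. */
typedef struct ForeignCopyRoutine
{
	void		(*begin) (void *state);
	void		(*exec) (void *state, const Row *row);
	void		(*end) (void *state, bool status);
} ForeignCopyRoutine;

/* Innermost active handler; plays the role of PG_exception_stack. */
static jmp_buf *error_target = NULL;

/* Stand-in for ereport(ERROR): jump to the innermost handler. */
static void
raise_error(void)
{
	assert(error_target != NULL);
	longjmp(*error_target, 1);
}

/* Send nused buffered rows; returns false if any exec() raised an error. */
static bool
flush_foreign_buffer(const ForeignCopyRoutine *r, void *state,
					 const Row *rows, int nused)
{
	jmp_buf		local;
	jmp_buf    *saved = error_target;
	volatile bool status = false;	/* survives longjmp reliably */
	bool		ok = true;

	assert(nused >= 0);
	r->begin(state);			/* outside the protected region, as in v2 */

	error_target = &local;
	if (setjmp(local) == 0)
	{
		for (int i = 0; i < nused; i++)
			r->exec(state, &rows[i]);
		status = true;			/* reached only if no row raised an error */
	}
	else
		ok = false;				/* an exec() call raised an error */
	error_target = saved;

	/* "finally": always runs and tells the FDW whether the COPY succeeded */
	r->end(state, status);
	return ok;
}

/* Demo FDW state that records what the callbacks saw. */
typedef struct DemoState
{
	int			begun;
	int			sent;
	int			ended;
	bool		last_status;
	int			fail_at;		/* row id that raises an error, or -1 */
} DemoState;

static void
demo_begin(void *s)
{
	((DemoState *) s)->begun++;
}

static void
demo_exec(void *s, const Row *row)
{
	DemoState  *d = s;

	if (row->id == d->fail_at)
		raise_error();
	d->sent++;
}

static void
demo_end(void *s, bool status)
{
	DemoState  *d = s;

	d->ended++;
	d->last_status = status;
}

static const ForeignCopyRoutine demo_routine = {demo_begin, demo_exec, demo_end};
```

With rows {1, 2, 3}, a demo state whose fail_at is 2 ends with sent == 1, ended == 1 and last_status == false. With fail_at = -1, all three rows are sent and end() receives true.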
0002-Separate-code-to-list-all-columns-of-a-foreign-relat.patch (text/x-patch; charset=US-ASCII)
From 79b37b9160572a9d730c97d9fc1f471064b26c7e Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>
Date: Mon, 15 Jun 2020 10:43:05 +0530
Subject: [PATCH 2/2] Separate code to list all columns of a foreign relation
 into its own function

This code resides in deparseAnalyzeSql() but is useful for COPY as well.
So separate it into its own function. This takes care of any dropped
columns or no columns cases.

However, the COPY command constructed for a relation that has no
non-dropped columns locally but has some columns on the remote server
has a syntax error. This needs to be fixed.

Ashutosh Bapat
---
 contrib/postgres_fdw/deparse.c                | 58 +++++++++++--------
 .../postgres_fdw/expected/postgres_fdw.out    | 16 ++++-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 19 ++++++
 3 files changed, 69 insertions(+), 24 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 427402c8eb..aa06342bfa 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1769,17 +1771,7 @@ deparseCopyFromSql(StringInfo buf, Relation rel)
 
 	appendStringInfoString(buf, "COPY ");
 	deparseRelation(buf, rel);
-	appendStringInfoString(buf, " ( ");
-
-	for(attnum = 0; attnum < rel->rd_att->natts; attnum++)
-	{
-		appendStringInfoString(buf, NameStr(rel->rd_att->attrs[attnum].attname));
-
-		if (attnum != rel->rd_att->natts-1)
-			appendStringInfoString(buf, ", ");
-	}
-
-	appendStringInfoString(buf, " ) ");
+	(void) deparseRelColumnList(buf, rel, true);
 	appendStringInfoString(buf, " FROM STDIN ");
 }
 
@@ -2086,6 +2078,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2094,10 +2110,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2106,6 +2120,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2125,18 +2142,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 922c08d2dc..e4201a7779 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8064,7 +8064,7 @@ copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
-remote SQL command: COPY public.loc2 ( f1, f2 )  FROM STDIN 
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
 COPY rem2, line 2
 select * from rem2;
  f1 | f2  
@@ -8184,6 +8184,20 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+-- this will error because of data and column mismatch
+-- FIXME
+copy rem2 from stdin;
+select * from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..37f0c61183 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2293,6 +2293,25 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+-- this will error because of data and column mismatch
+-- FIXME
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
-- 
2.17.1

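The 0002 patch above moves the column-list loop out of deparseAnalyzeSql into deparseRelColumnList. That function skips dropped columns, optionally wraps the list in parentheses, and returns the attribute numbers it emitted. Below is a minimal standalone C sketch of the same string-building logic. It assumes a hypothetical Column array in place of the TupleDesc and uses plain strncat in place of StringInfo. Unlike the real function, it does not apply quote_identifier or look up the column_name option, and it returns a count instead of a List. The sketch also shows the zero-column case. When there are no live columns, no column list is emitted at all. The resulting COPY therefore names no columns, and the remote server expects data for all of its own columns, which is the situation the FIXME test exercises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one tuple descriptor attribute. */
typedef struct Column
{
	const char *name;			/* remote column name */
	bool		dropped;		/* like attisdropped */
} Column;

/*
 * Append the non-dropped columns to the NUL-terminated buf, comma-separated
 * and optionally in parentheses. Returns how many columns were emitted.
 * If every column is dropped, nothing (not even "()") is written.
 */
static int
append_column_list(char *buf, size_t bufsize, const Column *cols, int ncols,
				   bool enclose_in_parens)
{
	int			emitted = 0;

	assert(buf != NULL && bufsize > 0);
	for (int i = 0; i < ncols; i++)
	{
		if (cols[i].dropped)
			continue;
		if (emitted > 0)
			strncat(buf, ", ", bufsize - strlen(buf) - 1);
		else if (enclose_in_parens)
			strncat(buf, "(", bufsize - strlen(buf) - 1);
		strncat(buf, cols[i].name, bufsize - strlen(buf) - 1);
		emitted++;
	}
	if (enclose_in_parens && emitted > 0)
		strncat(buf, ")", bufsize - strlen(buf) - 1);
	return emitted;
}

/* Build "COPY rel(cols) FROM STDIN"; returns the number of live columns. */
static int
build_copy_from_sql(char *buf, size_t bufsize, const char *relname,
					const Column *cols, int ncols)
{
	int			n;

	snprintf(buf, bufsize, "COPY %s", relname);
	n = append_column_list(buf, bufsize, cols, ncols, true);
	strncat(buf, " FROM STDIN", bufsize - strlen(buf) - 1);
	return n;
}
```

For the columns f1, a dropped column, and f2 on relation public.loc2, build_copy_from_sql produces "COPY public.loc2(f1, f2) FROM STDIN". This matches the remote SQL in the updated expected output, apart from that output's trailing space.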
#5 Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ashutosh Bapat (#4)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 6/15/20 10:26 AM, Ashutosh Bapat wrote:

> Thanks Andrey for the patch. I am glad that the patch has taken care
> of some corner cases already, but there are still more.
>
> 1. The constructed COPY command doesn't take care of dropped columns.
> There is code in deparseAnalyzeSql which constructs the list of columns
> for a given foreign relation. The 0002 patch attached here moves that
> code to a separate function and reuses it for COPY. If you find that
> code change useful, please include it in the main patch.

Thanks, I included it.

> 2. In the same case, if the foreign table declared locally didn't have
> any non-dropped columns but the relation that it referred to on the
> foreign server had some non-dropped columns, the COPY command fails. I
> added a test case for this in 0002 but haven't fixed it.

I fixed it.
This is a very special corner case. The problem was that COPY FROM has
no equivalent of "INSERT INTO ... DEFAULT VALUES" semantics. To
simplify the solution, I switched off bulk copying for this case.

> I think this work is useful. Please add it to the next commitfest so
> that it's tracked.

Ok.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com

Attachments:

v2-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=UTF-8)
From abe4db0a5391735f7663daac81df579644a70fc3 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Wed, 17 Jun 2020 11:07:54 +0500
Subject: [PATCH] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-inserts
are possible and the foreign table has a non-zero number of columns.
---
 contrib/postgres_fdw/deparse.c                |  60 ++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  33 ++-
 contrib/postgres_fdw/postgres_fdw.c           |  98 ++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  28 +++
 src/backend/commands/copy.c                   | 223 ++++++++++++------
 src/include/commands/copy.h                   |   5 +
 src/include/foreign/fdwapi.h                  |   9 +
 8 files changed, 374 insertions(+), 83 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..a37981ff66 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 82fc1290ef..3a3cca5047 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8183,6 +8184,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..45441f3441 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState fdwcstate;
 } PgFdwModifyState;
 
 /*
@@ -350,12 +352,16 @@ static TupleTableSlot *postgresExecForeignDelete(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+												 TupleTableSlot *slot);
 static void postgresEndForeignModify(EState *estate,
 									 ResultRelInfo *resultRelInfo);
 static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(ResultRelInfo *resultRelInfo, bool status);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -530,9 +536,12 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->ExecForeignInsert = postgresExecForeignInsert;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -1890,6 +1899,27 @@ postgresExecForeignDelete(EState *estate,
 								  slot, planSlot);
 }
 
+/*
+ * postgresExecForeignCopy
+ *		Copy one row into a foreign table
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo, TupleTableSlot *slot)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	char *buf;
+
+	buf = NextForeignCopyRow(fmstate->fdwcstate, slot);
+
+	if (PQputCopyData(fmstate->conn, buf, strlen(buf)) <= 0)
+	{
+		PGresult *res;
+
+		res = PQgetResult(fmstate->conn);
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+			pgfdw_report_error(ERROR, res, fmstate->conn, false, fmstate->query);
+	}
+}
 /*
  * postgresEndForeignModify
  *		Finish an insert/update/delete operation on a foreign table
@@ -2051,6 +2081,74 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ResultRelInfo *resultRelInfo)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) (resultRelInfo->ri_FdwState);
+	StringInfoData sql;
+	PGresult *res;
+
+	Assert(resultRelInfo->ri_FdwRoutine != NULL);
+
+	fmstate->target_attrs = NULL;
+	fmstate->has_returning = false;
+	fmstate->retrieved_attrs = NULL;
+
+	if (fmstate->fdwcstate == NULL)
+		fmstate->fdwcstate = BeginForeignCopyTo(rel);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+	fmstate->query = sql.data;
+
+	res = PQexec(fmstate->conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, fmstate->conn, false, fmstate->query);
+	PQclear(res);
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(ResultRelInfo *resultRelInfo, bool status)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	PGresult *res;
+
+	Assert(fmstate != NULL);
+
+	if (!status)
+	{
+		PQputCopyEnd(fmstate->conn, (PQprotocolVersion(fmstate->conn) < 3) ?
+					NULL :
+					_("aborted foreign copy"));
+		EndForeignCopyTo(fmstate->fdwcstate);
+		pfree(fmstate->fdwcstate);
+		fmstate->fdwcstate = NULL;
+		return;
+	}
+
+	while (res = PQgetResult(fmstate->conn), PQresultStatus(res) == PGRES_COPY_IN)
+	{
+		/* We can't send an error message if we're using protocol version 2 */
+		PQputCopyEnd(fmstate->conn, (status || PQprotocolVersion(fmstate->conn) < 3) ? NULL :
+					 _("aborted foreign copy"));
+		PQclear(res);
+	}
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, false, fmstate->query);
+
+	while (PQgetResult(fmstate->conn) != NULL);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..73f98a3152 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2293,6 +2293,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc463c..1be164b3ca 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -133,6 +133,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -358,8 +359,11 @@ static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
 static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
 static void EndCopyTo(CopyState cstate);
+static void CopyToStart(CopyState cstate);
+static void CopyToFinish(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
 static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
@@ -587,7 +591,9 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			CopySendChar(cstate, '\0');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1075,7 +1081,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	else
 	{
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1815,6 +1821,32 @@ EndCopy(CopyState cstate)
 	pfree(cstate);
 }
 
+static char *buf = NULL;
+static void
+data_dest_cb(void *outbuf, int len)
+{
+	buf = (char *) palloc(len);
+	memcpy(buf, (char *) outbuf, len);
+}
+
+CopyState
+BeginForeignCopyTo(Relation rel)
+{
+	CopyState cstate;
+
+	cstate = BeginCopy(NULL, false, rel, NULL, InvalidOid, NIL, NIL);
+	cstate->copy_dest = COPY_CALLBACK;
+	cstate->data_dest_cb = data_dest_cb;
+	CopyToStart(cstate);
+	return cstate;
+}
+
+void
+EndForeignCopyTo(CopyState cstate)
+{
+	CopyToFinish(cstate);
+}
+
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
@@ -1825,6 +1857,7 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
@@ -1880,6 +1913,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -1950,6 +1988,13 @@ BeginCopyTo(ParseState *pstate,
 	return cstate;
 }
 
+char *
+NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot)
+{
+	CopyOneRowTo(cstate, slot);
+	return buf;
+}
+
 /*
  * This intermediate routine exists mainly to localize the effects of setjmp
  * so we don't need to plaster a lot of variables with "volatile".
@@ -1966,7 +2011,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2005,16 +2052,12 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
- */
-static uint64
-CopyTo(CopyState cstate)
+static void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -2104,6 +2147,29 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+static void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2135,17 +2201,6 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
@@ -2449,53 +2504,83 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 	{
-		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
-		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		/* Flush into foreign table or partition */
+		int i;
+		bool status = false;
+
+		Assert(resultRelInfo->ri_FdwRoutine != NULL &&
+			   resultRelInfo->ri_FdwState != NULL);
+
+		resultRelInfo->ri_FdwRoutine->BeginForeignCopy(resultRelInfo);
+
+		PG_TRY();
 		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
+			for (i = 0; i < nused; i++)
+				resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+															  slots[i]);
+			status = true;
 		}
-
+		PG_FINALLY();
+		{
+			resultRelInfo->ri_FdwRoutine->EndForeignCopy(
+														buffer->resultRelInfo,
+														status);
+		}
+		PG_END_TRY();
+	}
+	else
+	{
 		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2868,14 +2953,14 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs || (resultRelInfo->ri_FdwRoutine != NULL &&
+			list_length(cstate->attnumlist) == 0))
 	{
 		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
+		 * Can't support bufferization of copy into foreign tables without any
+		 * defined columns or if there are any volatile default expressions in the
+		 * table. Similarly to the trigger case above, such expressions may query
+		 * the table we're inserting into.
 		 *
 		 * Note: It does not matter if any partitions have any volatile
 		 * default expressions as we use the defaults from the target of the
@@ -3037,8 +3122,7 @@ CopyFrom(CopyState cstate)
 				 */
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					!has_instead_insert_row_trig;
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -3048,7 +3132,8 @@ CopyFrom(CopyState cstate)
 													   resultRelInfo);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo) &&
+						 resultRelInfo->ri_FdwRoutine == NULL)
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..ef119a761a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -41,4 +42,8 @@ extern uint64 CopyFrom(CopyState cstate);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
+extern CopyState BeginForeignCopyTo(Relation rel);
+extern char *NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
+extern void EndForeignCopyTo(CopyState cstate);
+
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..197301c5a5 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -94,6 +94,8 @@ typedef TupleTableSlot *(*ExecForeignDelete_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot *slot);
 
 typedef void (*EndForeignModify_function) (EState *estate,
 										   ResultRelInfo *rinfo);
@@ -104,6 +106,10 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (ResultRelInfo *rinfo, bool status);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -211,9 +217,12 @@ typedef struct FdwRoutine
 	ExecForeignInsert_function ExecForeignInsert;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
+	ExecForeignCopy_function ExecForeignCopy;
 	EndForeignModify_function EndForeignModify;
 	BeginForeignInsert_function BeginForeignInsert;
 	EndForeignInsert_function EndForeignInsert;
+	BeginForeignCopy_function BeginForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
 	IsForeignRelUpdatable_function IsForeignRelUpdatable;
 	PlanDirectModify_function PlanDirectModify;
 	BeginDirectModify_function BeginDirectModify;
-- 
2.25.1

#6 Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#3)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Tue, Jun 2, 2020 at 2:51 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 02.06.2020 05:02, Etsuro Fujita wrote:

I think I also thought something similar to this before [1]. Will take a look.

I'm still reviewing the patch, but let me comment on it.

[1] /messages/by-id/23990375-45a6-5823-b0aa-a6a7a6a957f0@lab.ntt.co.jp

I have looked into the thread.
My first version of the patch was like your idea. But while developing
the "COPY FROM" code, the following issues came up:
1. Two or more partitions can be placed on the same node. We need to
finish the COPY into one partition before starting the COPY into another
partition on the same node.
2. On any error we need to send EOF to all started "COPY .. FROM STDIN"
operations. Otherwise the FDW can't cancel the operation.

Hiding the COPY code under the buffer management machinery allows us to
generalize that machinery, execute one COPY operation per buffer, and
simplify error handling.
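The two constraints listed above can be illustrated with a minimal, standalone sketch of flushing one buffer as one COPY. It assumes a hypothetical mock connection (MockConn and the mock_copy_* functions are illustrative, not libpq) so that it runs without a server; with libpq, the corresponding steps would be PQexec() of a "COPY ... FROM STDIN" command, PQputCopyData() per row, and PQputCopyEnd(), whose non-NULL errormsg argument forces the COPY to fail. The sketch shows that a connection holds at most one open COPY, and that the COPY is ended on the error path too, so the next partition on the same server can start its own COPY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a remote connection; it only tracks COPY state. */
typedef struct MockConn
{
    bool copy_in_progress;  /* true between COPY start and COPY end */
    int  rows_pending;      /* rows sent in the COPY currently open */
    int  rows_committed;    /* rows kept by COPYs that ended cleanly */
    int  aborted_copies;    /* COPYs ended with an error message */
} MockConn;

/* Start "COPY ... FROM STDIN"; only one COPY may be open per connection. */
bool
mock_copy_begin(MockConn *conn)
{
    if (conn->copy_in_progress)
        return false;
    conn->copy_in_progress = true;
    conn->rows_pending = 0;
    return true;
}

/* Send one row; the literal row "bad" simulates a send failure. */
bool
mock_copy_data(MockConn *conn, const char *row)
{
    assert(conn->copy_in_progress);
    if (strcmp(row, "bad") == 0)
        return false;
    conn->rows_pending++;
    return true;
}

/* End the COPY; a non-NULL errmsg makes the remote side discard its rows. */
void
mock_copy_end(MockConn *conn, const char *errmsg)
{
    assert(conn->copy_in_progress);
    if (errmsg == NULL)
        conn->rows_committed += conn->rows_pending;
    else
        conn->aborted_copies++;
    conn->rows_pending = 0;
    conn->copy_in_progress = false;
}

/*
 * Flush one buffer (one partition's rows) as a single COPY.  The COPY is
 * ended on both the success and the failure path, so the connection is
 * free again for the next partition living on the same server.
 */
bool
flush_buffer(MockConn *conn, const char **rows, int nrows)
{
    bool ok = true;

    if (!mock_copy_begin(conn))
        return false;
    for (int i = 0; i < nrows && ok; i++)
        ok = mock_copy_data(conn, rows[i]);
    mock_copy_end(conn, ok ? NULL : "canceled by client");
    return ok;
}
```

For example, flushing {"1\tfoo", "2\tbar"}, then a buffer containing the failing row "bad", then {"5\tquux"} leaves 3 rows committed, 1 COPY aborted, and the connection idle afterwards.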

I'm not sure that it's really a good idea to design the bulk-insert API
so that it's tightly coupled with the bulk-insert machinery in the core,
because 1) some FDWs might want to send tuples provided by the core to
the remote one by one, without storing them in a buffer, or 2) some
other FDWs might want to store the tuples in a buffer and send them in a
batch, as postgres_fdw does in the proposed patch, but might want to do
so independently of MAX_BUFFERED_TUPLES and/or MAX_BUFFERED_BYTES
defined in the bulk-insert machinery.
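Case 2) can be made concrete with a hypothetical FDW-private batch that applies its own limits instead of MAX_BUFFERED_TUPLES/MAX_BUFFERED_BYTES. This is only an illustration under assumptions: FdwBatch, batch_add(), batch_flush(), and BATCH_MAX_BYTES are invented names, not part of the patch or of fdwapi.h, and the flush merely counts instead of sending data anywhere. A batch is flushed before a new row would push it past either its row limit or its byte budget.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH_MAX_BYTES 64      /* illustrative byte budget chosen by the FDW */

/* Hypothetical FDW-private batch, sized by the FDW rather than the core. */
typedef struct FdwBatch
{
    int    max_rows;                /* FDW-chosen row limit */
    int    nrows;                   /* rows currently buffered */
    size_t nbytes;                  /* bytes currently buffered */
    char   data[BATCH_MAX_BYTES];   /* serialized rows, back to back */
    int    flushes;                 /* how many times the batch was "sent" */
} FdwBatch;

/* Pretend to send the buffered rows to the remote server, then reset. */
void
batch_flush(FdwBatch *b)
{
    if (b->nrows == 0)
        return;
    b->flushes++;
    b->nrows = 0;
    b->nbytes = 0;
}

/* Append one serialized row, flushing first if a limit would be exceeded. */
void
batch_add(FdwBatch *b, const char *row)
{
    size_t len = strlen(row);

    assert(len <= BATCH_MAX_BYTES);     /* a single row must fit the budget */
    if (b->nrows >= b->max_rows || b->nbytes + len > BATCH_MAX_BYTES)
        batch_flush(b);
    memcpy(b->data + b->nbytes, row, len);
    b->nbytes += len;
    b->nrows++;
}
```

With max_rows = 3 and seven 4-byte rows, the row limit triggers two flushes and one row stays buffered until a final batch_flush(); with 30-byte rows, the 64-byte budget triggers a flush on the third row instead.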

I agree that we would need special handling for cases you mentioned
above if we design this API based on something like the idea I
proposed in that thread.

As I understand it, the main idea of the thread you mentioned is to add
"COPY FROM" support without changes to the FDW API.

I don't think so; I think we should introduce new API for this feature
to keep the ExecForeignInsert() API simple.

All I can offer here for now is to introduce one new
ExecForeignBulkInsert(buf) routine that will execute a single "COPY FROM
STDIN" operation, send the tuples, and close the operation. We can fall
back to calling the ExecForeignInsert() routine for each buffered tuple
if ExecForeignBulkInsert() is not supported.
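The proposed dispatch, one ExecForeignBulkInsert() call per buffer with a per-tuple ExecForeignInsert() fallback, might look roughly like this. The types are simplified, hypothetical stand-ins (Row, SketchFdwRoutine, flush_to_foreign(), and the toy_* callbacks, which only count calls); the real callbacks receive ResultRelInfo and TupleTableSlot pointers, as declared in fdwapi.h.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a tuple; real code passes TupleTableSlot *. */
typedef struct Row
{
    int id;
} Row;

/* Hypothetical, trimmed-down routine table with an optional bulk callback. */
typedef struct SketchFdwRoutine
{
    void (*ExecForeignInsert) (void *state, const Row *row);
    /* may be NULL if the FDW does not support bulk insertion */
    void (*ExecForeignBulkInsert) (void *state, const Row *rows, int nrows);
} SketchFdwRoutine;

/* Toy FDW state that records how the rows arrived. */
typedef struct CallCounter
{
    int single_calls;   /* ExecForeignInsert invocations */
    int bulk_calls;     /* ExecForeignBulkInsert invocations */
    int rows_seen;      /* total rows delivered */
} CallCounter;

void
toy_insert(void *state, const Row *row)
{
    CallCounter *c = state;

    (void) row;
    c->single_calls++;
    c->rows_seen++;
}

void
toy_bulk_insert(void *state, const Row *rows, int nrows)
{
    CallCounter *c = state;

    (void) rows;
    c->bulk_calls++;
    c->rows_seen += nrows;
}

/* Flush a buffer through the bulk callback if present, else row by row. */
void
flush_to_foreign(const SketchFdwRoutine *routine, void *state,
                 const Row *rows, int nrows)
{
    if (routine->ExecForeignBulkInsert != NULL)
    {
        routine->ExecForeignBulkInsert(state, rows, nrows);
        return;
    }

    assert(routine->ExecForeignInsert != NULL);
    for (int i = 0; i < nrows; i++)
        routine->ExecForeignInsert(state, &rows[i]);
}
```

With both callbacks set, four buffered rows arrive in one bulk call; with ExecForeignBulkInsert set to NULL, the same rows arrive through four single-row calls. Keeping the fallback in the core means an FDW that implements only ExecForeignInsert() would need no changes.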

Agreed.

One of the main questions here is whether to use the COPY TO machinery
for serializing a tuple. As you can see in the patch, this requires
transforming the CopyTo() routine into an iterative form:
start/next/finish. Would that be acceptable?
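The start/next/finish split can be illustrated with a small serializer for COPY's text format: start sets up per-COPY state, next turns one row into a tab-separated, newline-terminated line, and finish releases the state. This is a simplified, hypothetical sketch (RowSerializer and the serializer_* functions are not PostgreSQL APIs). It handles only text columns and NULLs (written as \N) and escapes only backslash, tab, newline, and carriage return, whereas the real COPY TO code also covers other escapes, CSV and binary formats, and per-type output functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-COPY state kept between start, next and finish. */
typedef struct RowSerializer
{
    char   *line;   /* output buffer for the current row */
    size_t  cap;    /* allocated size of line */
    size_t  len;    /* bytes used in line, excluding the trailing NUL */
} RowSerializer;

/* Append one byte, growing the buffer as needed. */
static void
append_char(RowSerializer *s, char c)
{
    if (s->len + 2 > s->cap)        /* room for c plus the NUL */
    {
        size_t  newcap = s->cap ? s->cap * 2 : 64;
        char   *grown = realloc(s->line, newcap);

        assert(grown != NULL);
        s->line = grown;
        s->cap = newcap;
    }
    s->line[s->len++] = c;
    s->line[s->len] = '\0';
}

/* Append a two-character escape sequence such as \t or \N. */
static void
append_escape(RowSerializer *s, char c)
{
    append_char(s, '\\');
    append_char(s, c);
}

/* start: set up state once per COPY operation. */
void
serializer_start(RowSerializer *s)
{
    s->line = NULL;
    s->cap = 0;
    s->len = 0;
}

/*
 * next: serialize one row (NULL entries are SQL NULLs) and return a line
 * that stays valid only until the next call.
 */
const char *
serializer_next(RowSerializer *s, const char **cols, int ncols)
{
    s->len = 0;
    for (int i = 0; i < ncols; i++)
    {
        if (i > 0)
            append_char(s, '\t');       /* column delimiter */
        if (cols[i] == NULL)
        {
            append_escape(s, 'N');      /* default NULL marker */
            continue;
        }
        for (const char *p = cols[i]; *p != '\0'; p++)
        {
            switch (*p)
            {
                case '\\': append_escape(s, '\\'); break;
                case '\t': append_escape(s, 't'); break;
                case '\n': append_escape(s, 'n'); break;
                case '\r': append_escape(s, 'r'); break;
                default:   append_char(s, *p); break;
            }
        }
    }
    append_char(s, '\n');               /* row terminator */
    return s->line;
}

/* finish: release per-COPY state. */
void
serializer_finish(RowSerializer *s)
{
    free(s->line);
    s->line = NULL;
    s->cap = 0;
    s->len = 0;
}
```

Each returned line is the kind of payload one would hand to libpq's PQputCopyData(); because the buffer is reused, the caller has to send or copy it before calling serializer_next() again.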

+1 for the general idea.

The attached patch corrects a stupid error.

Thanks for the patch!

Sorry for the delay.

Best regards,
Etsuro Fujita

#7 Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#6)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 19.06.2020 19:58, Etsuro Fujita wrote:

On Tue, Jun 2, 2020 at 2:51 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Hiding the COPY code under the buffer management machinery allows us to
generalize that machinery, execute one COPY operation per buffer, and
simplify error handling.

I'm not sure that it's really a good idea to design the bulk-insert API
so that it's tightly coupled with the bulk-insert machinery in the core,
because 1) some FDWs might want to send tuples provided by the core to
the remote one by one, without storing them in a buffer, or 2) some
other FDWs might want to store the tuples in a buffer and send them in a
batch, as postgres_fdw does in the proposed patch, but might want to do
so independently of MAX_BUFFERED_TUPLES and/or MAX_BUFFERED_BYTES
defined in the bulk-insert machinery.

I agree that we would need special handling for cases you mentioned
above if we design this API based on something like the idea I
proposed in that thread.

Agreed

As I understand it, the main idea of the thread you mentioned is to add
"COPY FROM" support without changes to the FDW API.

I don't think so; I think we should introduce new API for this feature
to keep the ExecForeignInsert() API simple.

Ok

All I can offer here for now is to introduce one new
ExecForeignBulkInsert(buf) routine that will execute a single "COPY FROM
STDIN" operation, send the tuples, and close the operation. We can fall
back to calling the ExecForeignInsert() routine for each buffered tuple
if ExecForeignBulkInsert() is not supported.

Agreed.

In the next version of the patch (see attachment) I removed the Begin/End
fdwapi routines. Now there is only the ExecForeignBulkInsert() routine.

--
Andrey Lepikhov
Postgres Professional

Attachments:

v3-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=UTF-8)
From 108dc421cec88ab5afd092f40da3fa31b8fcfbc5 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Mon, 22 Jun 2020 10:28:42 +0500
Subject: [PATCH] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-inserts
are possible and the foreign table has a non-zero number of columns.
---
 contrib/postgres_fdw/deparse.c                |  60 ++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  33 ++-
 contrib/postgres_fdw/postgres_fdw.c           |  87 ++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  28 +++
 src/backend/commands/copy.c                   | 206 ++++++++++++------
 src/include/commands/copy.h                   |   5 +
 src/include/foreign/fdwapi.h                  |   8 +
 8 files changed, 344 insertions(+), 84 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..a37981ff66 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 82fc1290ef..3a3cca5047 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8183,6 +8184,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..2b3d7d6dfb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState fdwcstate;
 } PgFdwModifyState;
 
 /*
@@ -350,6 +352,9 @@ static TupleTableSlot *postgresExecForeignDelete(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static void postgresExecForeignBulkInsert(ResultRelInfo *resultRelInfo,
+									TupleTableSlot **slots,
+									int nslots);
 static void postgresEndForeignModify(EState *estate,
 									 ResultRelInfo *resultRelInfo);
 static void postgresBeginForeignInsert(ModifyTableState *mtstate,
@@ -530,6 +535,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->ExecForeignInsert = postgresExecForeignInsert;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
+	routine->ExecForeignBulkInsert = postgresExecForeignBulkInsert;
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
@@ -1890,6 +1896,87 @@ postgresExecForeignDelete(EState *estate,
 								  slot, planSlot);
 }
 
+/*
+ * postgresExecForeignBulkInsert
+ *		Copy rows into a foreign table by COPY .. FROM STDIN machinery
+ */
+static void
+postgresExecForeignBulkInsert(ResultRelInfo *resultRelInfo,
+							  TupleTableSlot **slots,
+							  int nslots)
+{
+	Relation rel = resultRelInfo->ri_RelationDesc;
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	StringInfoData sql;
+	PGresult *res;
+	bool status = false;
+	PGconn *conn = fmstate->conn;
+	int i;
+
+	Assert(resultRelInfo->ri_FdwRoutine != NULL &&
+		   resultRelInfo->ri_FdwState != NULL);
+
+	fmstate->target_attrs = NULL;
+	fmstate->has_returning = false;
+	fmstate->retrieved_attrs = NULL;
+	fmstate->fdwcstate = BeginForeignCopyTo(rel);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+	fmstate->query = sql.data;
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		for (i = 0; i < nslots; i++)
+		{
+			char *buf = NextForeignCopyRow(fmstate->fdwcstate, slots[i]);
+
+			if (PQputCopyData(conn, buf, strlen(buf)) <= 0)
+			{
+				res = PQgetResult(conn);
+				pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+			}
+		}
+
+		status = true;
+	}
+	PG_FINALLY();
+	{
+		/* Finish the COPY IN protocol. This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));
+
+		/* After successfully sending an EOF signal, check the command status. */
+		res = PQgetResult(conn);
+		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		EndForeignCopyTo(fmstate->fdwcstate);
+		pfree(fmstate->fdwcstate);
+
+		if (!status)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
 /*
  * postgresEndForeignModify
  *		Finish an insert/update/delete operation on a foreign table
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..73f98a3152 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2293,6 +2293,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc463c..ddf3c10146 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -133,6 +133,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -358,8 +359,11 @@ static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
 static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
 static void EndCopyTo(CopyState cstate);
+static void CopyToStart(CopyState cstate);
+static void CopyToFinish(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
 static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
@@ -587,7 +591,9 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			CopySendChar(cstate, '\0');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1075,7 +1081,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	else
 	{
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1815,6 +1821,32 @@ EndCopy(CopyState cstate)
 	pfree(cstate);
 }
 
+static char *buf = NULL;
+static void
+data_dest_cb(void *outbuf, int len)
+{
+	buf = (char *) palloc(len);
+	memcpy(buf, (char *) outbuf, len);
+}
+
+CopyState
+BeginForeignCopyTo(Relation rel)
+{
+	CopyState cstate;
+
+	cstate = BeginCopy(NULL, false, rel, NULL, InvalidOid, NIL, NIL);
+	cstate->copy_dest = COPY_CALLBACK;
+	cstate->data_dest_cb = data_dest_cb;
+	CopyToStart(cstate);
+	return cstate;
+}
+
+void
+EndForeignCopyTo(CopyState cstate)
+{
+	CopyToFinish(cstate);
+}
+
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
@@ -1825,6 +1857,7 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
@@ -1880,6 +1913,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -1950,6 +1988,13 @@ BeginCopyTo(ParseState *pstate,
 	return cstate;
 }
 
+char *
+NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot)
+{
+	CopyOneRowTo(cstate, slot);
+	return buf;
+}
+
 /*
  * This intermediate routine exists mainly to localize the effects of setjmp
  * so we don't need to plaster a lot of variables with "volatile".
@@ -1966,7 +2011,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2005,16 +2052,12 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
- */
-static uint64
-CopyTo(CopyState cstate)
+static void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -2104,6 +2147,29 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+static void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2135,17 +2201,6 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
@@ -2449,53 +2504,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignBulkInsert(resultRelInfo,
+															  slots,
+															  nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2868,14 +2934,14 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs || (resultRelInfo->ri_FdwRoutine != NULL &&
+			list_length(cstate->attnumlist) == 0))
 	{
 		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
+		 * Can't support buffering of COPY into foreign tables without any
+		 * defined columns, or if there are any volatile default expressions in the
+		 * table. Similarly to the trigger case above, such expressions may query
+		 * the table we're inserting into.
 		 *
 		 * Note: It does not matter if any partitions have any volatile
 		 * default expressions as we use the defaults from the target of the
@@ -3037,8 +3103,7 @@ CopyFrom(CopyState cstate)
 				 */
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					!has_instead_insert_row_trig;
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -3048,7 +3113,8 @@ CopyFrom(CopyState cstate)
 													   resultRelInfo);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo) &&
+						 resultRelInfo->ri_FdwRoutine == NULL)
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..ef119a761a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -41,4 +42,8 @@ extern uint64 CopyFrom(CopyState cstate);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
+extern CopyState BeginForeignCopyTo(Relation rel);
+extern char *NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
+extern void EndForeignCopyTo(CopyState cstate);
+
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..0507c2f96f 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -94,6 +94,9 @@ typedef TupleTableSlot *(*ExecForeignDelete_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
+typedef void (*ExecForeignBulkInsert_function) (ResultRelInfo *rinfo,
+												TupleTableSlot **slots,
+												int nslots);
 
 typedef void (*EndForeignModify_function) (EState *estate,
 										   ResultRelInfo *rinfo);
@@ -104,6 +107,10 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (ResultRelInfo *rinfo, bool status);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -211,6 +218,7 @@ typedef struct FdwRoutine
 	ExecForeignInsert_function ExecForeignInsert;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
+	ExecForeignBulkInsert_function ExecForeignBulkInsert;
 	EndForeignModify_function EndForeignModify;
 	BeginForeignInsert_function BeginForeignInsert;
 	EndForeignInsert_function EndForeignInsert;
-- 
2.17.1

#8Ashutosh Bapat
ashutosh.bapat@2ndquadrant.com
In reply to: Andrey V. Lepikhov (#5)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Wed, 17 Jun 2020 at 11:54, Andrey V. Lepikhov <a.lepikhov@postgrespro.ru>
wrote:

On 6/15/20 10:26 AM, Ashutosh Bapat wrote:

Thanks Andrey for the patch. I am glad that the patch has taken care
of some corner cases already, but there are still more.

The constructed COPY command doesn't take care of dropped columns. There
is code in deparseAnalyzeSql which constructs the list of columns for a
given foreign relation. The 0002 patch attached here moves that code to a
separate function and reuses it for COPY. If you find that code change
useful, please include it in the main patch.

Thanks, I included it.
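The column-list construction discussed above (skip dropped columns, optionally wrap the list in parentheses, report how many columns were emitted) can be sketched in isolation. This is a minimal, self-contained model, not the postgres_fdw code: `ColumnDesc` and `build_column_list()` are hypothetical names, and identifier quoting (`quote_identifier()` in the real code) is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one tuple descriptor entry. */
typedef struct ColumnDesc
{
    const char *name;    /* remote column name (column_name option or local name) */
    bool        dropped; /* true if the column has been dropped */
} ColumnDesc;

/*
 * Write the non-dropped column names of cols[] into buf, separated by ", ".
 * If enclose_in_parens is true and at least one column is emitted, the list
 * is wrapped in parentheses; with zero columns buf stays empty.
 * Returns the number of columns emitted.  Output is truncated if buf is too
 * small; identifier quoting is deliberately left out of this sketch.
 */
static int
build_column_list(char *buf, size_t bufsize, const ColumnDesc *cols,
                  int ncols, bool enclose_in_parens)
{
    int emitted = 0;

    assert(bufsize > 0);
    buf[0] = '\0';
    for (int i = 0; i < ncols; i++)
    {
        if (cols[i].dropped)
            continue;
        if (emitted > 0)
            strncat(buf, ", ", bufsize - strlen(buf) - 1);
        else if (enclose_in_parens)
            strncat(buf, "(", bufsize - strlen(buf) - 1);
        strncat(buf, cols[i].name, bufsize - strlen(buf) - 1);
        emitted++;
    }
    if (enclose_in_parens && emitted > 0)
        strncat(buf, ")", bufsize - strlen(buf) - 1);
    return emitted;
}
```

With columns `f1` (dropped), `f2` and `f3`, the function yields `(f2, f3)` and returns 2, which would give a remote command such as `COPY public.loc2(f2, f3) FROM STDIN`. When every column is dropped it yields an empty string, which is exactly the zero-column corner case discussed next.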

2. In the same case, if the foreign table declared locally didn't have
any non-dropped columns but the relation that it referred to on the
foreign server had some non-dropped columns, the COPY command fails. I
added a test case for this in 0002 but haven't fixed it.

I fixed it.
This is a very special corner case. The problem was that COPY FROM does
not support semantics like "INSERT INTO .. DEFAULT VALUES". To
simplify the solution, I switched off bulk copying for this case.
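The decision described above can be modeled as a small predicate. This is a simplified sketch: the enum and `choose_insert_method()` are hypothetical stand-ins for the `CopyInsertMethod` logic in `CopyFrom()`, and the real code also considers triggers, partitioning and other conditions that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the insert methods in copy.c. */
typedef enum SketchInsertMethod
{
    SKETCH_INSERT_SINGLE,           /* one tuple at a time */
    SKETCH_INSERT_MULTI_CONDITIONAL /* buffered, decided per partition */
} SketchInsertMethod;

/*
 * Decide whether COPY FROM may buffer tuples.
 * n_copy_columns is the number of columns in the COPY column list.
 */
static SketchInsertMethod
choose_insert_method(bool volatile_defexprs, int n_copy_columns)
{
    assert(n_copy_columns >= 0);

    /*
     * Volatile defaults may query the target table, so rows must be inserted
     * one by one.  With zero columns the deparsed remote command has no
     * column list, so the remote COPY would expect every remote column on
     * each line; COPY has no equivalent of INSERT ... DEFAULT VALUES.
     */
    if (volatile_defexprs || n_copy_columns == 0)
        return SKETCH_INSERT_SINGLE;
    return SKETCH_INSERT_MULTI_CONDITIONAL;
}
```

Falling back to the single-row path keeps the zero-column case correct at the cost of speed, which matches the "switched off bulk copying" trade-off described in the reply.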

I think this work is useful. Please add it to the next commitfest so
that it's tracked.

Ok.

It looks like we call BeginForeignInsert and EndForeignInsert even though
actual copy is performed using BeginForeignCopy, ExecForeignCopy
and EndForeignCopy. BeginForeignInsert constructs the INSERT query which
looks unnecessary. Also some of the other PgFdwModifyState members are
initialized unnecessarily. It also gives an impression that we are using
INSERT underneath the copy. Instead a better way would be to
call BeginForeignCopy instead of BeginForeignInsert and EndForeignCopy
instead of EndForeignInsert, if we are going to use COPY protocol to copy
data to the foreign server. Corresponding postgres_fdw implementations need
to change in order to do that.
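When data is sent over the COPY protocol, each tuple after `COPY ... FROM STDIN` travels as one line of COPY text format: values separated by tabs, NULL written as `\N`, special characters backslash-escaped, and the line terminated by a newline. In the patch this text comes from `CopyOneRowTo()` (via `NextForeignCopyRow()`) and is handed to `PQputCopyData()`. The sketch below is a simplified, hypothetical `format_copy_text_row()` that assumes the default tab delimiter and escapes only backslash, newline, carriage return and tab; the real code also handles further escapes, custom delimiters, encoding conversion, and the CSV and binary formats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one character, keeping out NUL-terminated; -1 if out is full. */
static int
append_char(char *out, size_t outsize, size_t *len, char c)
{
    if (*len + 1 >= outsize)    /* keep room for the terminating NUL */
        return -1;
    out[(*len)++] = c;
    out[*len] = '\0';
    return 0;
}

/*
 * Format one row in (simplified) COPY text format, tab-delimited and
 * newline-terminated.  values[i] == NULL means SQL NULL.
 * Returns the row length, or -1 if out is too small.
 */
static long
format_copy_text_row(const char *const *values, int nvalues,
                     char *out, size_t outsize)
{
    size_t len = 0;

    assert(outsize > 0);
    out[0] = '\0';
    for (int i = 0; i < nvalues; i++)
    {
        if (i > 0 && append_char(out, outsize, &len, '\t') < 0)
            return -1;
        if (values[i] == NULL)
        {
            if (append_char(out, outsize, &len, '\\') < 0 ||
                append_char(out, outsize, &len, 'N') < 0)
                return -1;
            continue;
        }
        for (const char *p = values[i]; *p != '\0'; p++)
        {
            char esc = 0;

            switch (*p)
            {
                case '\\': esc = '\\'; break;
                case '\n': esc = 'n'; break;
                case '\r': esc = 'r'; break;
                case '\t': esc = 't'; break;
                default: break;
            }
            if (esc != 0)
            {
                if (append_char(out, outsize, &len, '\\') < 0 ||
                    append_char(out, outsize, &len, esc) < 0)
                    return -1;
            }
            else if (append_char(out, outsize, &len, *p) < 0)
                return -1;
        }
    }
    if (append_char(out, outsize, &len, '\n') < 0)
        return -1;
    return (long) len;
}
```

For the values `-1`, `xy<TAB>z` and NULL, the sketch produces the 12-byte line `-1\txy\\tz\t\\N\n` (written as a C literal).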

This isn't a full review. I will continue reviewing this patch further.
--
Best Wishes,
Ashutosh

#9Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ashutosh Bapat (#8)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 6/22/20 5:11 PM, Ashutosh Bapat wrote:

<a.lepikhov@postgrespro.ru> wrote:
It looks like we call BeginForeignInsert and EndForeignInsert even
though actual copy is performed using BeginForeignCopy, ExecForeignCopy
and EndForeignCopy. BeginForeignInsert constructs the INSERT query which
looks unnecessary. Also some of the other PgFdwModifyState members are
initialized unnecessarily. It also gives an impression that we are using
INSERT underneath the copy. Instead a better way would be to
call BeginForeignCopy instead of BeginForeignInsert and EndForeignCopy
instead of EndForeignInsert, if we are going to use COPY protocol to
copy data to the foreign server. Corresponding postgres_fdw
implementations need to change in order to do that.

I did not answer for a long time because I was waiting for the results of
the discussion on Tomas's approach to bulk INSERT/UPDATE/DELETE, which
seems more general.
I can move the query construction into the first execution of the INSERT
or COPY operation. But the other changes seem more invasive, because
BeginForeignInsert/EndForeignInsert are used in the execPartition.c
module. We would need to pass the copy/insert state of the operation into
ExecFindPartition() and ExecCleanupTupleRouting().
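Moving the query construction into the first execution, as proposed above, amounts to lazy initialization: the remote command text stays NULL until the first INSERT or COPY actually needs it. Below is a minimal sketch under stated assumptions. `SketchModifyState` and `get_remote_query()` are hypothetical names, not the real `PgFdwModifyState` API, and the INSERT text is a one-parameter placeholder, whereas the real command lists every column.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-relation state; not the real PgFdwModifyState. */
typedef struct SketchModifyState
{
    const char *relname;  /* already-quoted remote relation name */
    bool        use_copy; /* true: COPY protocol, false: row-by-row INSERT */
    char       *query;    /* NULL until the first execution */
    int         nbuilds;  /* number of times the query text was built */
} SketchModifyState;

/*
 * Return the remote command text, building it on first use only.
 * Returns NULL if the text does not fit or memory allocation fails.
 */
static const char *
get_remote_query(SketchModifyState *st)
{
    assert(st != NULL && st->relname != NULL);

    if (st->query == NULL)
    {
        char   buf[256];
        int    n;
        size_t len;

        if (st->use_copy)
            n = snprintf(buf, sizeof(buf), "COPY %s FROM STDIN", st->relname);
        else    /* placeholder: the real INSERT lists every column */
            n = snprintf(buf, sizeof(buf), "INSERT INTO %s VALUES ($1)",
                         st->relname);
        if (n < 0 || (size_t) n >= sizeof(buf))
            return NULL;

        len = (size_t) n + 1;
        st->query = malloc(len);
        if (st->query == NULL)
            return NULL;
        memcpy(st->query, buf, len);
        st->nbuilds++;
    }
    return st->query;
}
```

Later calls return the cached text, so a relation that never receives a row never pays for building a command it would not use. As the reply notes, the harder part (choosing between the INSERT and COPY begin/end callbacks in execPartition.c) is not addressed by this change.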

--
regards,
Andrey Lepikhov
Postgres Professional

#10Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ashutosh Bapat (#8)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

22.06.2020 17:11, Ashutosh Bapat wrote:

On Wed, 17 Jun 2020 at 11:54, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 6/15/20 10:26 AM, Ashutosh Bapat wrote:

Thanks Andrey for the patch. I am glad that the patch has taken care
of some corner cases already, but there are still more.

The constructed COPY command doesn't take care of dropped columns. There
is code in deparseAnalyzeSql which constructs the list of columns for a
given foreign relation. The 0002 patch attached here moves that code to a
separate function and reuses it for COPY. If you find that code change
useful, please include it in the main patch.

Thanks, I included it.

2. In the same case, if the foreign table declared locally didn't have
any non-dropped columns but the relation that it referred to on the
foreign server had some non-dropped columns, the COPY command fails. I
added a test case for this in 0002 but haven't fixed it.

I fixed it.
This is a very special corner case. The problem was that COPY FROM does
not support semantics like "INSERT INTO .. DEFAULT VALUES". To
simplify the solution, I switched off bulk copying for this case.

I think this work is useful. Please add it to the next commitfest so
that it's tracked.
Ok.

It looks like we call BeginForeignInsert and EndForeignInsert even
though actual copy is performed using BeginForeignCopy, ExecForeignCopy
and EndForeignCopy. BeginForeignInsert constructs the INSERT query which
looks unnecessary. Also some of the other PgFdwModifyState members are
initialized unnecessarily. It also gives an impression that we are using
INSERT underneath the copy. Instead a better way would be to
call BeginForeignCopy instead of BeginForeignInsert and EndForeignCopy
instead of EndForeignInsert, if we are going to use COPY protocol to
copy data to the foreign server. Corresponding postgres_fdw
implementations need to change in order to do that.

Fixed.
I renamed the CopyIn FDW API routines. Also, the partition routing
initializer calls the BeginForeignInsert or BeginForeignCopyIn routine
according to the value of ResultRelInfo::UseBulkModifying.
I introduced this field because foreign partitions can be placed on
foreign servers with different types of foreign data wrappers, and not
all wrappers support the CopyIn API.
I also ran Tomas Vondra's benchmark. On my laptop, the results are:
* regular: 5000 ms.
* Tomas buffering patch: 11000 ms.
* This CopyIn patch: 8000 ms.
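The per-partition dispatch described above (use the CopyIn callbacks only when bulk mode is requested *and* the partition's wrapper implements them, otherwise fall back to the INSERT callbacks) can be sketched as follows. All names here are hypothetical and much simpler than the real `FdwRoutine`/`ResultRelInfo`: `use_bulk` plays the role of `UseBulkModifying`, and a NULL `begin_copy_in` models a wrapper without the CopyIn API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchRelInfo SketchRelInfo;

/* Hypothetical, trimmed-down callback table loosely modeled on FdwRoutine. */
typedef struct SketchFdwRoutine
{
    void (*begin_insert)(SketchRelInfo *rinfo);  /* per-row INSERT path */
    void (*begin_copy_in)(SketchRelInfo *rinfo); /* NULL if no CopyIn API */
} SketchFdwRoutine;

typedef enum SketchBeginKind
{
    SKETCH_BEGIN_NONE,
    SKETCH_BEGIN_INSERT,
    SKETCH_BEGIN_COPY_IN
} SketchBeginKind;

struct SketchRelInfo
{
    const SketchFdwRoutine *fdw;
    bool            use_bulk; /* plays the role of UseBulkModifying */
    SketchBeginKind started;  /* records which callback ran, for the demo */
};

/* Demo callbacks that only record which path was taken. */
static void
demo_begin_insert(SketchRelInfo *rinfo)
{
    rinfo->started = SKETCH_BEGIN_INSERT;
}

static void
demo_begin_copy_in(SketchRelInfo *rinfo)
{
    rinfo->started = SKETCH_BEGIN_COPY_IN;
}

/*
 * Start modifying a foreign partition: use the CopyIn API only when bulk
 * mode was requested and this wrapper implements it; otherwise fall back
 * to the per-row INSERT callback, if there is one.
 */
static void
begin_foreign_partition(SketchRelInfo *rinfo)
{
    assert(rinfo != NULL && rinfo->fdw != NULL);

    if (rinfo->use_bulk && rinfo->fdw->begin_copy_in != NULL)
        rinfo->fdw->begin_copy_in(rinfo);
    else if (rinfo->fdw->begin_insert != NULL)
        rinfo->fdw->begin_insert(rinfo);
}
```

One design point the sketch does not model: whatever path is chosen at begin time, the buffer-flush and end-of-copy code must make the same choice, so that a partition started with the INSERT callbacks is never flushed through the CopyIn ones.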

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v4-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=UTF-8)
From ac43384af911acd0a07b3fae0ab25a9131a4504c Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Thu, 9 Jul 2020 11:16:56 +0500
Subject: [PATCH] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-inserts
are possible and the foreign table has a non-zero number of columns.
---
 contrib/postgres_fdw/deparse.c                |  60 ++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  33 ++-
 contrib/postgres_fdw/postgres_fdw.c           | 130 ++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  28 ++
 src/backend/commands/copy.c                   | 239 ++++++++++++------
 src/backend/executor/execMain.c               |   1 +
 src/backend/executor/execPartition.c          |  34 ++-
 src/include/commands/copy.h                   |   5 +
 src/include/foreign/fdwapi.h                  |  15 ++
 src/include/nodes/execnodes.h                 |   8 +
 11 files changed, 456 insertions(+), 98 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..a37981ff66 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 82fc1290ef..3a3cca5047 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8183,6 +8184,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..0db8d74320 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -356,6 +358,10 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopyIn(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopyIn(EState *estate, ResultRelInfo *resultRelInfo);
+static TupleTableSlot *postgresExecForeignCopyIn(ResultRelInfo *resultRelInfo, TupleTableSlot **slots, int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -533,6 +539,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopyIn = postgresBeginForeignCopyIn;
+	routine->EndForeignCopyIn = postgresEndForeignCopyIn;
+	routine->ExecForeignCopyIn = postgresExecForeignCopyIn;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -1847,6 +1856,9 @@ postgresExecForeignInsert(EState *estate,
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	TupleTableSlot *rslot;
 
+	Assert(!resultRelInfo->UseBulkModifying ||
+		   resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn == NULL);
+
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
 	 * postgresBeginForeignInsert())
@@ -2051,6 +2063,124 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+/*
+ *
+ * postgresBeginForeignCopyIn
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopyIn(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+	fmstate->cstate = BeginForeignCopyTo(resultRelInfo->ri_RelationDesc);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopyIn
+ *		Finish an COPY operation on a foreign table
+ *		Finish a COPY operation on a foreign table
+static void
+postgresEndForeignCopyIn(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	EndForeignCopyTo(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopyIn
+ *		Send a number of tuples to the foreign relation.
+ */
+static TupleTableSlot *
+postgresExecForeignCopyIn(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool status = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		for (i = 0; i < nslots; i++)
+		{
+			char *buf = NextForeignCopyRow(fmstate->cstate, slots[i]);
+
+			if (PQputCopyData(conn, buf, strlen(buf)) <= 0)
+			{
+				res = PQgetResult(conn);
+				pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+			}
+		}
+
+		status = true;
+	}
+	PG_FINALLY();
+	{
+		/* Finish the COPY IN protocol; this is required both after a successful
+		 * copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));
+
+		/* After successfully sending an EOF signal, check command status. */
+		res = PQgetResult(conn);
+		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!status)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+	return NULL;
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..73f98a3152 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2293,6 +2293,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3e199bdfd0..7338c63fe5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -133,6 +133,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -358,8 +359,11 @@ static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
 static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
 static void EndCopyTo(CopyState cstate);
+static void CopyToStart(CopyState cstate);
+static void CopyToFinish(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
 static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
@@ -586,7 +590,9 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			CopySendChar(cstate, '\0');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1074,7 +1080,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	else
 	{
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1814,6 +1820,32 @@ EndCopy(CopyState cstate)
 	pfree(cstate);
 }
 
+static char *buf = NULL;
+static void
+data_dest_cb(void *outbuf, int len)
+{
+	buf = (char *) palloc(len);
+	memcpy(buf, (char *) outbuf, len);
+}
+
+CopyState
+BeginForeignCopyTo(Relation rel)
+{
+	CopyState cstate;
+
+	cstate = BeginCopy(NULL, false, rel, NULL, InvalidOid, NIL, NIL);
+	cstate->copy_dest = COPY_CALLBACK;
+	cstate->data_dest_cb = data_dest_cb;
+	CopyToStart(cstate);
+	return cstate;
+}
+
+void
+EndForeignCopyTo(CopyState cstate)
+{
+	CopyToFinish(cstate);
+}
+
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
@@ -1824,6 +1856,7 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
@@ -1879,6 +1912,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -1949,6 +1987,13 @@ BeginCopyTo(ParseState *pstate,
 	return cstate;
 }
 
+char *
+NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot)
+{
+	CopyOneRowTo(cstate, slot);
+	return buf;
+}
+
 /*
  * This intermediate routine exists mainly to localize the effects of setjmp
  * so we don't need to plaster a lot of variables with "volatile".
@@ -1965,7 +2010,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2004,16 +2051,12 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
- */
-static uint64
-CopyTo(CopyState cstate)
+static void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -2103,6 +2146,29 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+static void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2134,17 +2200,6 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
@@ -2444,53 +2499,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopyIn(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2800,11 +2866,6 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = estate->es_result_relations;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
-
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -2863,14 +2924,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
 	{
 		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
+		 * Can't support bufferization of copy into foreign tables without any
+		 * defined columns or if there are any volatile default expressions in the
+		 * table. Similarly to the trigger case above, such expressions may query
+		 * the table we're inserting into.
 		 *
 		 * Note: It does not matter if any partitions have any volatile
 		 * default expressions as we use the defaults from the target of the
@@ -2910,6 +2970,24 @@ CopyFrom(CopyState cstate)
 								estate, mycid, ti_options);
 	}
 
+	if (insertMethod != CIM_SINGLE)
+		resultRelInfo->UseBulkModifying = true;
+
+	/*
+	 * Init COPY into foreign table. Initialization of copying into foreign
+	 * partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->UseBulkModifying &&
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn(mtstate,
+																resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																resultRelInfo);
+	}
+
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
 	 * tuple slot too. When inserting into a partitioned table, we also need
@@ -3033,7 +3111,7 @@ CopyFrom(CopyState cstate)
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
 					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					(resultRelInfo->ri_FdwRoutine == NULL || resultRelInfo->UseBulkModifying);
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -3292,10 +3370,17 @@ CopyFrom(CopyState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->UseBulkModifying &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopyIn != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopyIn(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+		target_resultRelInfo->UseBulkModifying = false;
+	}
 
 	/* Tear down the multi-insert buffer data */
 	if (insertMethod != CIM_SINGLE)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..d3e8f1c720 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1345,6 +1345,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+	resultRelInfo->UseBulkModifying = false;
 }
 
 /*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fb6ce49056..c216296811 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -526,6 +526,11 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  rootrel,
 					  estate->es_instrument);
 
+	if (rootResultRelInfo->UseBulkModifying &&
+		leaf_part_rri->ri_FdwRoutine != NULL &&
+		leaf_part_rri->ri_FdwRoutine->BeginForeignCopyIn != NULL)
+		leaf_part_rri->UseBulkModifying = true;
+
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
@@ -937,9 +942,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->UseBulkModifying)
+		{
+			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL);
+			partRelInfo->ri_FdwRoutine->BeginForeignCopyIn(mtstate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
@@ -1121,10 +1133,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->UseBulkModifying)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopyIn != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopyIn(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..ef119a761a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -41,4 +42,8 @@ extern uint64 CopyFrom(CopyState cstate);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
+extern CopyState BeginForeignCopyTo(Relation rel);
+extern char *NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
+extern void EndForeignCopyTo(CopyState cstate);
+
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..073eeee2ca 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopyIn_function) (ModifyTableState *mtstate,
+											 ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopyIn_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef TupleTableSlot *(*ExecForeignCopyIn_function) (ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   int nslots);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* COPY a bulk of tuples into a foreign relation */
+	BeginForeignCopyIn_function BeginForeignCopyIn;
+	EndForeignCopyIn_function EndForeignCopyIn;
+	ExecForeignCopyIn_function ExecForeignCopyIn;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0187989fd1..8ac366a659 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -491,6 +491,14 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * For use by copy.c:
+	 * for partitioned relation "true" means that child relations are allowed for
+	 * using bulk modify operations; for foreign relation (or foreign partition
+	 * of) "true" value means that modify operations must use bulk FDW API.
+	 */
+	bool UseBulkModifying;
 } ResultRelInfo;
 
 /* ----------------
-- 
2.17.1

#11 Amit Langote
amitlangote09@gmail.com
In reply to: Andrey Lepikhov (#10)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

Thanks for this work. I have been reading through your patch and
here's what I understand it does and how:

The patch aims to fix the restriction that COPYing into a foreign
table can't use the multi-insert buffer mechanism effectively. That's
because copy.c currently uses the ExecForeignInsert() FDW API which
can be passed only 1 row at a time. postgres_fdw's implementation
issues an `INSERT INTO remote_table VALUES (...)` statement to the
remote side for each row, which is pretty inefficient for bulk loads.
The patch introduces a new FDW API ExecForeignCopyIn() that can
receive multiple rows and copy.c now calls it every time it flushes
the multi-insert buffer so that all the flushed rows can be sent to
the remote side in one go. postgres_fdw now issues a `COPY
remote_table FROM STDIN` to the remote server and
postgresExecForeignCopyIn() funnels the tuples flushed by the local
copy to the server side waiting for tuples on the COPY protocol.
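To make the batching argument concrete, here is a minimal C sketch of a fixed-size multi-insert buffer whose flush hands the whole batch to one bulk callback, in the spirit of ExecForeignCopyIn(rinfo, slots, nslots). Everything in it (Row, RowBuffer, buffer_add(), count_copy_batch(), the capacity of 4) is hypothetical and models only the counting. It assumes, as in the patch under review, that each flush costs one remote COPY, whereas the row-at-a-time ExecForeignInsert() path costs one remote INSERT per row. copy.c itself flushes on much larger tuple and byte limits.

```c
#include <assert.h>

/* Hypothetical stand-in for a tuple slot: one integer column. */
typedef struct Row
{
	int			key;
} Row;

/* Receives a whole batch, like ExecForeignCopyIn(rinfo, slots, nslots). */
typedef void (*bulk_sink_fn) (void *ctx, const Row *rows, int nrows);

#define BUFFER_CAPACITY 4		/* tiny on purpose; copy.c buffers far more */

typedef struct RowBuffer
{
	Row			rows[BUFFER_CAPACITY];
	int			nused;
	bulk_sink_fn sink;
	void	   *ctx;
} RowBuffer;

/* Hand every buffered row to the sink in one call, then empty the buffer. */
static void
buffer_flush(RowBuffer *buf)
{
	if (buf->nused == 0)
		return;
	buf->sink(buf->ctx, buf->rows, buf->nused);
	buf->nused = 0;
}

/* Buffer one row; flush automatically when the buffer becomes full. */
static void
buffer_add(RowBuffer *buf, Row row)
{
	assert(buf->nused < BUFFER_CAPACITY);
	buf->rows[buf->nused++] = row;
	if (buf->nused == BUFFER_CAPACITY)
		buffer_flush(buf);
}

/* Counts what the remote side would see: one COPY command per batch. */
typedef struct RemoteStats
{
	int			commands;
	int			rows;
} RemoteStats;

static void
count_copy_batch(void *ctx, const Row *rows, int nrows)
{
	RemoteStats *stats = (RemoteStats *) ctx;

	(void) rows;				/* a real sink would serialize these rows */
	stats->commands++;
	stats->rows += nrows;
}

/* Load n rows through the buffer and report the modeled remote traffic. */
static RemoteStats
load_rows_batched(int n)
{
	RemoteStats stats = {0, 0};
	RowBuffer	buf = {.nused = 0, .sink = count_copy_batch, .ctx = &stats};

	for (int i = 0; i < n; i++)
		buffer_add(&buf, (Row) {.key = i});
	buffer_flush(&buf);			/* end of input: flush the partial batch */
	return stats;
}
```

With a capacity of 4, load_rows_batched(10) reports 3 remote commands (batches of 4, 4 and 2), where the per-row path would issue 10.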

Here are some comments on the patch.

* Why the "In" in these API names?

+   /* COPY a bulk of tuples into a foreign relation */
+   BeginForeignCopyIn_function BeginForeignCopyIn;
+   EndForeignCopyIn_function EndForeignCopyIn;
+   ExecForeignCopyIn_function ExecForeignCopyIn;

* fdwhandler.sgml should be updated with the description of these new APIs.

* As far as I can tell, the following copy.h additions are for an FDW
to use copy.c to obtain an external representation (char string) to
send to the remote side of the individual rows that are passed to
ExecForeignCopyIn():

+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
+extern CopyState BeginForeignCopyTo(Relation rel);
+extern char *NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
+extern void EndForeignCopyTo(CopyState cstate);

So, an FDW's ExecForeignCopyIn() calls copy.c: NextForeignCopyRow()
which in turn calls copy.c: CopyOneRowTo() which fills
CopyState.fe_msgbuf. The data_dest_cb() callback that runs after
fe_msgbuf contains the full row simply copies it into a palloc'd char
buffer whose pointer is returned back to ExecForeignCopyIn(). I
wonder why not let FDWs implement the callback and pass it to copy.c
through BeginForeignCopyTo()? For example, you could implement a
pgfdw_copy_data_dest_cb() in postgres_fdw.c which gets a direct
pointer of fe_msgbuf to send it to the remote server.

Do you think all FDWs would want to use copy.c like above? If not,
maybe the above APIs are really postgres_fdw-specific? Anyway, adding
comments above the definitions of these functions would be helpful.
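The callback-based design suggested above can be sketched in self-contained C. Only the copy_data_dest_cb signature (void *outbuf, int len) is taken from the patch; FakeConn, fake_copy_dest_cb(), append_field() and send_row() are hypothetical, and the escaping covers only a subset of COPY text format (backslash, tab, newline, carriage return, and \N for NULL). Because the callback receives no context pointer, the destination has to be reached through a file-level static, which is the role copy_fmstate plays in the suggested postgres_fdw change. Since the length is passed explicitly, no trailing '\0' is needed, which matches dropping CopySendChar(cstate, '\0') in that change.

```c
#include <assert.h>
#include <string.h>

/* Same signature as the copy_data_dest_cb typedef in the patch. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical connection stand-in: records what would be sent. */
typedef struct FakeConn
{
	char		wire[256];
	int			len;
	int			messages;
} FakeConn;

/*
 * The callback has no context argument, so the active destination is kept
 * in a file-level static (copy_fmstate plays this role in the FDW change).
 */
static FakeConn *active_conn = NULL;

/* Plays the role of PQputCopyData(): append one row message to the wire. */
static void
fake_copy_dest_cb(void *outbuf, int len)
{
	assert(active_conn != NULL);
	assert(active_conn->len + len <= (int) sizeof(active_conn->wire));
	memcpy(active_conn->wire + active_conn->len, outbuf, (size_t) len);
	active_conn->len += len;
	active_conn->messages++;
}

#define ROW_CAP 128

/* Append one field using a subset of COPY text-format escaping. */
static int
append_field(char *out, int pos, const char *value)
{
	if (value == NULL)
	{
		assert(pos + 2 < ROW_CAP);
		memcpy(out + pos, "\\N", 2);	/* default NULL marker */
		return pos + 2;
	}
	for (const char *p = value; *p != '\0'; p++)
	{
		char		esc = 0;

		switch (*p)
		{
			case '\\': esc = '\\'; break;
			case '\t': esc = 't'; break;
			case '\n': esc = 'n'; break;
			case '\r': esc = 'r'; break;
			default: break;
		}
		assert(pos + 2 < ROW_CAP);	/* room for an escape pair and '\n' */
		if (esc != 0)
		{
			out[pos++] = '\\';
			out[pos++] = esc;
		}
		else
			out[pos++] = *p;
	}
	return pos;
}

/* Format one row and pass it to the callback together with its length. */
static void
send_row(const char *fields[], int nfields, copy_data_dest_cb cb)
{
	char		row[ROW_CAP];
	int			pos = 0;

	for (int i = 0; i < nfields; i++)
	{
		if (i > 0)
		{
			assert(pos + 1 < ROW_CAP);
			row[pos++] = '\t';	/* default column delimiter */
		}
		pos = append_field(row, pos, fields[i]);
	}
	assert(pos < ROW_CAP);
	row[pos++] = '\n';			/* no trailing '\0': the length travels with it */
	cb(row, pos);
}
```

Each send_row() call produces exactly one callback invocation, so the FDW side can push the bytes straight to the remote server without an intermediate palloc'd copy.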

* I see that the remote copy is performed from scratch on every call
of postgresExecForeignCopyIn(), but wouldn't it be more efficient to
send the `COPY remote_table FROM STDIN` in
postgresBeginForeignCopyIn() and end it in postgresEndForeignCopyIn()
when there are no errors during the copy?

I tried implementing these two changes -- pgfdw_copy_data_dest_cb()
and sending `COPY remote_table FROM STDIN` only once instead of on
every flush -- and I see significant speedup. Please check the
attached patch that applies on top of yours. One problem I spotted
when trying my patch but didn't spend much time debugging is that
local COPY cannot be interrupted by Ctrl+C anymore, but that should be
fixable by adjusting PG_TRY() blocks.
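The PG_TRY()/PG_FINALLY() block in the original postgresExecForeignCopyIn() guarantees that the remote COPY is ended even when an error (for example, a query cancel) escapes while rows are being sent. PG_TRY() is built on sigsetjmp()/siglongjmp(); the sketch below models the same guarantee with plain setjmp()/longjmp(). FakeConn, raise_error(), send_row() and flush_batch() are hypothetical, and the sketch returns false where a backend would call PG_RE_THROW(). That moving the COPY end out of such a block is what breaks Ctrl+C is an assumption here, not a diagnosis of the reported problem.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the remote connection's protocol state. */
typedef enum
{
	CONN_IDLE,
	CONN_COPY_IN
} ConnState;

typedef struct FakeConn
{
	ConnState	state;
	int			rows_sent;
	bool		last_copy_canceled;
} FakeConn;

/* Innermost error handler, playing the role of PG_exception_stack. */
static jmp_buf *error_target = NULL;

static void
raise_error(void)
{
	assert(error_target != NULL);
	longjmp(*error_target, 1);
}

static void
send_row(FakeConn *conn, int value)
{
	assert(conn->state == CONN_COPY_IN);
	if (value < 0)
		raise_error();			/* e.g. a cancel request arriving mid-batch */
	conn->rows_sent++;
}

/*
 * One flush: start a COPY, send the rows, and end the COPY on both the
 * success and the error path.  Returns false where a backend would
 * PG_RE_THROW() instead.
 */
static bool
flush_batch(FakeConn *conn, const int *values, int n)
{
	jmp_buf		local;
	jmp_buf    *saved = error_target;
	volatile bool ok = false;	/* volatile: read after longjmp() */

	conn->state = CONN_COPY_IN;	/* like PQexec(conn, "COPY ... FROM STDIN") */
	error_target = &local;
	if (setjmp(local) == 0)
	{
		for (int i = 0; i < n; i++)
			send_row(conn, values[i]);
		ok = true;
	}

	/* "finally" part: runs whether or not an error was raised */
	error_target = saved;
	conn->last_copy_canceled = !ok;	/* like PQputCopyEnd() with an error message */
	conn->state = CONN_IDLE;
	return ok;
}
```

After either path the modeled connection is back in CONN_IDLE, which is the property the cleanup block exists to protect.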

* ResultRelInfo.UseBulkModifying should be ri_usesBulkModify for consistency.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pgfdw-copy-buffering-amit-suggests.patch (application/x-patch)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 0db8d74..5668977 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2063,6 +2063,21 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+	{
+		PGresult *res = PQgetResult(conn);
+
+		pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+	}
+}
+
 /*
  *
  * postgresBeginForeignCopyIn
@@ -2076,6 +2091,8 @@ postgresBeginForeignCopyIn(ModifyTableState *mtstate,
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	StringInfoData sql;
 	RangeTblEntry *rte;
+	PGconn *conn;
+	PGresult *res;
 
 	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
 	initStringInfo(&sql);
@@ -2090,8 +2107,16 @@ postgresBeginForeignCopyIn(ModifyTableState *mtstate,
 									NIL,
 									false,
 									NIL);
-	fmstate->cstate = BeginForeignCopyTo(resultRelInfo->ri_RelationDesc);
+	fmstate->cstate = BeginForeignCopyTo(rel, pgfdw_copy_dest_cb);
 	resultRelInfo->ri_FdwState = fmstate;
+
+	conn = fmstate->conn;
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	copy_fmstate = fmstate;
 }
 
 /*
@@ -2102,14 +2127,40 @@ static void
 postgresEndForeignCopyIn(EState *estate, ResultRelInfo *resultRelInfo)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	PGconn *conn = fmstate->conn;
+	PGresult *res;
 
 	/* Check correct use of CopyIn FDW API. */
 	Assert(fmstate->cstate != NULL);
 
+	/*
+	 * Finish COPY IN protocol. It is needed to do after successful copy or
+	 * after an error.
+	 */
+	if (PQputCopyEnd(conn, NULL) <= 0 ||
+		PQflush(conn))
+		ereport(ERROR,
+				(errmsg("error returned by PQputCopyEnd: %s",
+						PQerrorMessage(conn))));
+
+	/* After successfully  sending an EOF signal, check command status. */
+	res = PQgetResult(conn);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+	PQclear(res);
+
+	/* Do this to ensure we've pumped libpq back to idle state */
+	if (PQgetResult(conn) != NULL)
+		ereport(ERROR,
+				(errmsg("unexpected extra results during COPY of table: %s",
+						PQerrorMessage(conn))));
+
 	EndForeignCopyTo(fmstate->cstate);
 	pfree(fmstate->cstate);
 	fmstate->cstate = NULL;
 	finish_foreign_modify(fmstate);
+
+	copy_fmstate = NULL;
 }
 
 /*
@@ -2122,58 +2173,21 @@ postgresExecForeignCopyIn(ResultRelInfo *resultRelInfo,
 						  TupleTableSlot **slots, int nslots)
 {
 	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
-	PGresult *res;
-	PGconn *conn = fmstate->conn;
 	bool status = false;
 	int i;
 
 	/* Check correct use of CopyIn FDW API. */
 	Assert(fmstate->cstate != NULL);
 
-	res = PQexec(conn, fmstate->query);
-	if (PQresultStatus(res) != PGRES_COPY_IN)
-		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
-	PQclear(res);
-
 	PG_TRY();
 	{
 		for (i = 0; i < nslots; i++)
-		{
-			char *buf = NextForeignCopyRow(fmstate->cstate, slots[i]);
-
-			if (PQputCopyData(conn, buf, strlen(buf)) <= 0)
-			{
-				res = PQgetResult(conn);
-				pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
-			}
-		}
+			NextForeignCopyRow(fmstate->cstate, slots[i]);
 
 		status = true;
 	}
 	PG_FINALLY();
 	{
-		/* Finish COPY IN protocol. It is needed to do after successful copy or
-		 * after an error.
-		 */
-		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
-			PQflush(conn))
-			ereport(ERROR,
-					(errmsg("error returned by PQputCopyEnd: %s",
-							PQerrorMessage(conn))));
-
-		/* After successfully  sending an EOF signal, check command status. */
-		res = PQgetResult(conn);
-		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
-			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
-
-		PQclear(res);
-		/* Do this to ensure we've pumped libpq back to idle state */
-		if (PQgetResult(conn) != NULL)
-			ereport(ERROR,
-					(errmsg("unexpected extra results during COPY of table: %s",
-							PQerrorMessage(conn))));
-
 		if (!status)
 			PG_RE_THROW();
 	}
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f5f1d40..51b7233 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -594,7 +594,6 @@ CopySendEndOfRow(CopyState cstate)
 			break;
 		case COPY_CALLBACK:
 			CopySendChar(cstate, '\n');
-			CopySendChar(cstate, '\0');
 			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
@@ -1823,16 +1822,8 @@ EndCopy(CopyState cstate)
 	pfree(cstate);
 }
 
-static char *buf = NULL;
-static void
-data_dest_cb(void *outbuf, int len)
-{
-	buf = (char *) palloc(len);
-	memcpy(buf, (char *) outbuf, len);
-}
-
 CopyState
-BeginForeignCopyTo(Relation rel)
+BeginForeignCopyTo(Relation rel, copy_data_dest_cb data_dest_cb)
 {
 	CopyState cstate;
 
@@ -1990,11 +1981,10 @@ BeginCopyTo(ParseState *pstate,
 	return cstate;
 }
 
-char *
+void
 NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot)
 {
 	CopyOneRowTo(cstate, slot);
-	return buf;
 }
 
 /*
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index ef119a7..f31ed13 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,8 +42,8 @@ extern uint64 CopyFrom(CopyState cstate);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
-extern CopyState BeginForeignCopyTo(Relation rel);
-extern char *NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
+extern CopyState BeginForeignCopyTo(Relation rel, copy_data_dest_cb data_dest_cb);
+extern void NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
 extern void EndForeignCopyTo(CopyState cstate);
 
 #endif							/* COPY_H */
#12 Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Amit Langote (#11)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 7/16/20 2:14 PM, Amit Langote wrote:

> Hi Andrey,
>
> Thanks for this work. I have been reading through your patch and
> here's what I understand it does and how:
>
> The patch aims to fix the restriction that COPYing into a foreign
> table can't use the multi-insert buffer mechanism effectively. That's
> because copy.c currently uses the ExecForeignInsert() FDW API which
> can be passed only 1 row at a time. postgres_fdw's implementation
> issues an `INSERT INTO remote_table VALUES (...)` statement to the
> remote side for each row, which is pretty inefficient for bulk loads.
> The patch introduces a new FDW API ExecForeignCopyIn() that can
> receive multiple rows and copy.c now calls it every time it flushes
> the multi-insert buffer so that all the flushed rows can be sent to
> the remote side in one go. postgres_fdw now issues a `COPY
> remote_table FROM STDIN` to the remote server and
> postgresExecForeignCopyIn() funnels the tuples flushed by the local
> copy to the server side waiting for tuples on the COPY protocol.

Fine

> Here are some comments on the patch.
>
> * Why the "In" in these API names?
>
> +   /* COPY a bulk of tuples into a foreign relation */
> +   BeginForeignCopyIn_function BeginForeignCopyIn;
> +   EndForeignCopyIn_function EndForeignCopyIn;
> +   ExecForeignCopyIn_function ExecForeignCopyIn;

I used an analogy from copy.c.

> * fdwhandler.sgml should be updated with the description of these new APIs.
>
> * As far as I can tell, the following copy.h additions are for an FDW
> to use copy.c to obtain an external representation (char string) to
> send to the remote side of the individual rows that are passed to
> ExecForeignCopyIn():
>
> +typedef void (*copy_data_dest_cb) (void *outbuf, int len);
> +extern CopyState BeginForeignCopyTo(Relation rel);
> +extern char *NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
> +extern void EndForeignCopyTo(CopyState cstate);
>
> So, an FDW's ExecForeignCopyIn() calls copy.c: NextForeignCopyRow()
> which in turn calls copy.c: CopyOneRowTo() which fills
> CopyState.fe_msgbuf. The data_dest_cb() callback that runs after
> fe_msgbuf contains the full row simply copies it into a palloc'd char
> buffer whose pointer is returned back to ExecForeignCopyIn(). I
> wonder why not let FDWs implement the callback and pass it to copy.c
> through BeginForeignCopyTo()? For example, you could implement a
> pgfdw_copy_data_dest_cb() in postgres_fdw.c which gets a direct
> pointer of fe_msgbuf to send it to the remote server.

That is a good point! Thank you.

> Do you think all FDWs would want to use copy.c like above? If not,
> maybe the above APIs are really postgres_fdw-specific? Anyway, adding
> comments above the definitions of these functions would be helpful.

Agreed

> * I see that the remote copy is performed from scratch on every call
> of postgresExecForeignCopyIn(), but wouldn't it be more efficient to
> send the `COPY remote_table FROM STDIN` in
> postgresBeginForeignCopyIn() and end it in postgresEndForeignCopyIn()
> when there are no errors during the copy?

It is not possible. The FDW shares one connection among all foreign
relations on a server. If two or more partitions are placed on one
foreign server, you will have problems with concurrent COPY commands.
Maybe we can create a new connection for each partition?
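The connection-sharing objection can be modeled with a small, self-contained C sketch. SharedConn, begin_copy(), end_copy() and run_flushes() are hypothetical: the model assumes one connection per foreign server that can carry at most one in-progress COPY, and it counts a conflict whenever a second partition tries to start one. Interleaved flushes to two partitions on the same server conflict only when each partition keeps its COPY open until the end of the load; starting and ending a COPY on every flush avoids the conflict at the cost of one COPY per flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one connection shared by all tables on a server. */
typedef struct SharedConn
{
	int			copy_owner;		/* partition whose COPY is open, or -1 */
	int			conflicts;
} SharedConn;

/* Try to start a COPY for a partition; fails if another one owns it. */
static bool
begin_copy(SharedConn *conn, int part)
{
	if (conn->copy_owner != -1 && conn->copy_owner != part)
	{
		conn->conflicts++;
		return false;
	}
	conn->copy_owner = part;	/* start, or keep using, this partition's COPY */
	return true;
}

static void
end_copy(SharedConn *conn, int part)
{
	if (conn->copy_owner == part)
		conn->copy_owner = -1;
}

/*
 * Replay a sequence of buffer flushes, each naming its target partition.
 * per_flush = true: every flush opens and closes its own COPY.
 * per_flush = false: a partition's COPY stays open until the end.
 * Returns the number of conflicting COPY starts.
 */
static int
run_flushes(const int *parts, int n, bool per_flush, int nparts)
{
	SharedConn	conn = {-1, 0};

	for (int i = 0; i < n; i++)
	{
		if (begin_copy(&conn, parts[i]) && per_flush)
			end_copy(&conn, parts[i]);
	}
	for (int p = 0; p < nparts; p++)	/* like EndForeignCopyIn() per partition */
		end_copy(&conn, p);
	return conn.conflicts;
}
```

For the flush order {0, 1, 0, 1}, run_flushes(order, 4, true, 2) returns 0 and run_flushes(order, 4, false, 2) returns 2. Giving each partition its own connection would also remove the conflict in this model.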

> I tried implementing these two changes -- pgfdw_copy_data_dest_cb()
> and sending `COPY remote_table FROM STDIN` only once instead of on
> every flush -- and I see significant speedup. Please check the
> attached patch that applies on top of yours.

I integrated the first change and rejected the second for the reason given above.
> One problem I spotted
> when trying my patch but didn't spend much time debugging is that
> local COPY cannot be interrupted by Ctrl+C anymore, but that should be
> fixable by adjusting PG_TRY() blocks.

Thanks

> * ResultRelInfo.UseBulkModifying should be ri_usesBulkModify for consistency.

+1

I will post a new version of the patch a little bit later.

--
regards,
Andrey Lepikhov
Postgres Professional

#13 Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Amit Langote (#11)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 7/16/20 2:14 PM, Amit Langote wrote:

> Amit Langote
> EnterpriseDB: http://www.enterprisedb.com

Version 5 of the patch, with changes based on Amit's comments.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v5-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=UTF-8)
From 24465d61d6f0ec6a45578d252bda1690ac045543 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Thu, 9 Jul 2020 11:16:56 +0500
Subject: [PATCH] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-inserts
are possible and the foreign table has a non-zero number of columns.

The FDW API was extended with the following routines:
* BeginForeignCopyIn
* EndForeignCopyIn
* ExecForeignCopyIn

BeginForeignCopyIn and EndForeignCopyIn initialize and free
the CopyState of a bulk COPY. The ExecForeignCopyIn routine sends a
'COPY ... FROM STDIN' command to the foreign server, iteratively sends
tuples using the CopyTo() machinery, and then sends EOF on this connection.

Code that constructed the list of columns for a given foreign relation
in deparseAnalyzeSql() has been moved into deparseRelColumnList(),
which is reused by deparseCopyFromSql().

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure was extended
with a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field.
It is needed to avoid repeated initialization of CopyState. Also for this
reason, the CopyTo() routine was split into a set of routines: CopyToStart()/
CopyTo()/CopyToFinish().

Discussion: https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  33 ++-
 contrib/postgres_fdw/postgres_fdw.c           | 146 +++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  28 ++
 doc/src/sgml/fdwhandler.sgml                  |  74 ++++++
 src/backend/commands/copy.c                   | 247 +++++++++++-------
 src/backend/executor/execMain.c               |   1 +
 src/backend/executor/execPartition.c          |  34 ++-
 src/include/commands/copy.h                   |  11 +
 src/include/foreign/fdwapi.h                  |  15 ++
 src/include/nodes/execnodes.h                 |   8 +
 12 files changed, 547 insertions(+), 111 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..a37981ff66 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..baadb4ea80 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8183,6 +8184,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..a314821fb0 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -356,6 +358,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopyIn(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopyIn(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopyIn(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -533,6 +542,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopyIn = postgresBeginForeignCopyIn;
+	routine->EndForeignCopyIn = postgresEndForeignCopyIn;
+	routine->ExecForeignCopyIn = postgresExecForeignCopyIn;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -1847,6 +1859,9 @@ postgresExecForeignInsert(EState *estate,
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	TupleTableSlot *rslot;
 
+	Assert(!resultRelInfo->ri_usesBulkModify ||
+		   resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn == NULL);
+
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
 	 * postgresBeginForeignInsert())
@@ -2051,6 +2066,137 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+	{
+		PGresult *res = PQgetResult(conn);
+
+		pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+	}
+}
+
+/*
+ *
+ * postgresBeginForeignCopyIn
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopyIn(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopyIn
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopyIn(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	EndCopyTo(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopyIn
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopyIn(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool status = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		status = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/*
+		 * Finish COPY IN protocol; needed after a successful copy or an error.
+		 */
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));
+
+		/* After successfully sending an EOF signal, check command status. */
+		res = PQgetResult(conn);
+		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!status)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..73f98a3152 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2293,6 +2293,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 74793035d7..e8fd91a7bc 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -795,6 +795,80 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopyIn(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table.  This routine is
+     called right before the first call of <function>ExecForeignCopyIn</function>
+     for the foreign table.  It should perform any initialization needed
+     prior to the actual <command>COPY FROM</command> operation.
+     Subsequently, <function>ExecForeignCopyIn</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopyIn</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopyIn(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopyIn</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopyIn(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> is an array of slots holding the tuples to be inserted;
+     they will match the row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopyIn</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will fail
+     with an error message.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 44da71c4cb..2d184b2eee 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -128,11 +128,14 @@ typedef struct CopyStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
+	TupleDesc	tupDesc;		/* descriptor of tuples sent one by one via
+								 * CopyOneRowTo(), or NULL */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -355,17 +358,12 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
-						   RawStmt *raw_query, Oid queryRelId, List *attnamelist,
-						   List *options);
+						   TupleDesc srcTupDesc, RawStmt *raw_query,
+						   Oid queryRelId, List *attnamelist, List *options);
 static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
-							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
-static void EndCopyTo(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
-static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
 static int	CopyReadAttributesText(CopyState cstate);
@@ -589,7 +587,8 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1076,8 +1075,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	}
 	else
 	{
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1459,6 +1458,7 @@ static CopyState
 BeginCopy(ParseState *pstate,
 		  bool is_from,
 		  Relation rel,
+		  TupleDesc srcTupDesc,
 		  RawStmt *raw_query,
 		  Oid queryRelId,
 		  List *attnamelist,
@@ -1494,6 +1494,11 @@ BeginCopy(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query && !is_from);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -1820,20 +1825,25 @@ EndCopy(CopyState cstate)
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
-static CopyState
+CopyState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc tupDesc,
 			RawStmt *query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || tupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -1872,8 +1882,9 @@ BeginCopyTo(ParseState *pstate,
 							RelationGetRelationName(rel))));
 	}
 
-	cstate = BeginCopy(pstate, false, rel, query, queryRelId, attnamelist,
-					   options);
+	cstate = BeginCopy(pstate, false, rel, tupDesc, query, queryRelId,
+					   attnamelist, options);
+
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	if (pipe)
@@ -1882,6 +1893,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -1968,7 +1984,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -1991,7 +2009,7 @@ DoCopyTo(CopyState cstate)
 /*
  * Clean up storage and release resources for COPY TO.
  */
-static void
+void
 EndCopyTo(CopyState cstate)
 {
 	if (cstate->queryDesc != NULL)
@@ -2007,19 +2025,22 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
+/*
+ * Start COPY TO operation.  This is split out so that in manual mode, where
+ * the caller sends tuples to the destination one by one via CopyOneRowTo(),
+ * the start-up work (output functions, headers) is done only once.
  */
-static uint64
-CopyTo(CopyState cstate)
+void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -2106,6 +2127,32 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2137,24 +2184,13 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
@@ -2447,53 +2483,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopyIn(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2806,11 +2853,6 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = estate->es_result_relations;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
-
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -2869,14 +2911,13 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
 	{
 		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
+		 * Can't support buffered COPY into foreign tables without any
+		 * defined columns, or if there are any volatile default expressions
+		 * in the table.  Similarly to the trigger case above, such
+		 * expressions may query the table we're inserting into.
 		 *
 		 * Note: It does not matter if any partitions have any volatile
 		 * default expressions as we use the defaults from the target of the
@@ -2916,6 +2957,24 @@ CopyFrom(CopyState cstate)
 								estate, mycid, ti_options);
 	}
 
+	if (insertMethod != CIM_SINGLE)
+		resultRelInfo->ri_usesBulkModify = true;
+
+	/*
+	 * Init COPY into foreign table. Initialization of copying into foreign
+	 * partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesBulkModify &&
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn(mtstate,
+																resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																resultRelInfo);
+	}
+
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
 	 * tuple slot too. When inserting into a partitioned table, we also need
@@ -3039,7 +3098,7 @@ CopyFrom(CopyState cstate)
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
 					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					(resultRelInfo->ri_FdwRoutine == NULL || resultRelInfo->ri_usesBulkModify);
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -3298,10 +3357,17 @@ CopyFrom(CopyState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesBulkModify &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopyIn != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopyIn(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+		target_resultRelInfo->ri_usesBulkModify = false;
+	}
 
 	/* Tear down the multi-insert buffer data */
 	if (insertMethod != CIM_SINGLE)
@@ -3354,7 +3420,8 @@ BeginCopyFrom(ParseState *pstate,
 	MemoryContext oldcontext;
 	bool		volatile_defexprs;
 
-	cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+	cstate = BeginCopy(pstate, true, rel, NULL, NULL, InvalidOid, attnamelist,
+					   options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Initialize state variables */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..b8b09d528e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1345,6 +1345,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+	resultRelInfo->ri_usesBulkModify = false;
 }
 
 /*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fb6ce49056..1344434cf0 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -526,6 +526,11 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  rootrel,
 					  estate->es_instrument);
 
+	if (rootResultRelInfo->ri_usesBulkModify &&
+		leaf_part_rri->ri_FdwRoutine != NULL &&
+		leaf_part_rri->ri_FdwRoutine->BeginForeignCopyIn != NULL)
+		leaf_part_rri->ri_usesBulkModify = true;
+
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
@@ -937,9 +942,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesBulkModify)
+		{
+			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL);
+			partRelInfo->ri_FdwRoutine->BeginForeignCopyIn(mtstate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
@@ -1121,10 +1133,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesBulkModify)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopyIn != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopyIn(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..08309149ea 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -39,6 +40,16 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern uint64 CopyFrom(CopyState cstate);
 
+extern CopyState BeginCopyTo(ParseState *pstate, Relation rel,
+							 TupleDesc tupDesc, RawStmt *query,
+							 Oid queryRelId, const char *filename, bool is_program,
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
+extern void EndCopyTo(CopyState cstate);
+extern void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
+extern void CopyToStart(CopyState cstate);
+extern void CopyToFinish(CopyState cstate);
+
 extern DestReceiver *CreateCopyDestReceiver(void);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..11ea451fe4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopyIn_function) (ModifyTableState *mtstate,
+											 ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopyIn_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopyIn_function) (ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   int nslots);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* COPY a batch of tuples into a foreign relation */
+	BeginForeignCopyIn_function BeginForeignCopyIn;
+	EndForeignCopyIn_function EndForeignCopyIn;
+	ExecForeignCopyIn_function ExecForeignCopyIn;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6f96b31fb4..de326035da 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -491,6 +491,14 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * For use by copy.c: for a partitioned relation, true means that child
+	 * relations are allowed to use bulk modify operations; for a foreign
+	 * relation (or a foreign partition), true means that modify operations
+	 * must use the bulk FDW API.
+	 */
+	bool ri_usesBulkModify;
 } ResultRelInfo;
 
 /* ----------------
-- 
2.25.1

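For readers following the data path of this patch, here is a standalone C sketch of the same buffering idea, under simplifying assumptions: rows are collected in a small buffer, each flush stands for one remote COPY ... FROM STDIN, and every row is rendered as a text-format line (tab-separated, \N for NULL, backslash escapes, trailing newline as in the new COPY_CALLBACK branch of CopySendEndOfRow()) and handed to a callback with the same shape as copy_data_dest_cb. In postgres_fdw that callback would be pgfdw_copy_dest_cb(), which forwards the bytes with PQputCopyData(). Only the callback typedef comes from the patch; DemoRow, DemoCopyBuffer, DEMO_BATCH_SIZE, format_row(), demo_flush() and demo_add_row() are hypothetical, no libpq calls are made, and the escaping covers only a subset of what CopyAttributeOutText() handles.

```c
#include <stddef.h>
#include <stdio.h>

/* Same shape as the copy_data_dest_cb typedef the patch adds to copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical row: an int column and a nullable text column. */
typedef struct DemoRow
{
	int			id;
	const char *txt;			/* NULL stands for SQL NULL */
} DemoRow;

#define DEMO_BATCH_SIZE 3		/* hypothetical buffer capacity */

typedef struct DemoCopyBuffer
{
	DemoRow		rows[DEMO_BATCH_SIZE];
	int			nused;
	int			nflushes;		/* one remote COPY ... FROM STDIN per flush */
	copy_data_dest_cb dest;
} DemoCopyBuffer;

/*
 * Render one row as a COPY text-format line: tab-separated, \N for NULL,
 * backslash escapes for backslash, tab, newline and CR, trailing newline.
 * Returns the line length, or -1 if it does not fit in cap bytes.
 */
static int
format_row(const DemoRow *row, char *out, size_t cap)
{
	int			n = snprintf(out, cap, row->txt ? "%d\t" : "%d\t\\N", row->id);
	size_t		pos;

	if (n < 0 || (size_t) n >= cap)
		return -1;
	pos = (size_t) n;
	for (const char *s = row->txt; s != NULL && *s != '\0'; s++)
	{
		char		esc = 0;

		switch (*s)
		{
			case '\\': esc = '\\'; break;
			case '\t': esc = 't'; break;
			case '\n': esc = 'n'; break;
			case '\r': esc = 'r'; break;
		}
		if (pos + 2 >= cap)
			return -1;
		if (esc != 0)
			out[pos++] = '\\';
		out[pos++] = (esc != 0) ? esc : *s;
	}
	if (pos + 1 >= cap)
		return -1;
	out[pos++] = '\n';			/* like the patch's COPY_CALLBACK end of row */
	out[pos] = '\0';
	return (int) pos;
}

/* Send every buffered row through the callback; -1 if a row is too long. */
static int
demo_flush(DemoCopyBuffer *buf)
{
	char		line[256];

	if (buf->nused == 0)
		return 0;
	buf->nflushes++;			/* postgres_fdw would PQexec() the COPY here */
	for (int i = 0; i < buf->nused; i++)
	{
		int			len = format_row(&buf->rows[i], line, sizeof(line));

		if (len < 0)
			return -1;
		buf->dest(line, len);
	}
	buf->nused = 0;				/* ...and finish it with PQputCopyEnd() */
	return 0;
}

/* Buffer a row, flushing automatically once the buffer is full. */
static int
demo_add_row(DemoCopyBuffer *buf, DemoRow row)
{
	buf->rows[buf->nused++] = row;
	return (buf->nused == DEMO_BATCH_SIZE) ? demo_flush(buf) : 0;
}
```

A caller sets buf.dest, feeds rows through demo_add_row(), and calls demo_flush() once more at the end to push out the last partial batch, much as CopyFrom() flushes any remaining multi-insert buffers before finishing.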
#14Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Andrey V. Lepikhov (#13)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

On 2020-07-23 09:23, Andrey V. Lepikhov wrote:

On 7/16/20 2:14 PM, Amit Langote wrote:

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Version 5 of the patch. With changes caused by Amit's comments.

Just got a segfault with your v5 patch by deleting from a foreign table.
Here is a part of backtrace:

  * frame #0: 0x00000001029069ec postgres`ExecShutdownForeignScan(node=0x00007ff28c8909b0) at nodeForeignscan.c:385:3
    frame #1: 0x00000001028e7b06 postgres`ExecShutdownNode(node=0x00007ff28c8909b0) at execProcnode.c:779:4
    frame #2: 0x000000010299b3fa postgres`planstate_walk_members(planstates=0x00007ff28c8906d8, nplans=1, walker=(postgres`ExecShutdownNode at execProcnode.c:752), context=0x0000000000000000) at nodeFuncs.c:3998:7
    frame #3: 0x000000010299b010 postgres`planstate_tree_walker(planstate=0x00007ff28c8904c0, walker=(postgres`ExecShutdownNode at execProcnode.c:752), context=0x0000000000000000) at nodeFuncs.c:3914:8
    frame #4: 0x00000001028e7ab7 postgres`ExecShutdownNode(node=0x00007ff28c8904c0) at execProcnode.c:771:2

(lldb) f 0
frame #0: 0x00000001029069ec postgres`ExecShutdownForeignScan(node=0x00007ff28c8909b0) at nodeForeignscan.c:385:3
   382 FdwRoutine *fdwroutine = node->fdwroutine;
   383
   384 if (fdwroutine->ShutdownForeignScan)
-> 385 fdwroutine->ShutdownForeignScan(node);
   386 }
(lldb) p node->fdwroutine->ShutdownForeignScan
(ShutdownForeignScan_function) $1 = 0x7f7f7f7f7f7f7f7f

It seems that ShutdownForeignScan inside node->fdwroutine doesn't have a
correct pointer to the required function.
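One more data point from the lldb output: 0x7f7f7f7f7f7f7f7f is the byte pattern that PostgreSQL's CLOBBER_FREED_MEMORY (on by default in assert-enabled builds) writes over memory when it is freed, whether by pfree() or by resetting or deleting its memory context. So the crash suggests that node->fdwroutine points into memory that had already been released. The standalone C sketch below only imitates that wipe with memset() on a malloc'd struct; FakeFdwRoutine, noop_shutdown(), simulate_clobbering_free() and callback_is_clobbered() are hypothetical names, not PostgreSQL code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a routine table of FDW callbacks. */
typedef void (*shutdown_fn) (void *node);

typedef struct FakeFdwRoutine
{
	shutdown_fn ShutdownForeignScan;
} FakeFdwRoutine;

/* A harmless callback to install before the simulated free. */
static void
noop_shutdown(void *node)
{
	(void) node;
}

/*
 * Imitate what CLOBBER_FREED_MEMORY does when memory is freed: fill the
 * chunk with 0x7F bytes.  The memory is not really released here, so
 * inspecting it afterwards stays well-defined for this demonstration.
 */
static void
simulate_clobbering_free(void *ptr, size_t size)
{
	memset(ptr, 0x7F, size);
}

/* Return 1 if every byte of the stored callback pointer is 0x7F. */
static int
callback_is_clobbered(const FakeFdwRoutine *routine)
{
	unsigned char bytes[sizeof(shutdown_fn)];

	memcpy(bytes, &routine->ShutdownForeignScan, sizeof(bytes));
	for (size_t i = 0; i < sizeof(bytes); i++)
	{
		if (bytes[i] != 0x7F)
			return 0;
	}
	return 1;
}
```

In a debugger the same check amounts to printing the field in hex and looking for a run of 0x7f bytes, as in the `p` output above.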

I haven't had a chance to look closer at the code, but you can easily
reproduce this error with the attached script (patched Postgres binaries
should be available in the PATH). It works well with master and fails
with your patch applied.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachments:

setup2.sh (text/plain)
#15Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Alexey Kondratov (#14)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 27.07.2020 21:34, Alexey Kondratov wrote:

Hi Andrey,

On 2020-07-23 09:23, Andrey V. Lepikhov wrote:

On 7/16/20 2:14 PM, Amit Langote wrote:

Version 5 of the patch, with changes addressing Amit's comments.

Just got a segfault with your v5 patch when deleting from a foreign table.
It seems that the ShutdownForeignScan member of node->fdwroutine does
not hold a valid pointer to the required function.

I haven't had a chance to look more closely at the code, but you can
easily reproduce this error with the attached script (the patched
Postgres binaries should be in the PATH). It works fine on master and
fails with your patch applied.

I tested master at a3ab7a707d plus the v5 patch with your script and
found no errors. Can you check your test case?

--
regards,
Andrey Lepikhov
Postgres Professional

#16Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Andrey Lepikhov (#15)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 2020-07-28 03:33, Andrey Lepikhov wrote:

On 27.07.2020 21:34, Alexey Kondratov wrote:

Hi Andrey,

On 2020-07-23 09:23, Andrey V. Lepikhov wrote:

On 7/16/20 2:14 PM, Amit Langote wrote:

Version 5 of the patch, with changes addressing Amit's comments.

Just got a segfault with your v5 patch when deleting from a foreign
table. It seems that the ShutdownForeignScan member of node->fdwroutine
does not hold a valid pointer to the required function.

I haven't had a chance to look more closely at the code, but you can
easily reproduce this error with the attached script (the patched
Postgres binaries should be in the PATH). It works fine on master and
fails with your patch applied.

I tested master at a3ab7a707d plus the v5 patch with your script and
found no errors. Can you check your test case?

Yes, my bad. I forgot to reinstall the postgres_fdw extension and only
reinstalled the Postgres core. Sorry for the noise.
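The crash is consistent with an ABI mismatch: the core was rebuilt with the patched FDW callback table, while the loaded postgres_fdw module was still the old build. The C sketch below illustrates one plausible mechanism under stated assumptions. It assumes (this was not checked against v5) that the new COPY callbacks were inserted ahead of ShutdownForeignScan, and it uses hypothetical, trimmed-down structs OldRoutine and NewRoutine rather than the real fdwapi.h definitions. It also mimics assert-enabled builds, where CLOBBER_FREED_MEMORY makes PostgreSQL overwrite freed memory with 0x7F bytes, by pre-filling a recycled chunk with that pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*fdw_callback) (void *node);

/* Hypothetical, trimmed layout the stale FDW module was compiled against. */
typedef struct OldRoutine
{
	int			type;
	fdw_callback BeginForeignInsert;
	fdw_callback ShutdownForeignScan;
} OldRoutine;

/*
 * Hypothetical layout the patched core expects: three new members are
 * (by assumption) inserted before ShutdownForeignScan, shifting its offset.
 */
typedef struct NewRoutine
{
	int			type;
	fdw_callback BeginForeignInsert;
	fdw_callback BeginForeignCopy;
	fdw_callback ExecForeignCopy;
	fdw_callback EndForeignCopy;
	fdw_callback ShutdownForeignScan;
} NewRoutine;

static void
shutdown_cb(void *node)
{
	(void) node;
}

/*
 * The stale module fills an OldRoutine at the start of a recycled chunk
 * whose leftover bytes are 0x7F; the patched core then reads
 * ShutdownForeignScan at its NewRoutine offset and gets those raw bits.
 */
uintptr_t
read_shutdown_with_new_layout(void)
{
	unsigned char chunk[sizeof(NewRoutine)];
	OldRoutine	old;
	uintptr_t	seen;

	assert(sizeof(old) <= sizeof(chunk));

	memset(chunk, 0x7F, sizeof(chunk));	/* wiped, then reused memory */
	memset(&old, 0, sizeof(old));
	old.ShutdownForeignScan = shutdown_cb;	/* module sets the callback... */
	memcpy(chunk, &old, sizeof(old));	/* ...but only at the old offset */

	/* Core looks for the callback at the new, larger offset. */
	memcpy(&seen, chunk + offsetof(NewRoutine, ShutdownForeignScan),
		   sizeof(seen));
	return seen;
}
```

On a typical 64-bit build this returns 0x7f7f7f7f7f7f7f7f, the same value lldb printed for ShutdownForeignScan. Rebuilding and reinstalling the extension against the new headers makes both sides agree on the offsets again.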

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#17Amit Langote
amitlangote09@gmail.com
In reply to: Andrey V. Lepikhov (#12)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

Thanks for updating the patch. I will try to take a look later.

On Wed, Jul 22, 2020 at 6:09 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 7/16/20 2:14 PM, Amit Langote wrote:

* Why the "In" in these API names?

+   /* COPY a bulk of tuples into a foreign relation */
+   BeginForeignCopyIn_function BeginForeignCopyIn;
+   EndForeignCopyIn_function EndForeignCopyIn;
+   ExecForeignCopyIn_function ExecForeignCopyIn;

I named them by analogy with copy.c.

Hmm, if we were going to also need *ForeignCopyOut APIs, maybe it
makes sense to have "In" here, but maybe we don't, so how about
leaving out the "In" for clarity?

* I see that the remote copy is performed from scratch on every call
of postgresExecForeignCopyIn(), but wouldn't it be more efficient to
send the `COPY remote_table FROM STDIN` in
postgresBeginForeignCopyIn() and end it in postgresEndForeignCopyIn()
when there are no errors during the copy?

It is not possible. The FDW shares one connection among all foreign
relations on the same server. If two or more partitions are placed on
one foreign server, you will run into problems with concurrent COPY
commands.

Ah, you're right. I didn't consider multiple foreign partitions
pointing to the same server. Indeed, we would need separate
connections to a given server to COPY to multiple remote relations on
that server in parallel.
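To make the constraint concrete: once a libpq connection enters the COPY IN state, it accepts only COPY data until the client ends the COPY (PQputCopyEnd) and collects the result. A shared connection can therefore feed at most one `COPY ... FROM STDIN` at a time. The C sketch below is a toy model of that rule, not postgres_fdw code. RemoteConn, conn_begin_copy(), conn_end_copy() and flush_buffer() are hypothetical names. The model contrasts two approaches: keeping one COPY open from Begin to End, where a second partition on the same server is refused, and v5's approach of opening and closing a COPY around each buffer flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not postgres_fdw code) of one remote connection
 * shared by every foreign relation on the same server.
 */
typedef struct RemoteConn
{
	bool		in_copy;		/* a COPY ... FROM STDIN is in progress */
	const char *copy_target;	/* remote table that COPY writes into */
	int			copies_started;	/* how many COPY commands were issued */
} RemoteConn;

/* Issue "COPY target FROM STDIN"; refused while another COPY is active. */
bool
conn_begin_copy(RemoteConn *conn, const char *target)
{
	if (conn->in_copy)
		return false;
	conn->in_copy = true;
	conn->copy_target = target;
	conn->copies_started++;
	return true;
}

/* Finish the active COPY (PQputCopyEnd + PQgetResult in real libpq code). */
void
conn_end_copy(RemoteConn *conn)
{
	assert(conn->in_copy);
	conn->in_copy = false;
	conn->copy_target = NULL;
}

/*
 * Per-flush strategy: wrap each buffer flush in its own COPY, so the
 * connection is idle again before the next partition needs it.
 */
bool
flush_buffer(RemoteConn *conn, const char *target, int ntuples)
{
	if (!conn_begin_copy(conn, target))
		return false;
	/* ... ntuples rows of COPY data would be sent here ... */
	(void) ntuples;
	conn_end_copy(conn);
	return true;
}
```

The copies_started counter shows the trade-off under discussion. The per-flush approach pays for one COPY command per flush, while a single long-lived COPY per partition would need a connection of its own.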

Maybe we can create a new connection for each partition?

Yeah, perhaps, although it sounds like something that might be more
generally useful and so we should work on that separately if at all.

I tried implementing these two changes -- pgfdw_copy_data_dest_cb()
and sending `COPY remote_table FROM STDIN` only once instead of on
every flush -- and I see significant speedup. Please check the
attached patch that applies on top of yours.

I integrated the first change and rejected the second for the reason given above.

Thanks.

Will send more comments after reading the v5 patch.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#18Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Amit Langote (#17)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 7/29/20 1:03 PM, Amit Langote wrote:

Hi Andrey,

Thanks for updating the patch. I will try to take a look later.

On Wed, Jul 22, 2020 at 6:09 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 7/16/20 2:14 PM, Amit Langote wrote:

* Why the "In" in these API names?

+   /* COPY a bulk of tuples into a foreign relation */
+   BeginForeignCopyIn_function BeginForeignCopyIn;
+   EndForeignCopyIn_function EndForeignCopyIn;
+   ExecForeignCopyIn_function ExecForeignCopyIn;

I named them by analogy with copy.c.

Hmm, if we were going to also need *ForeignCopyOut APIs, maybe it
makes sense to have "In" here, but maybe we don't, so how about
leaving out the "In" for clarity?

Ok, sounds good.

* I see that the remote copy is performed from scratch on every call
of postgresExecForeignCopyIn(), but wouldn't it be more efficient to
send the `COPY remote_table FROM STDIN` in
postgresBeginForeignCopyIn() and end it in postgresEndForeignCopyIn()
when there are no errors during the copy?

It is not possible. The FDW shares one connection among all foreign
relations on the same server. If two or more partitions are placed on
one foreign server, you will run into problems with concurrent COPY
commands.

Ah, you're right. I didn't consider multiple foreign partitions
pointing to the same server. Indeed, we would need separate
connections to a given server to COPY to multiple remote relations on
that server in parallel.

Maybe we can create a new connection for each partition?

Yeah, perhaps, although it sounds like something that might be more
generally useful and so we should work on that separately if at all.

I will try to prepare a separate patch.

I tried implementing these two changes -- pgfdw_copy_data_dest_cb()
and sending `COPY remote_table FROM STDIN` only once instead of on
every flush -- and I see significant speedup. Please check the
attached patch that applies on top of yours.

I integrated the first change and rejected the second for the reason given above.

Thanks.

Will send more comments after reading the v5 patch.

OK. I'll wait until you finish your review.

--
regards,
Andrey Lepikhov
Postgres Professional

#19Amit Langote
amitlangote09@gmail.com
In reply to: Andrey V. Lepikhov (#18)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

On Wed, Jul 29, 2020 at 5:36 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Will send more comments after reading the v5 patch.

OK. I'll wait until you finish your review.

Sorry about the late reply.

If you'd like to send a new version for other reviewers, please feel
free. I haven't managed to take more than a brief look at the v5
patch, but will try to look at it (or maybe the new version if you
post) more closely this week.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#20Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#19)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Mon, Aug 3, 2020 at 8:38 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Jul 29, 2020 at 5:36 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Will send more comments after reading the v5 patch.

OK. I'll wait until you finish your review.

Sorry about the late reply.

If you'd like to send a new version for other reviewers, please feel
free. I haven't managed to take more than a brief look at the v5
patch, but will try to look at it (or maybe the new version if you
post) more closely this week.

I was playing around with v5 and I noticed an assertion failure which
I concluded is due to improper setting of ri_usesBulkModify. You can
reproduce it with these steps.

create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create table foo (a int, b int) partition by list (a);
create table foo1 (like foo);
create foreign table ffoo1 partition of foo for values in (1) server
lb options (table_name 'foo1');
create table foo2 (like foo);
create foreign table ffoo2 partition of foo for values in (2) server
lb options (table_name 'foo2');
create function print_new_row() returns trigger language plpgsql as $$
begin raise notice '%', new; return new; end; $$;
create trigger ffoo1_br_trig before insert on ffoo1 for each row
execute function print_new_row();
copy foo from stdin csv;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

1,2
2,3
\.

NOTICE: (1,2)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

#0 0x00007f2d5e266337 in raise () from /lib64/libc.so.6
#1 0x00007f2d5e267a28 in abort () from /lib64/libc.so.6
#2 0x0000000000aafd5d in ExceptionalCondition
(conditionName=0x7f2d37b468d0 "!resultRelInfo->ri_usesBulkModify ||
resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn == NULL",
errorType=0x7f2d37b46680 "FailedAssertion",
fileName=0x7f2d37b4677f "postgres_fdw.c", lineNumber=1863) at
assert.c:67
#3 0x00007f2d37b3b0fe in postgresExecForeignInsert (estate=0x2456320,
resultRelInfo=0x23a8f58, slot=0x23a9480, planSlot=0x0) at
postgres_fdw.c:1862
#4 0x000000000066362a in CopyFrom (cstate=0x23a8d40) at copy.c:3331

The problem is that partition ffoo1's BR trigger prevents it from
using multi-insert, but its ResultRelInfo.ri_usesBulkModify is true,
which is copied from its parent. We should really check the same
things for a partition that CopyFrom() checks for the main target
relation (root parent) when deciding whether to use multi-insert.

However, instead of duplicating the same logic in two places
(CopyFrom and ExecInitPartitionInfo), I think it might be a good idea
to refactor the code that decides whether multi-insert mode can be used
for a given relation, based on its properties, and put it in some place
that both the main target relation and the partitions go through.
InitResultRelInfo() seems to be one such place.
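The proposal amounts to computing the multi-insert decision once per ResultRelInfo from that relation's own properties. Partitions would be seeded with the root's result instead of simply copying it. The C sketch below is a simplified illustration of that idea. RelProps and can_use_multi_insert() are hypothetical names, and the checks paraphrase the ones the attached delta patch moves into InitResultRelInfo(). For the reproduction above, the root foo qualifies and ffoo1 is rejected because of its BEFORE ROW trigger. ffoo2 still qualifies, assuming its FDW provides the COPY callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical condensed view of the properties InitResultRelInfo() checks. */
typedef struct RelProps
{
	bool		has_row_before_or_instead_trig;	/* BEFORE/INSTEAD OF ROW INSERT */
	bool		partitioned_with_transition_table;	/* trig_insert_new_table */
	bool		is_foreign;		/* ri_FdwRoutine != NULL */
	bool		fdw_has_copy;	/* FDW provides ExecForeignCopyIn */
} RelProps;

/*
 * Decide multi-insert for one relation.  For the COPY target, "requested"
 * comes from CopyFrom(); for a partition, it is the root's own result.
 */
bool
can_use_multi_insert(bool requested, const RelProps *rel)
{
	assert(rel != NULL);

	if (!requested)
		return false;			/* caller didn't ask for it */
	if (rel->has_row_before_or_instead_trig)
		return false;			/* triggers may query the target table */
	if (rel->partitioned_with_transition_table)
		return false;			/* flush logic expects triggers on one rel */
	if (rel->is_foreign && !rel->fdw_has_copy)
		return false;			/* FDW lacks the bulk COPY interface */
	return true;
}
```

In v5, ffoo1 inherited the root's true instead, which is what tripped the assertion in postgresExecForeignInsert().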

Also, it might be a good idea to use ri_usesBulkModify more generally
than only for foreign relations as the patch currently does, because I
can see that it can replace the variable insertMethod in CopyFrom().
Having both insertMethod and ri_usesBulkModify in each ResultRelInfo
seems confusing and bug-prone.

Finally, I suggest renaming ri_usesBulkModify to ri_usesMultiInsert to
reflect its scope.

Please check the attached delta patch that applies on top of v5 to see
what that would look like.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments:

ri_usesMultiInsert.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a314821..19cf119 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1859,9 +1859,6 @@ postgresExecForeignInsert(EState *estate,
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	TupleTableSlot *rslot;
 
-	Assert(!resultRelInfo->ri_usesBulkModify ||
-		   resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn == NULL);
-
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
 	 * postgresBeginForeignInsert())
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fd1f1d6..43cd8c0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -86,16 +86,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
  * even though some fields are used in only some cases.
@@ -2762,12 +2752,11 @@ CopyFrom(CopyState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
+	bool		use_multi_insert;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 
@@ -2868,6 +2857,52 @@ CopyFrom(CopyState cstate)
 	}
 
 	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately
+	 * for every tuple. However, there are a number of reasons why we might
+	 * not be able to do this.  We check some conditions below while some
+	 * other target relation properties are left for InitResultRelInfo() to
+	 * check, because they must also be checked for partitions which are
+	 * initialized later.
+	 */
+	if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
+	{
+		/*
+		 * Can't support bufferization of copy into foreign tables without any
+		 * defined columns or if there are any volatile default expressions in the
+		 * table. Similarly to the trigger case above, such expressions may query
+		 * the table we're inserting into.
+		 *
+		 * Note: It does not matter if any partitions have any volatile
+		 * default expressions as we use the defaults from the target of the
+		 * COPY command.
+		 */
+		use_multi_insert = false;
+	}
+	else if (contain_volatile_functions(cstate->whereClause))
+	{
+		/*
+		 * Can't support multi-inserts if there are any volatile function
+		 * expressions in WHERE clause.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
+		 */
+		use_multi_insert = false;
+	}
+	else
+	{
+		/*
+		 * Looks okay to try multi-insert, but that may change once we
+		 * check a few more properties in InitResultRelInfo().
+		 *
+		 * For partitioned tables, whether or not to use multi-insert depends
+		 * on the individual partition's properties which are also checked in
+		 * InitResultRelInfo().
+		 */
+		use_multi_insert = true;
+	}
+
+	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
 	 * here that basically duplicated execUtils.c ...)
@@ -2877,6 +2912,7 @@ CopyFrom(CopyState cstate)
 					  cstate->rel,
 					  1,		/* must match rel's position in range_table */
 					  NULL,
+					  use_multi_insert,
 					  0);
 	target_resultRelInfo = resultRelInfo;
 
@@ -2928,85 +2964,9 @@ CopyFrom(CopyState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
-	{
-		/*
-		 * Can't support bufferization of copy into foreign tables without any
-		 * defined columns or if there are any volatile default expressions in the
-		 * table. Similarly to the trigger case above, such expressions may query
-		 * the table we're inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
-
-	if (insertMethod != CIM_SINGLE)
-		resultRelInfo->ri_usesBulkModify = true;
 
 	/*
 	 * Init COPY into foreign table. Initialization of copying into foreign
@@ -3014,7 +2974,7 @@ CopyFrom(CopyState cstate)
 	 */
 	if (target_resultRelInfo->ri_FdwRoutine != NULL)
 	{
-		if (target_resultRelInfo->ri_usesBulkModify &&
+		if (target_resultRelInfo->ri_usesMultiInsert &&
 			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL)
 			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn(mtstate,
 																resultRelInfo);
@@ -3029,7 +2989,7 @@ CopyFrom(CopyState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -3072,7 +3032,7 @@ CopyFrom(CopyState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -3080,7 +3040,6 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -3139,24 +3098,14 @@ CopyFrom(CopyState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					(resultRelInfo->ri_FdwRoutine == NULL || resultRelInfo->ri_usesBulkModify);
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -3208,7 +3157,7 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -3227,9 +3176,6 @@ CopyFrom(CopyState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -3300,7 +3246,7 @@ CopyFrom(CopyState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -3375,11 +3321,8 @@ CopyFrom(CopyState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -3407,19 +3350,17 @@ CopyFrom(CopyState cstate)
 	/* Allow the FDW to shut down */
 	if (target_resultRelInfo->ri_FdwRoutine != NULL)
 	{
-		if (target_resultRelInfo->ri_usesBulkModify &&
+		if (target_resultRelInfo->ri_usesMultiInsert &&
 			target_resultRelInfo->ri_FdwRoutine->EndForeignCopyIn != NULL)
 			target_resultRelInfo->ri_FdwRoutine->EndForeignCopyIn(estate,
 														target_resultRelInfo);
 		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
 			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
 														target_resultRelInfo);
-		target_resultRelInfo->ri_usesBulkModify = false;
 	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	ExecCloseIndices(target_resultRelInfo);
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ac53f79..f071b56 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1786,6 +1786,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 						  rel,
 						  0,	/* dummy rangetable index */
 						  NULL,
+						  false,
 						  0);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b8b09d5..e08c8b2 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -851,6 +851,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 							  resultRelation,
 							  resultRelationIndex,
 							  NULL,
+							  false,
 							  estate->es_instrument);
 			resultRelInfo++;
 		}
@@ -883,6 +884,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 								  resultRelDesc,
 								  resultRelIndex,
 								  NULL,
+								  false,
 								  estate->es_instrument);
 				resultRelInfo++;
 			}
@@ -1278,6 +1280,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 				  Relation resultRelationDesc,
 				  Index resultRelationIndex,
 				  Relation partition_root,
+				  bool use_multi_insert,
 				  int instrument_options)
 {
 	List	   *partition_check = NIL;
@@ -1345,7 +1348,55 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
-	resultRelInfo->ri_usesBulkModify = false;
+
+	/*
+	 * If the caller has asked to use "multi-insert" mode, check if the
+	 * relation allows it and if it does set ri_usesMultiInsert to true.
+	 */
+	if (!use_multi_insert)
+	{
+		/* Caller didn't ask for it. */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
+	{
+		/*
+		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+		 * triggers on the table. Such triggers might query the table we're
+		 * inserting into and act differently if the tuples that have already
+		 * been processed and prepared for insertion are not there.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+			 resultRelInfo->ri_TrigDesc != NULL &&
+			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
+	{
+		/*
+		 * For partitioned tables we can't support multi-inserts when there
+		 * are any statement level insert triggers. It might be possible to
+		 * allow partitioned tables with such triggers in the future, but for
+		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * and statement level insert triggers are on the same relation.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL &&
+			 resultRelInfo->ri_FdwRoutine->ExecForeignCopyIn == NULL)
+	{
+		/*
+		 * For a foreign table, we can't support multi-inserts unless its FDW
+		 * provides the necessary COPY interface.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else
+	{
+		/* OK, caller can use multi-insert on this relation. */
+		resultRelInfo->ri_usesMultiInsert = true;
+	}
 }
 
 /*
@@ -1435,6 +1486,7 @@ ExecGetTriggerResultRel(EState *estate, Oid relid)
 					  rel,
 					  0,		/* dummy rangetable index */
 					  NULL,
+					  false,
 					  estate->es_instrument);
 	estate->es_trig_target_relations =
 		lappend(estate->es_trig_target_relations, rInfo);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 6a4b947..7b72a09 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -524,13 +524,9 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  partrel,
 					  node ? node->rootRelation : 1,
 					  rootrel,
+					  rootResultRelInfo->ri_usesMultiInsert,
 					  estate->es_instrument);
 
-	if (rootResultRelInfo->ri_usesBulkModify &&
-		leaf_part_rri->ri_FdwRoutine != NULL &&
-		leaf_part_rri->ri_FdwRoutine->BeginForeignCopyIn != NULL)
-		leaf_part_rri->ri_usesBulkModify = true;
-
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
@@ -944,11 +940,9 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL)
 	{
-		if (partRelInfo->ri_usesBulkModify)
-		{
-			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL);
+		if (partRelInfo->ri_usesMultiInsert &&
+			partRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL)
 			partRelInfo->ri_FdwRoutine->BeginForeignCopyIn(mtstate, partRelInfo);
-		}
 		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
 			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
 	}
@@ -1135,7 +1129,7 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		/* Allow any FDWs to shut down */
 		if (resultRelInfo->ri_FdwRoutine != NULL)
 		{
-			if (resultRelInfo->ri_usesBulkModify)
+			if (resultRelInfo->ri_usesMultiInsert)
 			{
 				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopyIn != NULL);
 				resultRelInfo->ri_FdwRoutine->EndForeignCopyIn(mtstate->ps.state,
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 2fcf2e6..d33c9c9 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -211,7 +211,7 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	ExecInitRangeTable(estate, list_make1(rte));
 
 	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, false, 0);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117..72612bd 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -189,6 +189,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Relation resultRelationDesc,
 							  Index resultRelationIndex,
 							  Relation partition_root,
+							  bool use_multi_insert,
 							  int instrument_options);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecCleanUpTriggerState(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb3b2f0..1d79b6a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -489,16 +489,15 @@ typedef struct ResultRelInfo
 	/* Additional information specific to partition tuple routing */
 	struct PartitionRoutingInfo *ri_PartitionInfo;
 
-	/* For use by copy.c when performing multi-inserts */
-	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
-
 	/*
-	 * For use by copy.c:
-	 * for partitioned relation "true" means that child relations are allowed for
-	 * using bulk modify operations; for foreign relation (or foreign partition
-	 * of) "true" value means that modify operations must use bulk FDW API.
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
 	 */
-	bool ri_usesBulkModify;
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
+	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
 /* ----------------
#21Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Amit Langote (#20)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 8/7/20 2:14 PM, Amit Langote wrote:

I was playing around with v5 and I noticed an assertion failure which
I concluded is due to improper setting of ri_usesBulkModify. You can
reproduce it with these steps.

create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create table foo (a int, b int) partition by list (a);
create table foo1 (like foo);
create foreign table ffoo1 partition of foo for values in (1) server
lb options (table_name 'foo1');
create table foo2 (like foo);
create foreign table ffoo2 partition of foo for values in (2) server
lb options (table_name 'foo2');
create function print_new_row() returns trigger language plpgsql as $$
begin raise notice '%', new; return new; end; $$;
create trigger ffoo1_br_trig before insert on ffoo1 for each row
execute function print_new_row();
copy foo from stdin csv;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

1,2
2,3
\.

NOTICE: (1,2)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

#0 0x00007f2d5e266337 in raise () from /lib64/libc.so.6
#1 0x00007f2d5e267a28 in abort () from /lib64/libc.so.6
#2 0x0000000000aafd5d in ExceptionalCondition
(conditionName=0x7f2d37b468d0 "!resultRelInfo->ri_usesBulkModify ||
resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn == NULL",
errorType=0x7f2d37b46680 "FailedAssertion",
fileName=0x7f2d37b4677f "postgres_fdw.c", lineNumber=1863) at
assert.c:67
#3 0x00007f2d37b3b0fe in postgresExecForeignInsert (estate=0x2456320,
resultRelInfo=0x23a8f58, slot=0x23a9480, planSlot=0x0) at
postgres_fdw.c:1862
#4 0x000000000066362a in CopyFrom (cstate=0x23a8d40) at copy.c:3331

The problem is that partition ffoo1's BR trigger prevents it from
using multi-insert, but its ResultRelInfo.ri_usesBulkModify is true,
which is copied from its parent. We should really check the same
things for a partition that CopyFrom() checks for the main target
relation (root parent) when deciding whether to use multi-insert.

Thanks, I added a TAP test for this problem.

However, instead of duplicating the same logic to do so in two places
(CopyFrom and ExecInitPartitionInfo), I think it might be a good idea
to refactor the code to decide if multi-insert mode can be used for a
given relation by checking its properties and put it in some place
that both the main target relation and partitions need to invoke.
InitResultRelInfo() seems to be one such place.

+1

Also, it might be a good idea to use ri_usesBulkModify more generally
than only for foreign relations as the patch currently does, because I
can see that it can replace the variable insertMethod in CopyFrom().
Having both insertMethod and ri_usesBulkModify in each ResultRelInfo
seems confusing and bug-prone.

Finally, I suggest renaming ri_usesBulkModify to ri_usesMultiInsert to
reflect its scope.
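This is a rough sketch of what a per-relation flag buys in the per-tuple loop. It assumes `ri_usesMultiInsert` has already been computed for every relation when that relation was initialized. The relation a tuple is routed to answers the question itself, so no statement-level `insertMethod` and no separate `leafpart_use_multi_insert` variable is needed. `RelInfo` and `copy_one_tuple()` are hypothetical simplifications, not executor code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo, reduced to the flag in question. */
typedef struct RelInfo
{
    const char *name;
    bool        ri_usesMultiInsert;  /* computed once, at initialization */
    int         buffered;            /* tuples queued for a batched flush */
    int         single;              /* tuples inserted one at a time */
} RelInfo;

/*
 * Route one tuple and pick its insert path.  Routing is faked: assume a
 * list-partitioned table where partition i (1-based) holds value i.
 */
static void
copy_one_tuple(RelInfo *parts, size_t nparts, int key)
{
    RelInfo    *target;

    assert(key >= 1 && (size_t) key <= nparts);
    target = &parts[key - 1];

    if (target->ri_usesMultiInsert)
        target->buffered++;
    else
        target->single++;
}
```

In the reproducer, a row with value 1 would take the single-row path because `ffoo1` has a BR trigger. A row with value 2 would be buffered.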

Please check the attached delta patch that applies on top of v5 to see
what that would look like.

I merged your delta patch into the main patch (see v6 attached).
It now looks more committable than before.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v6-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=UTF-8)
From da6c4fe8262df58164e2c4ab80e085a019c9c6c1 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Thu, 9 Jul 2020 11:16:56 +0500
Subject: [PATCH] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopyIn
* EndForeignCopyIn
* ExecForeignCopyIn

BeginForeignCopyIn and EndForeignCopyIn initialize and free the
CopyState used by bulk COPY. The ExecForeignCopyIn routine sends a
'COPY ... FROM STDIN' command to the foreign server, sends the tuples
one by one through the CopyTo() machinery, and then sends EOF on the
connection.
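The per-batch sequence described above can be sketched as follows. This is not libpq code. `FakeConn`, `conn_send()` and `exec_foreign_copy_in()` are hypothetical stand-ins that only record the order of messages. The real routine uses `PQexec()`, then `PQputCopyData()` through the destination callback, then `PQputCopyEnd()`. The sketch also shows that each flushed batch opens its own remote `COPY`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fake connection that records what would go over the wire. */
typedef struct FakeConn
{
    char log[512];
} FakeConn;

/* Append one message plus a '|' separator; truncates rather than overflows. */
static void
conn_send(FakeConn *conn, const char *msg)
{
    strncat(conn->log, msg, sizeof(conn->log) - strlen(conn->log) - 1);
    strncat(conn->log, "|", sizeof(conn->log) - strlen(conn->log) - 1);
}

/*
 * One batch flush: start a remote COPY, stream each row (already in COPY
 * text format), then signal end-of-data.
 */
static void
exec_foreign_copy_in(FakeConn *conn, const char *copy_sql,
                     const char *const *rows, int nrows)
{
    assert(nrows >= 0);
    conn_send(conn, copy_sql);          /* PQexec(): enter COPY IN state */
    for (int i = 0; i < nrows; i++)
        conn_send(conn, rows[i]);       /* one PQputCopyData() per row */
    conn_send(conn, "EOF");             /* PQputCopyEnd() */
}
```

Because the remote `COPY` is restarted on every flush, the multi-insert buffer size bounds how much work is amortized per round trip.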

The code in deparseAnalyzeSql() that constructed the list of columns
for a given foreign relation is moved into deparseRelColumnList(),
which is also used by deparseCopyFromSql().
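Here is a minimal sketch of the shared column-list logic. It uses plain C strings in place of `StringInfo` and skips identifier quoting, which the real code does with `quote_identifier()`. `Column`, `append_column_list()` and `build_copy_from_sql()` are hypothetical names. Parentheses are emitted only when at least one column survives. That is why a relation whose columns were all dropped deparses without an empty `()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct Column
{
    const char *name;     /* remote name (column_name option or local name) */
    bool        dropped;  /* dropped columns are skipped */
} Column;

/*
 * Append the non-dropped column names, comma-separated, optionally in
 * parentheses.  Returns the number of columns emitted.
 */
static int
append_column_list(char *buf, size_t size, const Column *cols, int ncols,
                   bool enclose_in_parens)
{
    int         emitted = 0;

    for (int i = 0; i < ncols; i++)
    {
        if (cols[i].dropped)
            continue;
        if (emitted > 0)
            strncat(buf, ", ", size - strlen(buf) - 1);
        else if (enclose_in_parens)
            strncat(buf, "(", size - strlen(buf) - 1);
        strncat(buf, cols[i].name, size - strlen(buf) - 1);
        emitted++;
    }
    if (enclose_in_parens && emitted > 0)
        strncat(buf, ")", size - strlen(buf) - 1);
    return emitted;
}

/* Build "COPY <rel>[(cols)] FROM STDIN " in the shape deparseCopyFromSql() emits. */
static void
build_copy_from_sql(char *buf, size_t size, const char *relname,
                    const Column *cols, int ncols)
{
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "COPY %s", relname);
    (void) append_column_list(buf, size, cols, ncols, true);
    strncat(buf, " FROM STDIN ", size - strlen(buf) - 1);
}
```

Passing `enclose_in_parens = false` gives the bare `f1, f2` list used after `SELECT` in the ANALYZE query.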

Added tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure is extended with a
data_dest_cb callback, used to send the text representation of a tuple
to a custom destination. The PgFdwModifyState structure is extended
with a cstate field to avoid repeated initialization of the CopyState.
For the same reason, the CopyTo() routine is split into
CopyToStart()/CopyTo()/CopyToFinish().
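This is a much-reduced sketch of the callback destination and the Start/row/Finish split. `MiniCopyState`, the helper functions and `collect_cb()` are hypothetical. The callback typedef mirrors the patch's `copy_data_dest_cb` signature. Only text format is modelled, and escaping covers just a subset: backslash, tab, newline and NULL as `\N`. Headers and the binary trailer, which the real `CopyToStart()`/`CopyToFinish()` handle, are left out. Each finished row is newline-terminated, as in the `COPY_CALLBACK` branch of `CopySendEndOfRow()`, and handed to the callback. Here the callback appends to a buffer; postgres_fdw would call `PQputCopyData()` at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void (*copy_data_dest_cb) (void *buf, int len);

/* Hypothetical, much-reduced CopyState for text-format COPY TO. */
typedef struct MiniCopyState
{
    char              line[256];   /* row being assembled */
    size_t            len;
    copy_data_dest_cb dest;        /* receives each finished row */
    bool              started;     /* set between start and finish */
} MiniCopyState;

static void
send_char(MiniCopyState *cs, char c)
{
    assert(cs->len < sizeof(cs->line));
    cs->line[cs->len++] = c;
}

/* Escape one field for COPY text format (subset of the real rules). */
static void
send_field(MiniCopyState *cs, const char *val)
{
    if (val == NULL)
    {
        send_char(cs, '\\');
        send_char(cs, 'N');
        return;
    }
    for (; *val; val++)
    {
        switch (*val)
        {
            case '\\': send_char(cs, '\\'); send_char(cs, '\\'); break;
            case '\t': send_char(cs, '\\'); send_char(cs, 't'); break;
            case '\n': send_char(cs, '\\'); send_char(cs, 'n'); break;
            default:   send_char(cs, *val); break;
        }
    }
}

/* One-time setup, done once per foreign COPY rather than once per batch. */
static void
copy_to_start(MiniCopyState *cs, copy_data_dest_cb dest)
{
    cs->len = 0;
    cs->dest = dest;
    cs->started = true;
}

/* Emit one row: tab-separated fields, '\n' terminator, then hand it off. */
static void
copy_one_row_to(MiniCopyState *cs, const char *const *fields, int nfields)
{
    assert(cs->started);
    for (int i = 0; i < nfields; i++)
    {
        if (i > 0)
            send_char(cs, '\t');
        send_field(cs, fields[i]);
    }
    send_char(cs, '\n');
    cs->dest(cs->line, (int) cs->len);
    cs->len = 0;
}

static void
copy_to_finish(MiniCopyState *cs)
{
    cs->started = false;
}

/* Example destination: append to an in-memory buffer. */
static char   collected[512];
static size_t collected_len;

static void
collect_cb(void *buf, int len)
{
    assert(collected_len + (size_t) len < sizeof(collected));
    memcpy(collected + collected_len, buf, (size_t) len);
    collected_len += (size_t) len;
    collected[collected_len] = '\0';
}
```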

The CopyInsertMethod enum is removed; its logic is now implemented by
the ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion: https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++-
 .../postgres_fdw/expected/postgres_fdw.out    |  46 +-
 contrib/postgres_fdw/postgres_fdw.c           | 143 +++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++
 doc/src/sgml/fdwhandler.sgml                  |  74 ++++
 src/backend/commands/copy.c                   | 398 +++++++++---------
 src/backend/commands/tablecmds.c              |   1 +
 src/backend/executor/execMain.c               |  53 +++
 src/backend/executor/execPartition.c          |  28 +-
 src/backend/replication/logical/worker.c      |   2 +-
 src/include/commands/copy.h                   |  11 +
 src/include/executor/executor.h               |   1 +
 src/include/foreign/fdwapi.h                  |  15 +
 src/include/nodes/execnodes.h                 |   9 +-
 15 files changed, 669 insertions(+), 218 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..a37981ff66 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse a remote COPY ... FROM STDIN statement into buf, with an explicit
+ * column list (dropped columns excluded) matching the local column order.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..de2638109b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8075,6 +8076,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8183,6 +8197,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..19cf119d08 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -356,6 +358,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopyIn(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopyIn(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopyIn(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -533,6 +542,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopyIn = postgresBeginForeignCopyIn;
+	routine->EndForeignCopyIn = postgresEndForeignCopyIn;
+	routine->ExecForeignCopyIn = postgresExecForeignCopyIn;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2051,6 +2063,137 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+	{
+		PGresult *res = PQgetResult(conn);
+
+		pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+	}
+}
+
+/*
+ *
+ * postgresBeginForeignCopyIn
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopyIn(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopyIn
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopyIn(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopyIn
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopyIn(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool status = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		status = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/*
+		 * Finish the COPY IN protocol; needed after both success and error.
+		 */
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));
+
+		/* After successfully sending an EOF signal, check command status. */
+		res = PQgetResult(conn);
+		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!status)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..aa0b26de77 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2193,6 +2193,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2293,6 +2310,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 74793035d7..e8fd91a7bc 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -795,6 +795,80 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopyIn(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table.  This routine is
+     called right before the first call of the <function>ExecForeignCopyIn</function>
+     routine for the foreign table.  It should perform any initialization needed
+     prior to the actual <command>COPY FROM</command> operation.
+     Subsequently, <function>ExecForeignCopyIn</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopyIn</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+    <para>
+<programlisting>
+void
+EndForeignCopyIn(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopyIn</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopyIn(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.  This is called each time
+     <command>COPY FROM</command> flushes its multi-insert buffer for the table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; it will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopyIn</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will fail
+     with an error message.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a511..43cd8c011e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -85,16 +85,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -128,11 +118,14 @@ typedef struct CopyStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
+	TupleDesc	tupDesc;		/* tuple descriptor, when COPY TO is driven
+								 * manually via CopyOneRowTo() */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -359,17 +352,12 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
-						   RawStmt *raw_query, Oid queryRelId, List *attnamelist,
-						   List *options);
+						   TupleDesc srcTupDesc, RawStmt *raw_query,
+						   Oid queryRelId, List *attnamelist, List *options);
 static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
-							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
-static void EndCopyTo(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
-static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
 static int	CopyReadAttributesText(CopyState cstate);
@@ -595,7 +583,8 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1124,8 +1113,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	}
 	else
 	{
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1507,6 +1496,7 @@ static CopyState
 BeginCopy(ParseState *pstate,
 		  bool is_from,
 		  Relation rel,
+		  TupleDesc srcTupDesc,
 		  RawStmt *raw_query,
 		  Oid queryRelId,
 		  List *attnamelist,
@@ -1542,6 +1532,11 @@ BeginCopy(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query && !is_from);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -1868,20 +1863,25 @@ EndCopy(CopyState cstate)
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
-static CopyState
+CopyState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc tupDesc,
 			RawStmt *query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || tupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -1920,8 +1920,9 @@ BeginCopyTo(ParseState *pstate,
 							RelationGetRelationName(rel))));
 	}
 
-	cstate = BeginCopy(pstate, false, rel, query, queryRelId, attnamelist,
-					   options);
+	cstate = BeginCopy(pstate, false, rel, tupDesc, query, queryRelId,
+					   attnamelist, options);
+
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	if (pipe)
@@ -1930,6 +1931,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -2016,7 +2022,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2039,7 +2047,7 @@ DoCopyTo(CopyState cstate)
 /*
  * Clean up storage and release resources for COPY TO.
  */
-static void
+void
 EndCopyTo(CopyState cstate)
 {
 	if (cstate->queryDesc != NULL)
@@ -2055,19 +2063,22 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
+/*
+ * Start a COPY TO operation.  This is a separate routine so that, in manual
+ * mode (tuples sent to the destination one at a time via CopyOneRowTo()),
+ * the setup work is not repeated.
  */
-static uint64
-CopyTo(CopyState cstate)
+void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -2154,6 +2165,32 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2185,24 +2222,13 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
@@ -2495,53 +2521,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopyIn(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There are no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2715,12 +2752,11 @@ CopyFrom(CopyState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
+	bool		use_multi_insert;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 
@@ -2820,6 +2856,52 @@ CopyFrom(CopyState cstate)
 		ti_options |= TABLE_INSERT_FROZEN;
 	}
 
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately
+	 * for every tuple. However, there are a number of reasons why we might
+	 * not be able to do this.  We check some conditions below while some
+	 * other target relation properties are left for InitResultRelInfo() to
+	 * check, because they must also be checked for partitions which are
+	 * initialized later.
+	 */
+	if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
+	{
+		/*
+		 * Can't support bufferization of copy into foreign tables without any
+		 * defined columns or if there are any volatile default expressions in the
+		 * table. Similarly to the trigger case above, such expressions may query
+		 * the table we're inserting into.
+		 *
+		 * Note: It does not matter if any partitions have any volatile
+		 * default expressions as we use the defaults from the target of the
+		 * COPY command.
+		 */
+		use_multi_insert = false;
+	}
+	else if (contain_volatile_functions(cstate->whereClause))
+	{
+		/*
+		 * Can't support multi-inserts if there are any volatile function
+		 * expressions in WHERE clause.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
+		 */
+		use_multi_insert = false;
+	}
+	else
+	{
+		/*
+		 * Looks okay to try multi-insert, but that may change once we
+		 * check a few more properties in InitResultRelInfo().
+		 *
+		 * For partitioned tables, whether or not to use multi-insert depends
+		 * on the individual partition's properties which are also checked in
+		 * InitResultRelInfo().
+		 */
+		use_multi_insert = true;
+	}
+
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -2830,6 +2912,7 @@ CopyFrom(CopyState cstate)
 					  cstate->rel,
 					  1,		/* must match rel's position in range_table */
 					  NULL,
+					  use_multi_insert,
 					  0);
 	target_resultRelInfo = resultRelInfo;
 
@@ -2854,11 +2937,6 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = estate->es_result_relations;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
-
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -2886,82 +2964,23 @@ CopyFrom(CopyState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
+	if (resultRelInfo->ri_usesMultiInsert)
+		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
+								estate, mycid, ti_options);
+
 	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
+	 * Init COPY into foreign table. Initialization of copying into foreign
+	 * partitions will be done later.
 	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
 	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
-		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
-								estate, mycid, ti_options);
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn(mtstate,
+																resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																resultRelInfo);
 	}
 
 	/*
@@ -2970,7 +2989,7 @@ CopyFrom(CopyState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -3013,7 +3032,7 @@ CopyFrom(CopyState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -3021,7 +3040,6 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -3080,24 +3098,14 @@ CopyFrom(CopyState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -3149,7 +3157,7 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -3168,9 +3176,6 @@ CopyFrom(CopyState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -3241,7 +3246,7 @@ CopyFrom(CopyState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -3316,11 +3321,8 @@ CopyFrom(CopyState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -3346,14 +3348,19 @@ CopyFrom(CopyState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopyIn != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopyIn(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	ExecCloseIndices(target_resultRelInfo);
 
@@ -3402,7 +3409,8 @@ BeginCopyFrom(ParseState *pstate,
 	MemoryContext oldcontext;
 	bool		volatile_defexprs;
 
-	cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+	cstate = BeginCopy(pstate, true, rel, NULL, NULL, InvalidOid, attnamelist,
+																	options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Initialize state variables */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cd989c95e5..2629ceb432 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1786,6 +1786,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 						  rel,
 						  0,	/* dummy rangetable index */
 						  NULL,
+						  false,
 						  0);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..e08c8b29df 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -851,6 +851,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 							  resultRelation,
 							  resultRelationIndex,
 							  NULL,
+							  false,
 							  estate->es_instrument);
 			resultRelInfo++;
 		}
@@ -883,6 +884,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 								  resultRelDesc,
 								  resultRelIndex,
 								  NULL,
+								  false,
 								  estate->es_instrument);
 				resultRelInfo++;
 			}
@@ -1278,6 +1280,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 				  Relation resultRelationDesc,
 				  Index resultRelationIndex,
 				  Relation partition_root,
+				  bool use_multi_insert,
 				  int instrument_options)
 {
 	List	   *partition_check = NIL;
@@ -1345,6 +1348,55 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+
+	/*
+	 * If the caller has asked to use "multi-insert" mode, check if the
+	 * relation allows it and if it does set ri_usesMultiInsert to true.
+	 */
+	if (!use_multi_insert)
+	{
+		/* Caller didn't ask for it. */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
+	{
+		/*
+		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+		 * triggers on the table. Such triggers might query the table we're
+		 * inserting into and act differently if the tuples that have already
+		 * been processed and prepared for insertion are not there.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+			 resultRelInfo->ri_TrigDesc != NULL &&
+			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
+	{
+		/*
+		 * For partitioned tables we can't support multi-inserts when there
+		 * are any statement level insert triggers. It might be possible to
+		 * allow partitioned tables with such triggers in the future, but for
+		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * and statement level insert triggers are on the same relation.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL &&
+			 resultRelInfo->ri_FdwRoutine->ExecForeignCopyIn == NULL)
+	{
+		/*
+		 * For a foreign table, we can't support multi-inserts unless its FDW
+		 * provides the necessary COPY interface.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else
+	{
+		/* OK, caller can use multi-insert on this relation. */
+		resultRelInfo->ri_usesMultiInsert = true;
+	}
 }
 
 /*
@@ -1434,6 +1486,7 @@ ExecGetTriggerResultRel(EState *estate, Oid relid)
 					  rel,
 					  0,		/* dummy rangetable index */
 					  NULL,
+					  false,
 					  estate->es_instrument);
 	estate->es_trig_target_relations =
 		lappend(estate->es_trig_target_relations, rInfo);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 79fcbd6b06..7b72a09fb7 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -524,6 +524,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  partrel,
 					  node ? node->rootRelation : 1,
 					  rootrel,
+					  rootResultRelInfo->ri_usesMultiInsert,
 					  estate->es_instrument);
 
 	/*
@@ -937,9 +938,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert &&
+			partRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignCopyIn(mtstate, partRelInfo);
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
@@ -1121,10 +1127,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopyIn != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopyIn(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b576e342cb..9f9cf2dbdb 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -211,7 +211,7 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	ExecInitRangeTable(estate, list_make1(rte));
 
 	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, false, 0);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..08309149ea 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -39,6 +40,16 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern uint64 CopyFrom(CopyState cstate);
 
+extern CopyState BeginCopyTo(ParseState *pstate, Relation rel,
+							 TupleDesc tupDesc, RawStmt *query,
+							 Oid queryRelId, const char *filename, bool is_program,
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
+extern void EndCopyTo(CopyState cstate);
+extern void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
+extern void CopyToStart(CopyState cstate);
+extern void CopyToFinish(CopyState cstate);
+
 extern DestReceiver *CreateCopyDestReceiver(void);
 
 #endif							/* COPY_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117407..72612bd5a6 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -189,6 +189,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Relation resultRelationDesc,
 							  Index resultRelationIndex,
 							  Relation partition_root,
+							  bool use_multi_insert,
 							  int instrument_options);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecCleanUpTriggerState(EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..11ea451fe4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopyIn_function) (ModifyTableState *mtstate,
+											 ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopyIn_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopyIn_function) (ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   int nslots);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* COPY a bulk of tuples into a foreign relation */
+	BeginForeignCopyIn_function BeginForeignCopyIn;
+	EndForeignCopyIn_function EndForeignCopyIn;
+	ExecForeignCopyIn_function ExecForeignCopyIn;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b42dd6f94..89ae9afaa4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -489,7 +489,14 @@ typedef struct ResultRelInfo
 	/* Additional information specific to partition tuple routing */
 	struct PartitionRoutingInfo *ri_PartitionInfo;
 
-	/* For use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.17.1
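The callback selection this patch adds in CopyFrom() and ExecInitRoutingInfo() can be summarized in a small standalone sketch. The struct types below are hypothetical, heavily simplified stand-ins, not PostgreSQL's real FdwRoutine or ResultRelInfo; only the names BeginForeignCopyIn, BeginForeignInsert and the ri_usesMultiInsert flag come from the patch. It shows that the batched COPY callback is used only when the relation was cleared for multi-insert and the FDW provides it. Otherwise the row-at-a-time BeginForeignInsert path is kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL structs. */
typedef enum SketchStarted
{
	SKETCH_NOT_STARTED = 0,
	SKETCH_STARTED_INSERT,		/* BeginForeignInsert-style path ran */
	SKETCH_STARTED_COPY_IN		/* BeginForeignCopyIn-style path ran */
} SketchStarted;

typedef struct SketchRelInfo SketchRelInfo;
typedef void (*SketchBeginFn) (SketchRelInfo *rinfo);

typedef struct SketchFdwRoutine
{
	SketchBeginFn BeginForeignInsert;	/* may be NULL */
	SketchBeginFn BeginForeignCopyIn;	/* may be NULL */
} SketchFdwRoutine;

struct SketchRelInfo
{
	const SketchFdwRoutine *fdwroutine; /* NULL: not a foreign table */
	bool		usesMultiInsert;	/* mirrors ri_usesMultiInsert */
	SketchStarted started;		/* records which callback ran */
};

static void
sketch_begin_insert(SketchRelInfo *rinfo)
{
	rinfo->started = SKETCH_STARTED_INSERT;
}

static void
sketch_begin_copy_in(SketchRelInfo *rinfo)
{
	rinfo->started = SKETCH_STARTED_COPY_IN;
}

/* Same decision order as the patched ExecInitRoutingInfo() hunk. */
static void
sketch_begin_foreign_modify(SketchRelInfo *rinfo)
{
	const SketchFdwRoutine *fdw = rinfo->fdwroutine;

	if (fdw == NULL)
		return;					/* local table: no FDW to start */
	if (rinfo->usesMultiInsert && fdw->BeginForeignCopyIn != NULL)
		fdw->BeginForeignCopyIn(rinfo);		/* batched COPY path */
	else if (fdw->BeginForeignInsert != NULL)
		fdw->BeginForeignInsert(rinfo);		/* row-at-a-time path */
}
```

Because the patch only sets ri_usesMultiInsert for a foreign table whose FDW provides ExecForeignCopyIn, the BeginForeignInsert fallback in practice covers FDWs that lack the new COPY callbacks and relations that were not cleared for multi-insert.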

#22Amit Langote
amitlangote09@gmail.com
In reply to: Andrey Lepikhov (#21)
2 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

On Fri, Aug 21, 2020 at 9:19 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 8/7/20 2:14 PM, Amit Langote wrote:

I was playing around with v5 and I noticed an assertion failure which
I concluded is due to improper setting of ri_usesBulkModify. You can
reproduce it with these steps.

create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create table foo (a int, b int) partition by list (a);
create table foo1 (like foo);
create foreign table ffoo1 partition of foo for values in (1) server
lb options (table_name 'foo1');
create table foo2 (like foo);
create foreign table ffoo2 partition of foo for values in (2) server
lb options (table_name 'foo2');
create function print_new_row() returns trigger language plpgsql as $$
begin raise notice '%', new; return new; end; $$;
create trigger ffoo1_br_trig before insert on ffoo1 for each row
execute function print_new_row();
copy foo from stdin csv;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

1,2
2,3
\.

NOTICE: (1,2)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Thnx, I added TAP-test on this problem.

However instead of duplicating
the same logic to do so in two places

Good call.

(CopyFrom and ExecInitPartitionInfo), I think it might be a good idea
to refactor the code to decide if multi-insert mode can be used for a
given relation by checking its properties and put it in some place
that both the main target relation and partitions need to invoke.
InitResultRelInfo() seems to be one such place.

+1
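To illustrate the refactoring proposed above, here is a standalone C sketch of a single "may this relation use multi-insert?" check that both the COPY target and each partition could call. The types are hypothetical simplifications, not PostgreSQL's ResultRelInfo, TriggerDesc or FdwRoutine. The order of checks follows the InitResultRelInfo() hunk in the patch, and the `fdw_has_copy_in` field stands in for `ExecForeignCopyIn != NULL`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the trigger flags the check looks at. */
typedef struct SketchTriggerDesc
{
	bool		trig_insert_before_row;
	bool		trig_insert_instead_row;
	bool		trig_insert_new_table;	/* transition table on INSERT */
} SketchTriggerDesc;

/* Hypothetical stand-in for a COPY target relation. */
typedef struct SketchRel
{
	bool		is_partitioned_table;
	const SketchTriggerDesc *trigdesc;	/* NULL: no triggers */
	bool		is_foreign;
	bool		fdw_has_copy_in;	/* stands in for ExecForeignCopyIn != NULL */
} SketchRel;

/*
 * One place that answers "may this relation use multi-insert?", called for
 * the COPY target and again for each partition as it is initialized.
 * caller_wants_it carries the COPY-level checks (volatile defaults, WHERE).
 */
static bool
sketch_uses_multi_insert(const SketchRel *rel, bool caller_wants_it)
{
	if (!caller_wants_it)
		return false;
	if (rel->trigdesc != NULL &&
		(rel->trigdesc->trig_insert_before_row ||
		 rel->trigdesc->trig_insert_instead_row))
		return false;			/* BEFORE/INSTEAD OF ROW triggers */
	if (rel->is_partitioned_table && rel->trigdesc != NULL &&
		rel->trigdesc->trig_insert_new_table)
		return false;			/* transition table on a partitioned root */
	if (rel->is_foreign && !rel->fdw_has_copy_in)
		return false;			/* FDW lacks the batched COPY callback */
	return true;
}
```

In the reproduction above, ffoo1 has a BEFORE ROW trigger, so a check like this keeps it on the row-at-a-time path while ffoo2 can still be batched.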

Also, it might be a good idea to use ri_usesBulkModify more generally
than only for foreign relations as the patch currently does, because I
can see that it can replace the variable insertMethod in CopyFrom().
Having both insertMethod and ri_usesBulkModify in each ResultRelInfo
seems confusing and bug-prone.
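To see why a per-relation boolean can replace the shared CopyInsertMethod value, compare the partition-level decision before and after. This hypothetical sketch models only the partition-routing case. For a partitioned root, the old code only ever chose CIM_SINGLE or CIM_MULTI_CONDITIONAL, then recomputed `leafpart_use_multi_insert` for each partition. The patch instead passes the root's ri_usesMultiInsert into InitResultRelInfo() for each partition, which then applies that partition's own checks. Arguments are simplified booleans, not real executor state.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the CopyInsertMethod enum that the patch removes. */
typedef enum SketchCopyInsertMethod
{
	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
	CIM_MULTI,					/* always use table_multi_insert */
	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
} SketchCopyInsertMethod;

/* Old style: shared enum, re-evaluated for every routed-to partition. */
static bool
sketch_old_leaf_uses_multi(SketchCopyInsertMethod root_method,
						   bool leaf_has_before_or_instead_trig,
						   bool leaf_is_foreign)
{
	return root_method == CIM_MULTI_CONDITIONAL &&
		!leaf_has_before_or_instead_trig &&
		!leaf_is_foreign;
}

/* New style: partition inherits the root's flag, then applies its own checks. */
static bool
sketch_new_leaf_uses_multi(bool root_uses_multi,
						   bool leaf_has_before_or_instead_trig,
						   bool leaf_is_foreign,
						   bool fdw_has_copy_in)
{
	if (!root_uses_multi)
		return false;
	if (leaf_has_before_or_instead_trig)
		return false;
	if (leaf_is_foreign && !fdw_has_copy_in)
		return false;
	return true;
}
```

When no FDW provides the COPY callback, the two functions agree on every input. The only behavioural difference is the one the thread is after: a foreign partition whose FDW implements ExecForeignCopyIn can now be batched.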

Finally, I suggest renaming ri_usesBulkModify to ri_usesMultiInsert to
reflect its scope.

Please check the attached delta patch that applies on top of v5 to see
what that would look like.

I merged your delta patch (see v6 in attachment) into the main patch.
Currently it seems more committable than before.

Thanks for accepting the changes.

Actually, I was thinking that maybe splitting the part of the patch that
replaces the CopyInsertMethod enum with ri_usesMultiInsert from the rest
might be better, as I can see it as an independent refactoring. Attached
is what the division would look like.

I would

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v6-0001-Move-multi-insert-decision-logic-into-executor.patch (application/octet-stream)
From c14f3a62644dd26fcf6f3f95aab90fe9d875f6cc Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 24 Aug 2020 15:08:37 +0900
Subject: [PATCH v6 1/2] Move multi-insert decision logic into executor

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod shared across all potential target
relations (partitions) that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Move that decision logic into InitResultRelInfo, which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized.  That avoids recomputing the same
information in some cases, especially for partitions, and the new
arrangement is also slightly more readable.
---
 src/backend/commands/copy.c              | 186 +++++++++++--------------------
 src/backend/commands/tablecmds.c         |   1 +
 src/backend/executor/execMain.c          |  49 ++++++++
 src/backend/executor/execPartition.c     |   1 +
 src/backend/replication/logical/worker.c |   2 +-
 src/include/executor/executor.h          |   1 +
 src/include/nodes/execnodes.h            |   9 +-
 7 files changed, 128 insertions(+), 121 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a..4e63926 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -86,16 +86,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
  * even though some fields are used in only some cases.
@@ -2715,12 +2705,11 @@ CopyFrom(CopyState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
+	bool		use_multi_insert;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 
@@ -2821,6 +2810,52 @@ CopyFrom(CopyState cstate)
 	}
 
 	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately
+	 * for every tuple. However, there are a number of reasons why we might
+	 * not be able to do this.  We check some conditions below while some
+	 * other target relation properties are left for InitResultRelInfo() to
+	 * check, because they must also be checked for partitions which are
+	 * initialized later.
+	 */
+	if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
+	{
+		/*
+		 * Can't support bufferization of copy into foreign tables without any
+		 * defined columns or if there are any volatile default expressions in the
+		 * table. Similarly to the trigger case above, such expressions may query
+		 * the table we're inserting into.
+		 *
+		 * Note: It does not matter if any partitions have any volatile
+		 * default expressions as we use the defaults from the target of the
+		 * COPY command.
+		 */
+		use_multi_insert = false;
+	}
+	else if (contain_volatile_functions(cstate->whereClause))
+	{
+		/*
+		 * Can't support multi-inserts if there are any volatile function
+		 * expressions in WHERE clause.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
+		 */
+		use_multi_insert = false;
+	}
+	else
+	{
+		/*
+		 * Looks okay to try multi-insert, but that may change once we
+		 * check a few more properties in InitResultRelInfo().
+		 *
+		 * For partitioned tables, whether or not to use multi-insert depends
+		 * on the individual partition's properties which are also checked in
+		 * InitResultRelInfo().
+		 */
+		use_multi_insert = true;
+	}
+
+	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
 	 * here that basically duplicated execUtils.c ...)
@@ -2830,6 +2865,7 @@ CopyFrom(CopyState cstate)
 					  cstate->rel,
 					  1,		/* must match rel's position in range_table */
 					  NULL,
+					  use_multi_insert,
 					  0);
 	target_resultRelInfo = resultRelInfo;
 
@@ -2854,10 +2890,14 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = estate->es_result_relations;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Init COPY into foreign table. Initialization of copying into foreign
+	 * partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
+		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																resultRelInfo);
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -2886,83 +2926,9 @@ CopyFrom(CopyState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -2970,7 +2936,7 @@ CopyFrom(CopyState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -3013,7 +2979,7 @@ CopyFrom(CopyState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -3021,7 +2987,6 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -3080,24 +3045,14 @@ CopyFrom(CopyState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -3149,7 +3104,7 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -3168,9 +3123,6 @@ CopyFrom(CopyState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -3241,7 +3193,7 @@ CopyFrom(CopyState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -3316,11 +3268,8 @@ CopyFrom(CopyState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -3349,11 +3298,10 @@ CopyFrom(CopyState cstate)
 	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+														target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	ExecCloseIndices(target_resultRelInfo);
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d2b15a3..70a1600 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1788,6 +1788,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 						  rel,
 						  0,	/* dummy rangetable index */
 						  NULL,
+						  false,
 						  0);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad..73f78f2 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -851,6 +851,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 							  resultRelation,
 							  resultRelationIndex,
 							  NULL,
+							  false,
 							  estate->es_instrument);
 			resultRelInfo++;
 		}
@@ -883,6 +884,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 								  resultRelDesc,
 								  resultRelIndex,
 								  NULL,
+								  false,
 								  estate->es_instrument);
 				resultRelInfo++;
 			}
@@ -1278,6 +1280,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 				  Relation resultRelationDesc,
 				  Index resultRelationIndex,
 				  Relation partition_root,
+				  bool use_multi_insert,
 				  int instrument_options)
 {
 	List	   *partition_check = NIL;
@@ -1345,6 +1348,51 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+
+	/*
+	 * If the caller has asked to use "multi-insert" mode, check whether the
+	 * relation allows it and, if so, set ri_usesMultiInsert to true.
+	 */
+	if (!use_multi_insert)
+	{
+		/* Caller didn't ask for it. */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
+	{
+		/*
+		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+		 * triggers on the table. Such triggers might query the table we're
+		 * inserting into and act differently if the tuples that have already
+		 * been processed and prepared for insertion are not there.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+			 resultRelInfo->ri_TrigDesc != NULL &&
+			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
+	{
+		/*
+		 * For partitioned tables we can't support multi-inserts when there
+		 * are any statement level insert triggers. It might be possible to
+		 * allow partitioned tables with such triggers in the future, but for
+		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * and statement level insert triggers are on the same relation.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		/* Foreign tables don't support multi-inserts. */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else
+	{
+		/* OK, caller can use multi-insert on this relation. */
+		resultRelInfo->ri_usesMultiInsert = true;
+	}
 }
 
 /*
@@ -1434,6 +1482,7 @@ ExecGetTriggerResultRel(EState *estate, Oid relid)
 					  rel,
 					  0,		/* dummy rangetable index */
 					  NULL,
+					  false,
 					  estate->es_instrument);
 	estate->es_trig_target_relations =
 		lappend(estate->es_trig_target_relations, rInfo);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 79fcbd6..39048b8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -524,6 +524,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  partrel,
 					  node ? node->rootRelation : 1,
 					  rootrel,
+					  rootResultRelInfo->ri_usesMultiInsert,
 					  estate->es_instrument);
 
 	/*
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b576e34..9f9cf2d 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -211,7 +211,7 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	ExecInitRangeTable(estate, list_make1(rte));
 
 	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, false, 0);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117..72612bd 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -189,6 +189,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Relation resultRelationDesc,
 							  Index resultRelationIndex,
 							  Relation partition_root,
+							  bool use_multi_insert,
 							  int instrument_options);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecCleanUpTriggerState(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b42dd6..89ae9af 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -489,7 +489,14 @@ typedef struct ResultRelInfo
 	/* Additional information specific to partition tuple routing */
 	struct PartitionRoutingInfo *ri_PartitionInfo;
 
-	/* For use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
1.8.3.1

v6-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (application/octet-stream)
From cbea77ba85611cd0f5ceb8814f8d99fe271e9573 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Thu, 9 Jul 2020 11:16:56 +0500
Subject: [PATCH v6 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-inserts
are possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopyIn
* EndForeignCopyIn
* ExecForeignCopyIn

BeginForeignCopyIn and EndForeignCopyIn initialize and free
the CopyState of a bulk COPY. For each batch, ExecForeignCopyIn sends
a 'COPY ... FROM STDIN' command to the foreign server, sends the tuples
one by one through the CopyTo() machinery, and then sends EOF on the
connection.

The code that built the list of columns for a given foreign relation
in deparseAnalyzeSql() is factored out into deparseRelColumnList(),
which is also used by deparseCopyFromSql().

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with the data_source_cb callback used by CopyFrom(), the
CopyState structure is extended with a data_dest_cb callback, which is
used to send the text representation of a tuple to a custom destination.
The PgFdwModifyState structure is extended with a cstate field, needed
to avoid repeated initialization of the CopyState. For the same reason,
the CopyTo() routine is split into a set of routines: CopyToStart()/
CopyTo()/CopyToFinish().

The CopyInsertMethod enum is removed; its logic is now implemented by
the ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion: https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                 |  60 +++++--
 contrib/postgres_fdw/expected/postgres_fdw.out |  46 +++++-
 contrib/postgres_fdw/postgres_fdw.c            | 143 ++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  45 +++++
 doc/src/sgml/fdwhandler.sgml                   |  74 +++++++++
 src/backend/commands/copy.c                    | 220 ++++++++++++++++---------
 src/backend/executor/execMain.c                |   8 +-
 src/backend/executor/execPartition.c           |  27 ++-
 src/include/commands/copy.h                    |  11 ++
 src/include/foreign/fdwapi.h                   |  15 ++
 11 files changed, 547 insertions(+), 103 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74..a37981f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1759,6 +1761,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * Deparse a remote COPY ... FROM STDIN statement into buf.
+ * The column list uses remote column names and skips dropped columns.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * The statement text is appended to buf, and we also create an integer List
@@ -2062,6 +2078,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 {
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
+{
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			i;
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550..de26381 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8075,6 +8076,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8183,6 +8197,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53ca..19cf119 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -356,6 +358,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopyIn(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopyIn(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopyIn(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -533,6 +542,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopyIn = postgresBeginForeignCopyIn;
+	routine->EndForeignCopyIn = postgresEndForeignCopyIn;
+	routine->ExecForeignCopyIn = postgresExecForeignCopyIn;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2051,6 +2063,137 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+	{
+		PGresult *res = PQgetResult(conn);
+
+		pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+	}
+}
+
+/*
+ *
+ * postgresBeginForeignCopyIn
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopyIn(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopyIn
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopyIn(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopyIn
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopyIn(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	volatile bool status = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		status = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/*
+		 * Finish the COPY IN protocol after both success and failure.
+		 */
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));
+
+		/* After successfully sending EOF, check the command status. */
+		res = PQgetResult(conn);
+		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!status)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410d..8fc5ff0 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 8397166..aa0b26d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2193,6 +2193,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2293,6 +2310,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 7479303..e8fd91a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -795,6 +795,80 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopyIn(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table.  This routine is
+     called right before the first call of <function>ExecForeignCopyIn</function>
+     for the foreign table.  It should perform any initialization needed
+     prior to the actual <command>COPY FROM</command> operation.
+     Subsequently, <function>ExecForeignCopyIn</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopyIn</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+    <para>
+<programlisting>
+void
+EndForeignCopyIn(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopyIn</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopyIn(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in <literal>slots</literal>.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopyIn</function> pointer is set to
+     <literal>NULL</literal>, multi-insert is not used for the foreign table, and
+     tuples are inserted one at a time via <function>ExecForeignInsert</function> instead.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4e63926..aa0f482 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -118,11 +118,14 @@ typedef struct CopyStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
+	TupleDesc	tupDesc;		/* tuple descriptor for COPY TO when tuples
+								 * are copied to the destination manually */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -349,17 +352,12 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
-						   RawStmt *raw_query, Oid queryRelId, List *attnamelist,
-						   List *options);
+						   TupleDesc srcTupDesc, RawStmt *raw_query,
+						   Oid queryRelId, List *attnamelist, List *options);
 static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
-							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
-static void EndCopyTo(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
-static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
 static int	CopyReadAttributesText(CopyState cstate);
@@ -585,7 +583,8 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			if (!cstate->binary)
+				CopySendChar(cstate, '\n');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1114,8 +1113,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	}
 	else
 	{
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1497,6 +1496,7 @@ static CopyState
 BeginCopy(ParseState *pstate,
 		  bool is_from,
 		  Relation rel,
+		  TupleDesc srcTupDesc,
 		  RawStmt *raw_query,
 		  Oid queryRelId,
 		  List *attnamelist,
@@ -1532,6 +1532,11 @@ BeginCopy(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query && !is_from);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -1858,20 +1863,25 @@ EndCopy(CopyState cstate)
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
-static CopyState
+CopyState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc tupDesc,
 			RawStmt *query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || tupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -1910,8 +1920,9 @@ BeginCopyTo(ParseState *pstate,
 							RelationGetRelationName(rel))));
 	}
 
-	cstate = BeginCopy(pstate, false, rel, query, queryRelId, attnamelist,
-					   options);
+	cstate = BeginCopy(pstate, false, rel, tupDesc, query, queryRelId,
+					   attnamelist, options);
+
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	if (pipe)
@@ -1920,6 +1931,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -2006,7 +2022,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2029,7 +2047,7 @@ DoCopyTo(CopyState cstate)
 /*
  * Clean up storage and release resources for COPY TO.
  */
-static void
+void
 EndCopyTo(CopyState cstate)
 {
 	if (cstate->queryDesc != NULL)
@@ -2045,19 +2063,22 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
+/*
+ * Start COPY TO operation.
+ * This is split out into a separate routine so that, in manual mode (where
+ * tuples are sent to the destination one at a time by calling
+ * CopyOneRowTo()), the setup is done only once.
  */
-static uint64
-CopyTo(CopyState cstate)
+void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -2144,6 +2165,32 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2175,24 +2222,13 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
@@ -2485,53 +2521,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopyIn(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2894,10 +2941,16 @@ CopyFrom(CopyState cstate)
 	 * Init COPY into foreign table. Initialization of copying into foreign
 	 * partitions will be done later.
 	 */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn(mtstate,
 																resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -3295,10 +3348,16 @@ CopyFrom(CopyState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopyIn != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopyIn(estate,
 														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
@@ -3350,7 +3409,8 @@ BeginCopyFrom(ParseState *pstate,
 	MemoryContext oldcontext;
 	bool		volatile_defexprs;
 
-	cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+	cstate = BeginCopy(pstate, true, rel, NULL, NULL, InvalidOid, attnamelist,
+					   options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Initialize state variables */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 73f78f2..0f16f0a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1383,9 +1383,13 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 		 */
 		resultRelInfo->ri_usesMultiInsert = false;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL)
+	else if (resultRelInfo->ri_FdwRoutine != NULL &&
+			 resultRelInfo->ri_FdwRoutine->ExecForeignCopyIn == NULL)
 	{
-		/* Foreign tables don't support multi-inserts. */
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
 		resultRelInfo->ri_usesMultiInsert = false;
 	}
 	else
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 39048b8..7b72a09 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -938,9 +938,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert &&
+			partRelInfo->ri_FdwRoutine->BeginForeignCopyIn != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignCopyIn(mtstate, partRelInfo);
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
@@ -1122,10 +1127,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopyIn != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopyIn(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..0830914 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -39,6 +40,16 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern uint64 CopyFrom(CopyState cstate);
 
+extern CopyState BeginCopyTo(ParseState *pstate, Relation rel,
+							 TupleDesc tupDesc, RawStmt *query,
+							 Oid queryRelId, const char *filename, bool is_program,
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
+extern void EndCopyTo(CopyState cstate);
+extern void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
+extern void CopyToStart(CopyState cstate);
+extern void CopyToFinish(CopyState cstate);
+
 extern DestReceiver *CreateCopyDestReceiver(void);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556df..11ea451 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopyIn_function) (ModifyTableState *mtstate,
+											 ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopyIn_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopyIn_function) (ResultRelInfo *rinfo,
+											TupleTableSlot **slots,
+											int nslots);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* COPY a batch of tuples into a foreign relation */
+	BeginForeignCopyIn_function BeginForeignCopyIn;
+	EndForeignCopyIn_function EndForeignCopyIn;
+	ExecForeignCopyIn_function ExecForeignCopyIn;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
-- 
1.8.3.1

#23Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#22)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Mon, Aug 24, 2020 at 4:18 PM Amit Langote <amitlangote09@gmail.com> wrote:

I would

Oops, thought I'd continue writing, but hit send before actually doing
that. Please ignore.

I have some comments on v6, which I will share later this week.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#24Michael Paquier
michael@paquier.xyz
In reply to: Amit Langote (#23)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Mon, Aug 24, 2020 at 06:19:28PM +0900, Amit Langote wrote:

While on it, the CF bot is telling that the documentation of the patch
fails to compile. This needs to be fixed.
--
Michael

#25Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Michael Paquier (#24)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 9/7/20 12:26 PM, Michael Paquier wrote:

While on it, the CF bot is telling that the documentation of the patch
fails to compile. This needs to be fixed.

v7 (attached) fixes this problem.
I also accepted Amit's suggestion to rename all the FDW API routines,
e.g. *ForeignCopyIn to *ForeignCopy.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v7-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchtext/x-patch; charset=UTF-8; name=v7-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchDownload
From db4ba1bac6a8d642dffd1b907dcc1dd082203fab Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Thu, 9 Jul 2020 11:16:56 +0500
Subject: [PATCH] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of a bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, sends the tuples
one by one using the CopyTo() machinery, and then sends EOF on this
connection.

The code that constructed the list of columns of a foreign relation
in deparseAnalyzeSql() is moved into a separate deparseRelColumnList()
routine, which is also reused by deparseCopyFromSql().

Added regression tests for specific corner cases of the COPY FROM STDIN
operation.

By analogy with CopyFrom(), the CopyState structure is extended with
a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with a cstate field, which
is needed to avoid repeated initialization of the CopyState. Also for
this reason, the CopyTo() routine was split into a set of routines
CopyToStart()/CopyTo()/CopyToFinish().

Discussion: https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++-
 .../postgres_fdw/expected/postgres_fdw.out    |  46 +-
 contrib/postgres_fdw/postgres_fdw.c           | 143 +++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++
 doc/src/sgml/fdwhandler.sgml                  |  75 ++++
 src/backend/commands/copy.c                   | 398 +++++++++---------
 src/backend/commands/tablecmds.c              |   1 +
 src/backend/executor/execMain.c               |  53 +++
 src/backend/executor/execPartition.c          |  28 +-
 src/backend/replication/logical/worker.c      |   2 +-
 src/include/commands/copy.h                   |  11 +
 src/include/executor/executor.h               |   1 +
 src/include/foreign/fdwapi.h                  |  15 +
 src/include/nodes/execnodes.h                 |   9 +-
 15 files changed, 670 insertions(+), 218 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..a37981ff66 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse a remote COPY ... FROM STDIN statement into buf.
+ * The column list is always given explicitly, using the remote column names.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..de2638109b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8075,6 +8076,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8183,6 +8197,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..6f86b1fa36 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState	cstate;			/* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -356,6 +358,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -533,6 +542,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2051,6 +2063,137 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
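+/*
+ * State of the COPY currently in progress.  pgfdw_copy_dest_cb() receives no
+ * context argument, so it finds the connection and query through this variable.
+ */
+static PgFdwModifyState *copy_fmstate = NULL;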
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+	{
+		PGresult *res = PQgetResult(conn);
+
+		pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+	}
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a batch of tuples to the foreign relation, running a separate
+ *		remote COPY ... FROM STDIN command for each batch.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool status = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		status = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/*
+		 * Finish the COPY IN protocol, whether the copy succeeded or failed.
+		 */
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));
+
+		/* After sending the end-of-copy message, check the command status. */
+		res = PQgetResult(conn);
+		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!status)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
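With the COPY_CALLBACK destination added in this patch, each row built by CopyOneRowTo() is terminated with a newline in CopySendEndOfRow() and handed to data_dest_cb, which in postgres_fdw is pgfdw_copy_dest_cb() forwarding the bytes to PQputCopyData(). A minimal standalone sketch of that text-format path, assuming hypothetical names (RowBuf, copy_one_row(), demo_dest_cb, collect_cb()); it escapes only backslash, tab, newline and carriage return, whereas the real CopyAttributeOutText() also handles other control characters and a non-default delimiter:

```c
#include <assert.h>
#include <string.h>

/* Same shape as pgfdw_copy_dest_cb(void *buf, int len) in the patch. */
typedef void (*demo_dest_cb) (void *buf, int len);

/* Hypothetical per-row output buffer, standing in for cstate->fe_msgbuf. */
typedef struct RowBuf
{
	char		data[1024];
	int			len;
} RowBuf;

static void
buf_putc(RowBuf *b, char c)
{
	assert(b->len < (int) sizeof(b->data));
	b->data[b->len++] = c;
}

/* Append one attribute in COPY text format; a NULL value becomes \N. */
static void
send_text_attr(RowBuf *b, const char *value)
{
	if (value == NULL)
	{
		buf_putc(b, '\\');
		buf_putc(b, 'N');
		return;
	}
	for (const char *p = value; *p != '\0'; p++)
	{
		char		esc = 0;

		switch (*p)
		{
			case '\\': esc = '\\'; break;
			case '\t': esc = 't'; break;
			case '\n': esc = 'n'; break;
			case '\r': esc = 'r'; break;
		}
		if (esc)
		{
			buf_putc(b, '\\');
			buf_putc(b, esc);
		}
		else
			buf_putc(b, *p);
	}
}

/*
 * Emit one row: tab-separated attributes plus the '\n' terminator that
 * CopySendEndOfRow() adds, then hand the whole row to the callback.
 */
static void
copy_one_row(RowBuf *b, const char *const *values, int natts, demo_dest_cb cb)
{
	b->len = 0;
	for (int i = 0; i < natts; i++)
	{
		if (i > 0)
			buf_putc(b, '\t');
		send_text_attr(b, values[i]);
	}
	buf_putc(b, '\n');
	cb(b->data, b->len);
}

/* Demo destination standing in for PQputCopyData(): collect all bytes. */
static char sent[4096];
static int	sent_len = 0;

static void
collect_cb(void *buf, int len)
{
	assert(sent_len + len <= (int) sizeof(sent));
	memcpy(sent + sent_len, buf, (size_t) len);
	sent_len += len;
}
```

Because the callback receives one complete row per call, the receiving side never has to reassemble partial lines; postgres_fdw simply streams each chunk into the open remote COPY.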
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..aa0b26de77 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2193,6 +2193,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2293,6 +2310,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 74793035d7..7750cd4e05 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -795,6 +795,81 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table.  This routine is
+     called right before the first call of <function>ExecForeignCopy</function>
+     for the foreign table.  It should perform any initialization needed
+     prior to the actual <command>COPY FROM</command> operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called one or
+     more times, each time with a batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+               ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                TupleTableSlot **slots,
+                int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct
+     describing the target foreign table.
+     <literal>slots</literal> is an array of slots holding the tuples to be
+     inserted; each tuple matches the row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of entries in
+     <literal>slots</literal>.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, <command>COPY FROM</command> does not buffer tuples for
+     the foreign table and inserts them one at a time with <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
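The callback sequence described above (using the names BeginForeignCopy, ExecForeignCopy and EndForeignCopy that copy.c and postgres_fdw actually reference) can be modeled in a few lines: Begin once, Exec once per batch, End once, with NULL Begin/End pointers skipped. This is an illustration under simplifying assumptions, not PostgreSQL code: FdwCopyRoutine, Rel and run_copy() are hypothetical, rows are plain ints, the fixed batch size stands in for copy.c's MAX_BUFFERED_TUPLES/MAX_BUFFERED_BYTES limits, and the CopyFrom() fallback to BeginForeignInsert when BeginForeignCopy is absent is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Rel Rel;

/* Hypothetical subset of FdwRoutine with the three COPY callbacks. */
typedef struct FdwCopyRoutine
{
	void		(*BeginForeignCopy) (Rel *rel);
	void		(*ExecForeignCopy) (Rel *rel, const int *rows, int nrows);
	void		(*EndForeignCopy) (Rel *rel);
} FdwCopyRoutine;

/* Hypothetical stand-in for ResultRelInfo, with counters for the demo. */
struct Rel
{
	const FdwCopyRoutine *routine;
	int			nbegin;
	int			nexec;
	int			nrows;
	int			nend;
};

/* Begin once, Exec once per batch (the last may be partial), End once. */
static void
run_copy(Rel *rel, const int *rows, int nrows, int batch_size)
{
	const FdwCopyRoutine *r = rel->routine;

	/* Without ExecForeignCopy, COPY FROM would use per-row inserts instead. */
	assert(r->ExecForeignCopy != NULL && batch_size > 0);

	if (r->BeginForeignCopy != NULL)
		r->BeginForeignCopy(rel);
	for (int start = 0; start < nrows; start += batch_size)
	{
		int			n = nrows - start < batch_size ? nrows - start : batch_size;

		r->ExecForeignCopy(rel, rows + start, n);
	}
	if (r->EndForeignCopy != NULL)
		r->EndForeignCopy(rel);
}

/* Demo callbacks that only count how they were called. */
static void
demo_begin(Rel *rel)
{
	rel->nbegin++;
}

static void
demo_exec(Rel *rel, const int *rows, int nrows)
{
	(void) rows;
	rel->nexec++;
	rel->nrows += nrows;
}

static void
demo_end(Rel *rel)
{
	rel->nend++;
}
```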
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a511..d9a1644f43 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -85,16 +85,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -128,11 +118,14 @@ typedef struct CopyStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
+	TupleDesc	tupDesc;		/* tuple descriptor for COPY TO when tuples are
+								 * supplied one by one via CopyOneRowTo() */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -359,17 +352,12 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
-						   RawStmt *raw_query, Oid queryRelId, List *attnamelist,
-						   List *options);
+						   TupleDesc srcTupDesc, RawStmt *raw_query,
+						   Oid queryRelId, List *attnamelist, List *options);
 static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
-							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
-static void EndCopyTo(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
-static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
 static int	CopyReadAttributesText(CopyState cstate);
@@ -595,7 +583,8 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1124,8 +1113,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	}
 	else
 	{
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1507,6 +1496,7 @@ static CopyState
 BeginCopy(ParseState *pstate,
 		  bool is_from,
 		  Relation rel,
+		  TupleDesc srcTupDesc,
 		  RawStmt *raw_query,
 		  Oid queryRelId,
 		  List *attnamelist,
@@ -1542,6 +1532,11 @@ BeginCopy(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query && !is_from);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -1868,20 +1863,25 @@ EndCopy(CopyState cstate)
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
-static CopyState
+CopyState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc tupDesc,
 			RawStmt *query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	MemoryContext oldcontext;
 
+	/* Callers pass either a relation or a tuple descriptor, never both */
+	Assert(rel == NULL || tupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -1920,8 +1920,9 @@ BeginCopyTo(ParseState *pstate,
 							RelationGetRelationName(rel))));
 	}
 
-	cstate = BeginCopy(pstate, false, rel, query, queryRelId, attnamelist,
-					   options);
+	cstate = BeginCopy(pstate, false, rel, tupDesc, query, queryRelId,
+					   attnamelist, options);
+
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	if (pipe)
@@ -1930,6 +1931,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -2016,7 +2022,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2039,7 +2047,7 @@ DoCopyTo(CopyState cstate)
 /*
  * Clean up storage and release resources for COPY TO.
  */
-static void
+void
 EndCopyTo(CopyState cstate)
 {
 	if (cstate->queryDesc != NULL)
@@ -2055,19 +2063,22 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
+/*
+ * Start a COPY TO operation.  This is split out of CopyTo() so that callers
+ * that send tuples to the destination one at a time with CopyOneRowTo() can
+ * perform the setup exactly once.
  */
-static uint64
-CopyTo(CopyState cstate)
+void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -2154,6 +2165,32 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2185,24 +2222,13 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
@@ -2495,53 +2521,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2715,12 +2752,11 @@ CopyFrom(CopyState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
+	bool		use_multi_insert;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 
@@ -2820,6 +2856,52 @@ CopyFrom(CopyState cstate)
 		ti_options |= TABLE_INSERT_FROZEN;
 	}
 
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion and insert them in bulk (for example, with one
+	 * table_multi_insert() call) than to call table_tuple_insert() separately
+	 * for every tuple.  However, there are a number of reasons why we might
+	 * not be able to do this.  We check some conditions below, while other
+	 * target relation properties are left for InitResultRelInfo() to check,
+	 * because they must also be checked for partitions, which are
+	 * initialized later.
+	 */
+	if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
+	{
+		/*
+		 * Can't buffer tuples when no columns are being copied (a case the
+		 * foreign-table COPY path does not support), or when there are any
+		 * volatile default expressions in the table.  Like BEFORE ROW
+		 * triggers, such expressions may query the table we're inserting into.
+		 *
+		 * Note: It does not matter if any partitions have any volatile
+		 * default expressions as we use the defaults from the target of the
+		 * COPY command.
+		 */
+		use_multi_insert = false;
+	}
+	else if (contain_volatile_functions(cstate->whereClause))
+	{
+		/*
+		 * Can't support multi-inserts if there are any volatile function
+		 * expressions in the WHERE clause.  Like row triggers, such
+		 * expressions may query the table we're inserting into.
+		 */
+		use_multi_insert = false;
+	}
+	else
+	{
+		/*
+		 * Looks okay to try multi-insert, but that may change once we
+		 * check a few more properties in InitResultRelInfo().
+		 *
+		 * For partitioned tables, whether or not to use multi-insert depends
+		 * on the individual partition's properties, which are also checked in
+		 * InitResultRelInfo().
+		 */
+		use_multi_insert = true;
+	}
+
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -2830,6 +2912,7 @@ CopyFrom(CopyState cstate)
 					  cstate->rel,
 					  1,		/* must match rel's position in range_table */
 					  NULL,
+					  use_multi_insert,
 					  0);
 	target_resultRelInfo = resultRelInfo;
 
@@ -2854,11 +2937,6 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = estate->es_result_relations;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
-
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -2886,82 +2964,23 @@ CopyFrom(CopyState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
+	if (resultRelInfo->ri_usesMultiInsert)
+		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
+								estate, mycid, ti_options);
+
 	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
+	 * Initialize COPY into a foreign target table.  Foreign partitions are
+	 * initialized later, when tuple routing first sets them up.
 	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
 	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
-		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
-								estate, mycid, ti_options);
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																resultRelInfo);
 	}
 
 	/*
@@ -2970,7 +2989,7 @@ CopyFrom(CopyState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -3013,7 +3032,7 @@ CopyFrom(CopyState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -3021,7 +3040,6 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -3080,24 +3098,14 @@ CopyFrom(CopyState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -3149,7 +3157,7 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -3168,9 +3176,6 @@ CopyFrom(CopyState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -3241,7 +3246,7 @@ CopyFrom(CopyState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -3316,11 +3321,8 @@ CopyFrom(CopyState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -3346,14 +3348,19 @@ CopyFrom(CopyState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	ExecCloseIndices(target_resultRelInfo);
 
@@ -3402,7 +3409,8 @@ BeginCopyFrom(ParseState *pstate,
 	MemoryContext oldcontext;
 	bool		volatile_defexprs;
 
-	cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+	cstate = BeginCopy(pstate, true, rel, NULL, NULL, InvalidOid, attnamelist,
+					   options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Initialize state variables */
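For a partitioned target, CopyFrom() keeps a buffer for each partition that can use multi-insert, flushes all buffers once the total reaches a limit, and flushes pending rows before inserting into a partition that cannot batch, so earlier rows are already in place (for example, for that partition's triggers). With this patch, flushing a foreign partition's buffer is where ExecForeignCopy() gets called. The sketch below models only that ordering logic; Partition, copy_rows(), flush_all() and the tiny MAX_BUFFERED limit are hypothetical, and an event log stands in for the actual inserts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NPARTS 3
#define MAX_BUFFERED 4			/* hypothetical; copy.c uses MAX_BUFFERED_TUPLES */

/* Hypothetical partition state: can it batch, and how many rows wait. */
typedef struct Partition
{
	bool		uses_multi_insert;	/* stands in for ri_usesMultiInsert */
	int			nbuffered;
} Partition;

static int	total_buffered = 0;
static char log_buf[256];		/* "F<p>:<n>" = flush of n rows, "I<p>" = single insert */

static void
log_event(char kind, int part, int nrows)
{
	size_t		used = strlen(log_buf);

	if (kind == 'F')
		snprintf(log_buf + used, sizeof(log_buf) - used, "F%d:%d ", part, nrows);
	else
		snprintf(log_buf + used, sizeof(log_buf) - used, "I%d ", part);
}

/* Flush every non-empty buffer (a foreign partition would get ExecForeignCopy). */
static void
flush_all(Partition *parts)
{
	for (int p = 0; p < NPARTS; p++)
	{
		if (parts[p].nbuffered > 0)
		{
			log_event('F', p, parts[p].nbuffered);
			parts[p].nbuffered = 0;
		}
	}
	total_buffered = 0;
}

/* Route each row to its partition, batching where the partition allows it. */
static void
copy_rows(Partition *parts, const int *targets, int nrows)
{
	for (int i = 0; i < nrows; i++)
	{
		Partition  *part;

		assert(targets[i] >= 0 && targets[i] < NPARTS);
		part = &parts[targets[i]];
		if (part->uses_multi_insert)
		{
			part->nbuffered++;
			if (++total_buffered >= MAX_BUFFERED)
				flush_all(parts);
		}
		else
		{
			/* Earlier buffered rows must be inserted before this one. */
			if (total_buffered > 0)
				flush_all(parts);
			log_event('I', targets[i], 1);
		}
	}
	flush_all(parts);			/* flush any remaining buffered tuples */
}
```

For example, with partitions 0 and 1 batching and partition 2 not (say, a foreign partition with a BEFORE ROW trigger, as in the ffoo1 regression test), rows routed to 0, 1, 0, 2 produce two flushes followed by the single insert into partition 2.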
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cd989c95e5..2629ceb432 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1786,6 +1786,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 						  rel,
 						  0,	/* dummy rangetable index */
 						  NULL,
+						  false,
 						  0);
 		resultRelInfo++;
 	}
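The multi-insert eligibility rules are now split between CopyFrom(), which checks properties of the COPY command as a whole, and InitResultRelInfo(), which checks each result relation, including partitions initialized later. A sketch of that split as two C predicates, assuming a hypothetical CopyTargetProps struct that flattens the fields the patch inspects (cstate->volatile_defexprs, cstate->attnumlist, the WHERE clause, ri_TrigDesc and ri_FdwRoutine->ExecForeignCopy):

```c
#include <stdbool.h>

/* Hypothetical flattening of the properties the patch inspects. */
typedef struct CopyTargetProps
{
	bool		volatile_defexprs;	/* cstate->volatile_defexprs */
	int			ncopy_columns;	/* list_length(cstate->attnumlist) */
	bool		volatile_where;	/* volatile functions in the WHERE clause */
	bool		before_or_instead_row_trig;	/* BEFORE/INSTEAD OF ROW INSERT */
	bool		partitioned_new_table_trig;	/* partitioned + trig_insert_new_table */
	bool		is_foreign;		/* ri_FdwRoutine != NULL */
	bool		fdw_has_exec_copy;	/* ri_FdwRoutine->ExecForeignCopy != NULL */
} CopyTargetProps;

/* Command-level checks done once in CopyFrom(); the result is the request. */
static bool
copy_requests_multi_insert(const CopyTargetProps *p)
{
	return !p->volatile_defexprs && p->ncopy_columns > 0 && !p->volatile_where;
}

/* Relation-level checks done in InitResultRelInfo() for each result relation. */
static bool
relation_uses_multi_insert(bool requested, const CopyTargetProps *p)
{
	if (!requested)
		return false;
	if (p->before_or_instead_row_trig)
		return false;
	if (p->partitioned_new_table_trig)
		return false;
	if (p->is_foreign && !p->fdw_has_exec_copy)
		return false;
	return true;
}
```

Keeping the relation-level part in InitResultRelInfo() means a foreign partition whose FDW lacks ExecForeignCopy, or which has a BEFORE ROW trigger, quietly falls back to per-row inserts while its sibling partitions still batch.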
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..b21bc2c4f4 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -851,6 +851,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 							  resultRelation,
 							  resultRelationIndex,
 							  NULL,
+							  false,
 							  estate->es_instrument);
 			resultRelInfo++;
 		}
@@ -883,6 +884,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 								  resultRelDesc,
 								  resultRelIndex,
 								  NULL,
+								  false,
 								  estate->es_instrument);
 				resultRelInfo++;
 			}
@@ -1278,6 +1280,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 				  Relation resultRelationDesc,
 				  Index resultRelationIndex,
 				  Relation partition_root,
+				  bool use_multi_insert,
 				  int instrument_options)
 {
 	List	   *partition_check = NIL;
@@ -1345,6 +1348,55 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+
+	/*
+	 * If the caller has asked to use "multi-insert" mode, check if the
+	 * relation allows it and if it does set ri_usesMultiInsert to true.
+	 */
+	if (!use_multi_insert)
+	{
+		/* Caller didn't ask for it. */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
+	{
+		/*
+		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+		 * triggers on the table. Such triggers might query the table we're
+		 * inserting into and act differently if the tuples that have already
+		 * been processed and prepared for insertion are not there.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+			 resultRelInfo->ri_TrigDesc != NULL &&
+			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
+	{
+		/*
+		 * For partitioned tables we can't support multi-inserts when there
+		 * are any statement level insert triggers. It might be possible to
+		 * allow partitioned tables with such triggers in the future, but for
+		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * and statement level insert triggers are on the same relation.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL &&
+			 resultRelInfo->ri_FdwRoutine->ExecForeignCopy == NULL)
+	{
+		/*
+		 * For a foreign table, we can't support multi-inserts unless its FDW
+		 * provides the necessary COPY interface.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else
+	{
+		/* OK, caller can use multi-insert on this relation. */
+		resultRelInfo->ri_usesMultiInsert = true;
+	}
 }
 
 /*
@@ -1434,6 +1486,7 @@ ExecGetTriggerResultRel(EState *estate, Oid relid)
 					  rel,
 					  0,		/* dummy rangetable index */
 					  NULL,
+					  false,
 					  estate->es_instrument);
 	estate->es_trig_target_relations =
 		lappend(estate->es_trig_target_relations, rInfo);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 79fcbd6b06..f7f7c59fae 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -524,6 +524,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  partrel,
 					  node ? node->rootRelation : 1,
 					  rootrel,
+					  rootResultRelInfo->ri_usesMultiInsert,
 					  estate->es_instrument);
 
 	/*
@@ -937,9 +938,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert &&
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
@@ -1121,10 +1127,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b576e342cb..9f9cf2dbdb 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -211,7 +211,7 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	ExecInitRangeTable(estate, list_make1(rte));
 
 	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, false, 0);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..08309149ea 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -39,6 +40,16 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern uint64 CopyFrom(CopyState cstate);
 
+extern CopyState BeginCopyTo(ParseState *pstate, Relation rel,
+							 TupleDesc tupDesc, RawStmt *query,
+							 Oid queryRelId, const char *filename, bool is_program,
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
+extern void EndCopyTo(CopyState cstate);
+extern void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
+extern void CopyToStart(CopyState cstate);
+extern void CopyToFinish(CopyState cstate);
+
 extern DestReceiver *CreateCopyDestReceiver(void);
 
 #endif							/* COPY_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117407..72612bd5a6 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -189,6 +189,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Relation resultRelationDesc,
 							  Index resultRelationIndex,
 							  Relation partition_root,
+							  bool use_multi_insert,
 							  int instrument_options);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecCleanUpTriggerState(EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..a5553f1777 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopyIn_function) (ModifyTableState *mtstate,
+											 ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopyIn_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopyIn_function) (ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   int nslots);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* COPY a bulk of tuples into a foreign relation */
+	BeginForeignCopyIn_function BeginForeignCopy;
+	EndForeignCopyIn_function EndForeignCopy;
+	ExecForeignCopyIn_function ExecForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b42dd6f94..89ae9afaa4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -489,7 +489,14 @@ typedef struct ResultRelInfo
 	/* Additional information specific to partition tuple routing */
 	struct PartitionRoutingInfo *ri_PartitionInfo;
 
-	/* For use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.25.1

#26Amit Langote
amitlangote09@gmail.com
In reply to: Andrey V. Lepikhov (#25)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

On Mon, Sep 7, 2020 at 7:31 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 9/7/20 12:26 PM, Michael Paquier wrote:

While on it, the CF bot reports that the documentation of the patch
fails to compile. This needs to be fixed.
--
Michael

v7 (attached) fixes this problem.
I also accepted Amit's suggestion to rename all fdwapi routines from
*ForeignCopyIn to *ForeignCopy.

Any thoughts on taking the refactoring changes out of the main
patch, as I suggested?

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#27Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Amit Langote (#26)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi,

I started reviewing v7 yesterday.

On 2020-09-08 10:34, Amit Langote wrote:

On Mon, Sep 7, 2020 at 7:31 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

v7 (attached) fixes this problem.
I also accepted Amit's suggestion to rename all fdwapi routines from
*ForeignCopyIn to *ForeignCopy.

It seems that naming is quite inconsistent now:

+	/* COPY a bulk of tuples into a foreign relation */
+	BeginForeignCopyIn_function BeginForeignCopy;
+	EndForeignCopyIn_function EndForeignCopy;
+	ExecForeignCopyIn_function ExecForeignCopy;

You got rid of the 'In' in the function names, but the type names still
have it:

+typedef void (*BeginForeignCopyIn_function) (ModifyTableState *mtstate,
+		ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopyIn_function) (EState *estate,
+		ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopyIn_function) (ResultRelInfo *rinfo,
+		TupleTableSlot **slots,
+		int nslots);

Also, the docs still refer to the old function names:

+void
+BeginForeignCopyIn(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);

I think that it'd be better to choose either of these two naming schemes
and use it everywhere for consistency.
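As a rough illustration of the consistency point, here is a sketch of how the declarations could line up. It assumes the existing fdwapi.h convention in which each type name is the member name plus `_function`, as in `BeginForeignInsert_function` in the hunk above. The opaque struct typedefs, the trimmed `FdwCopyRoutine` and the toy FDW are hypothetical stand-ins, not the real headers. The toy FDW also shows the batch shape of the callback: `ExecForeignCopy` receives a whole buffer of `nslots` slots per call, not one tuple.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for the real executor types; only pointers are used. */
typedef struct ModifyTableState ModifyTableState;
typedef struct EState EState;
typedef struct ResultRelInfo ResultRelInfo;
typedef struct TupleTableSlot TupleTableSlot;

/* Type names follow the "<Member>_function" convention of fdwapi.h. */
typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
                                           ResultRelInfo *rinfo);
typedef void (*EndForeignCopy_function) (EState *estate,
                                         ResultRelInfo *rinfo);
typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
                                          TupleTableSlot **slots,
                                          int nslots);

/* Hypothetical, trimmed-down stand-in for FdwRoutine: COPY members only. */
typedef struct FdwCopyRoutine
{
    BeginForeignCopy_function BeginForeignCopy;
    EndForeignCopy_function EndForeignCopy;
    ExecForeignCopy_function ExecForeignCopy;
} FdwCopyRoutine;

/* Hypothetical toy FDW that only counts what it is handed. */
static int toy_rows = 0;
static int toy_batches = 0;
static int toy_open = 0;

static void
toyBeginForeignCopy(ModifyTableState *mtstate, ResultRelInfo *rinfo)
{
    (void) mtstate;
    (void) rinfo;
    toy_rows = 0;
    toy_batches = 0;
    toy_open = 1;               /* e.g. where a real FDW would start COPY */
}

static void
toyExecForeignCopy(ResultRelInfo *rinfo, TupleTableSlot **slots, int nslots)
{
    (void) rinfo;
    (void) slots;
    toy_rows += nslots;         /* one call per flushed buffer, not per row */
    toy_batches++;
}

static void
toyEndForeignCopy(EState *estate, ResultRelInfo *rinfo)
{
    (void) estate;
    (void) rinfo;
    toy_open = 0;
}

static const FdwCopyRoutine toy_routine = {
    .BeginForeignCopy = toyBeginForeignCopy,
    .EndForeignCopy = toyEndForeignCopy,
    .ExecForeignCopy = toyExecForeignCopy,
};
```

With this scheme a reader can find the matching type for any member (or vice versa) by adding or removing `_function`. The alternative would be to keep the `In` suffix in both places.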

Any thoughts on taking the refactoring changes out of the main
patch, as I suggested?

+1 for splitting the patch. It was rather difficult for me to
distinguish changes required by COPY via postgres_fdw from this
refactoring.

Another ambiguous part of the refactoring was in changing
InitResultRelInfo() arguments:

@@ -1278,6 +1280,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
Relation resultRelationDesc,
Index resultRelationIndex,
Relation partition_root,
+ bool use_multi_insert,
int instrument_options)

Why do we need to pass this use_multi_insert flag here? Wouldn't it be
better to set resultRelInfo->ri_usesMultiInsert in
InitResultRelInfo() unconditionally, as is done for
ri_usesFdwDirectModify? After that it would be up to the caller
whether to use multi-insert or not, based on its own circumstances.
Otherwise we now have a flag to indicate that we want to check
another flag, while the check itself doesn't look costly.
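The alternative suggested here can be sketched as follows: `InitResultRelInfo()` would always record what the relation itself allows, and the one interested caller would narrow it. This is a simplified model, not PostgreSQL code. `RelInfo` and `TrigFlags` are hypothetical stand-ins for `ResultRelInfo` and `TriggerDesc`. The three property checks are the ones from the v7 `InitResultRelInfo()` hunk above. `copy_from_adjust()` models only two of `CopyFrom()`'s own conditions (volatile defaults, volatile WHERE clause).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the TriggerDesc fields the patch inspects. */
typedef struct TrigFlags
{
    bool insert_before_row;     /* trig_insert_before_row */
    bool insert_instead_row;    /* trig_insert_instead_row */
    bool insert_new_table;      /* trig_insert_new_table */
} TrigFlags;

/* Hypothetical, heavily simplified ResultRelInfo. */
typedef struct RelInfo
{
    bool is_partitioned;        /* relkind == RELKIND_PARTITIONED_TABLE */
    const TrigFlags *trig;      /* NULL when there are no triggers */
    bool is_foreign;            /* ri_FdwRoutine != NULL */
    bool fdw_has_exec_copy;     /* ri_FdwRoutine->ExecForeignCopy != NULL */
    bool uses_multi_insert;     /* plays the role of ri_usesMultiInsert */
} RelInfo;

/* Relation-property checks, mirroring the v7 InitResultRelInfo() hunk. */
static bool
relation_allows_multi_insert(const RelInfo *rel)
{
    const TrigFlags *t = rel->trig;

    if (t != NULL && (t->insert_before_row || t->insert_instead_row))
        return false;           /* BEFORE/INSTEAD OF row insert triggers */
    if (rel->is_partitioned && t != NULL && t->insert_new_table)
        return false;           /* insert transition table on partitioned rel */
    if (rel->is_foreign && !rel->fdw_has_exec_copy)
        return false;           /* FDW lacks the bulk COPY callback */
    return true;
}

/* Suggested shape: no use_multi_insert argument; always record capability. */
static void
init_result_rel(RelInfo *rel)
{
    rel->uses_multi_insert = relation_allows_multi_insert(rel);
}

/* The one caller that cares (CopyFrom) narrows it for COPY-specific reasons. */
static void
copy_from_adjust(RelInfo *target, bool volatile_defexprs, bool volatile_where)
{
    if (volatile_defexprs || volatile_where)
        target->uses_multi_insert = false;
}
```

Under this shape, callers that never use multi-insert (TRUNCATE, InitPlan, logical replication) would not need to pass anything. The open question, raised in the reply, is how partitions initialized later learn about the caller's narrowing.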

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#28Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Amit Langote (#26)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 9/8/20 12:34 PM, Amit Langote wrote:

Hi Andrey,

On Mon, Sep 7, 2020 at 7:31 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 9/7/20 12:26 PM, Michael Paquier wrote:

While on it, the CF bot reports that the documentation of the patch
fails to compile. This needs to be fixed.
--
Michael

v7 (attached) fixes this problem.
I also accepted Amit's suggestion to rename all fdwapi routines from
*ForeignCopyIn to *ForeignCopy.

Any thoughts on taking the refactoring changes out of the main
patch, as I suggested?

Sorry, I thought you had asked me to ignore your previous letter. I'll look into
this patch set shortly.

--
regards,
Andrey Lepikhov
Postgres Professional

#29Amit Langote
amitlangote09@gmail.com
In reply to: Alexey Kondratov (#27)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Alexey,

On Tue, Sep 8, 2020 at 6:29 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:

On 2020-09-08 10:34, Amit Langote wrote:

Any thoughts on taking the refactoring changes out of the main
patch, as I suggested?

+1 for splitting the patch. It was rather difficult for me to
distinguish changes required by COPY via postgres_fdw from this
refactoring.

Another ambiguous part of the refactoring was in changing
InitResultRelInfo() arguments:

@@ -1278,6 +1280,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
Relation resultRelationDesc,
Index resultRelationIndex,
Relation partition_root,
+                                 bool use_multi_insert,
int instrument_options)

Why do we need to pass this use_multi_insert flag here? Wouldn't it be
better to set resultRelInfo->ri_usesMultiInsert in
InitResultRelInfo() unconditionally, as is done for
ri_usesFdwDirectModify? After that it would be up to the caller
whether to use multi-insert or not, based on its own circumstances.
Otherwise we now have a flag to indicate that we want to check
another flag, while the check itself doesn't look costly.

Hmm, I think having two flags seems confusing and bug-prone,
especially if you consider partitions. For example, if a partition's
ri_usesMultiInsert is true, but CopyFrom()'s local flag is false, then
execPartition.c: ExecInitPartitionInfo() would wrongly perform
BeginForeignCopy() based only on ri_usesMultiInsert, because it
wouldn't know CopyFrom()'s local flag. Am I missing something?
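The failure mode described above can be modeled in a few lines. In the v7 patch, the routing initialization (`ExecInitRoutingInfo()`, reached from `ExecInitPartitionInfo()`) chooses between `BeginForeignCopy()` and `BeginForeignInsert()` by looking only at the partition's `ri_usesMultiInsert`. This is a hypothetical sketch with plain booleans standing in for the real structures, and it assumes the FDW provides both callbacks. It contrasts a partition flag computed from the partition's own properties alone with the v7 approach, in which the root's `ri_usesMultiInsert` is passed in as the request.

```c
#include <assert.h>
#include <stdbool.h>

/* Which FDW initialization callback routing would invoke (hypothetical). */
typedef enum FdwInitChoice
{
    FDW_BEGIN_INSERT,           /* BeginForeignInsert(): row-by-row path */
    FDW_BEGIN_COPY              /* BeginForeignCopy(): buffered COPY path */
} FdwInitChoice;

/*
 * Mirrors the v7 dispatch: only the partition's own flag is consulted
 * (assumes the FDW provides both callbacks).
 */
static FdwInitChoice
routing_init_choice(bool part_uses_multi_insert)
{
    return part_uses_multi_insert ? FDW_BEGIN_COPY : FDW_BEGIN_INSERT;
}

/*
 * Two-flag design: the partition flag reflects only the partition's own
 * properties, while CopyFrom() keeps its decision in a separate local.
 */
static bool
part_flag_two_flags(bool part_capable)
{
    return part_capable;
}

/*
 * v7 design: the root's ri_usesMultiInsert is passed as the request, so the
 * partition flag is "requested AND capable".
 */
static bool
part_flag_requested(bool root_uses_multi_insert, bool part_capable)
{
    return root_uses_multi_insert && part_capable;
}
```

With `CopyFrom()`'s decision false (say, because of a volatile default) and a partition that passes every property check, the two-flag model routes the partition to `FDW_BEGIN_COPY` even though rows will be inserted one at a time. The request-based model picks `FDW_BEGIN_INSERT`.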

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#30Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Amit Langote (#29)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 2020-09-08 17:00, Amit Langote wrote:

Hi Alexey,

On Tue, Sep 8, 2020 at 6:29 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:

On 2020-09-08 10:34, Amit Langote wrote:

Any thoughts on taking the refactoring changes out of the main
patch, as I suggested?

+1 for splitting the patch. It was rather difficult for me to
distinguish changes required by COPY via postgres_fdw from this
refactoring.

Another ambiguous part of the refactoring was in changing
InitResultRelInfo() arguments:

@@ -1278,6 +1280,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
Relation resultRelationDesc,
Index resultRelationIndex,
Relation partition_root,
+                                 bool use_multi_insert,
int instrument_options)

Why do we need to pass this use_multi_insert flag here? Wouldn't it be
better to set resultRelInfo->ri_usesMultiInsert in
InitResultRelInfo() unconditionally, as is done for
ri_usesFdwDirectModify? After that it would be up to the caller
whether to use multi-insert or not, based on its own circumstances.
Otherwise we now have a flag to indicate that we want to check
another flag, while the check itself doesn't look costly.

Hmm, I think having two flags seems confusing and bug-prone,
especially if you consider partitions. For example, if a partition's
ri_usesMultiInsert is true, but CopyFrom()'s local flag is false, then
execPartition.c: ExecInitPartitionInfo() would wrongly perform
BeginForeignCopy() based only on ri_usesMultiInsert, because it
wouldn't know CopyFrom()'s local flag. Am I missing something?

No, you're right. If someone wants to share state and uses ResultRelInfo
(RRI) for that purpose, then it's fine, but CopyFrom() may simply
override RRI->ri_usesMultiInsert if needed and pass this RRI along.

This is how it's done for RRI->ri_usesFdwDirectModify.
InitResultRelInfo() initializes it to false and then
ExecInitModifyTable() changes the flag if needed.

Probably this is just a matter of personal choice, but to me the
current implementation, with an additional argument to InitResultRelInfo(),
doesn't look quite right. Maybe that's because every caller now has to pass an
additional argument (false) even if it doesn't care about
ri_usesMultiInsert at all. It also adds complexity and feels
like a leaking abstraction.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#31Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Alexey Kondratov (#30)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 9/8/20 8:34 PM, Alexey Kondratov wrote:

On 2020-09-08 17:00, Amit Langote wrote:

<a.kondratov@postgrespro.ru> wrote:

On 2020-09-08 10:34, Amit Langote wrote:
Another ambiguous part of the refactoring was in changing
InitResultRelInfo() arguments:

@@ -1278,6 +1280,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
                                  Relation resultRelationDesc,
                                  Index resultRelationIndex,
                                  Relation partition_root,
+                                 bool use_multi_insert,
                                  int instrument_options)

Why do we need to pass this use_multi_insert flag here? Wouldn't it be
better to set resultRelInfo->ri_usesMultiInsert in
InitResultRelInfo() unconditionally, as is done for
ri_usesFdwDirectModify? After that it would be up to the caller
whether to use multi-insert or not, based on its own circumstances.
Otherwise we now have a flag to indicate that we want to check
another flag, while the check itself doesn't look costly.

Hmm, I think having two flags seems confusing and bug-prone,
especially if you consider partitions. For example, if a partition's
ri_usesMultiInsert is true, but CopyFrom()'s local flag is false, then
execPartition.c: ExecInitPartitionInfo() would wrongly perform
BeginForeignCopy() based only on ri_usesMultiInsert, because it
wouldn't know CopyFrom()'s local flag. Am I missing something?

No, you're right. If someone wants to share state and uses ResultRelInfo
(RRI) for that purpose, then it's fine, but CopyFrom() may simply
override RRI->ri_usesMultiInsert if needed and pass this RRI along.

This is how it's done for RRI->ri_usesFdwDirectModify.
InitResultRelInfo() initializes it to false and then
ExecInitModifyTable() changes the flag if needed.

Probably this is just a matter of personal choice, but to me the
current implementation, with an additional argument to InitResultRelInfo(),
doesn't look quite right. Maybe that's because every caller now has to pass an
additional argument (false) even if it doesn't care about
ri_usesMultiInsert at all. It also adds complexity and feels
like a leaking abstraction.

I didn't see what the problem was, so I prepared a version of the patch
following Alexey's suggestion (see alternative.patch).
It does not seem very convenient and will lead to errors in the
future. So, I agree with Amit.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

alternative.patch (text/x-patch; charset=UTF-8)
From 73705843d300ad1016384e6cb8893c80246372a6 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 24 Aug 2020 15:08:37 +0900
Subject: [PATCH 1/2] Move multi-insert decision logic into executor

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod, shared across all potential target
relations (partitions), that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Move that decision logic into InitResultRelInfo, which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized.  That prevents repeated computation
of the same information in some cases, especially for partitions,
and the new arrangement is slightly more readable.
---
 src/backend/commands/copy.c              | 189 +++++++++--------------
 src/backend/commands/tablecmds.c         |   1 +
 src/backend/executor/execMain.c          |  40 +++++
 src/backend/executor/execPartition.c     |   7 +
 src/backend/replication/logical/worker.c |   1 +
 src/include/nodes/execnodes.h            |   9 +-
 6 files changed, 127 insertions(+), 120 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a511..94f6e71a94 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -85,16 +85,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -2715,12 +2705,10 @@ CopyFrom(CopyState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 
@@ -2833,6 +2821,57 @@ CopyFrom(CopyState cstate)
 					  0);
 	target_resultRelInfo = resultRelInfo;
 
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately
+	 * for every tuple. However, there are a number of reasons why we might
+	 * not be able to do this.  We check some conditions below while some
+	 * other target relation properties are checked in InitResultRelInfo().
+	 * Partition initialization will use result of this check implicitly as
+	 * the ri_usesMultiInsert value of the parent relation.
+	 */
+	if (!target_resultRelInfo->ri_usesMultiInsert)
+	{
+		/*
+		 * Do nothing. Can't allow multi-insert mode if previous conditions
+		 * checking in the InitResultRelInfo() routine disallow this.
+		 */
+	}
+	else if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
+	{
+		/*
+		 * Can't support bufferization of copy into foreign tables without any
+		 * defined columns or if there are any volatile default expressions in the
+		 * table. Similarly to the trigger case above, such expressions may query
+		 * the table we're inserting into.
+		 *
+		 * Note: It does not matter if any partitions have any volatile
+		 * default expressions as we use the defaults from the target of the
+		 * COPY command.
+		 */
+		target_resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (contain_volatile_functions(cstate->whereClause))
+	{
+		/*
+		 * Can't support multi-inserts if there are any volatile function
+		 * expressions in WHERE clause.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
+		 */
+		target_resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else
+	{
+		/*
+		 * Looks okay to try multi-insert.
+		 *
+		 * For partitioned tables, whether or not to use multi-insert depends
+		 * on the individual partition's properties which are also checked in
+		 * InitResultRelInfo().
+		 */
+	}
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -2854,10 +2893,14 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = estate->es_result_relations;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Init COPY into foreign table. Initialization of copying into foreign
+	 * partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
+		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																resultRelInfo);
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -2886,83 +2929,9 @@ CopyFrom(CopyState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -2970,7 +2939,7 @@ CopyFrom(CopyState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -3013,7 +2982,7 @@ CopyFrom(CopyState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -3021,7 +2990,6 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -3080,24 +3048,14 @@ CopyFrom(CopyState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -3149,7 +3107,7 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -3168,9 +3126,6 @@ CopyFrom(CopyState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -3241,7 +3196,7 @@ CopyFrom(CopyState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -3316,11 +3271,8 @@ CopyFrom(CopyState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -3349,11 +3301,10 @@ CopyFrom(CopyState cstate)
 	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+														target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	ExecCloseIndices(target_resultRelInfo);
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3e57c7f9e1..571f209429 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1800,6 +1800,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 						  0,	/* dummy rangetable index */
 						  NULL,
 						  0);
+		resultRelInfo->ri_usesMultiInsert = false;
 		resultRelInfo++;
 	}
 	estate->es_result_relations = resultRelInfos;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..11ae3e1a82 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -852,6 +852,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 							  resultRelationIndex,
 							  NULL,
 							  estate->es_instrument);
+			resultRelInfo->ri_usesMultiInsert = false;
 			resultRelInfo++;
 		}
 		estate->es_result_relations = resultRelInfos;
@@ -884,6 +885,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 								  resultRelIndex,
 								  NULL,
 								  estate->es_instrument);
+				resultRelInfo->ri_usesMultiInsert = false;
 				resultRelInfo++;
 			}
 
@@ -1345,6 +1347,43 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+
+	/* Check if the relation allows to use "multi-insert" mode. */
+	if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
+	{
+		/*
+		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+		 * triggers on the table. Such triggers might query the table we're
+		 * inserting into and act differently if the tuples that have already
+		 * been processed and prepared for insertion are not there.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+			 resultRelInfo->ri_TrigDesc != NULL &&
+			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
+	{
+		/*
+		 * For partitioned tables we can't support multi-inserts when there
+		 * are any statement level insert triggers. It might be possible to
+		 * allow partitioned tables with such triggers in the future, but for
+		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * and statement level insert triggers are on the same relation.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		/* Foreign tables don't support multi-inserts. */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else
+	{
+		/* OK, caller can use multi-insert on this relation. */
+		resultRelInfo->ri_usesMultiInsert = true;
+	}
 }
 
 /*
@@ -1435,6 +1474,7 @@ ExecGetTriggerResultRel(EState *estate, Oid relid)
 					  0,		/* dummy rangetable index */
 					  NULL,
 					  estate->es_instrument);
+	rInfo->ri_usesMultiInsert = false;
 	estate->es_trig_target_relations =
 		lappend(estate->es_trig_target_relations, rInfo);
 	MemoryContextSwitchTo(oldcontext);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index bd2ea25804..db54107caa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -583,6 +583,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  rootrel,
 					  estate->es_instrument);
 
+	/*
+	 * Use multi-insert mode if the initial condition checking passes for the
+	 * parent and its child.
+	 */
+	leaf_part_rri->ri_usesMultiInsert = (leaf_part_rri->ri_usesMultiInsert &&
+		rootResultRelInfo->ri_usesMultiInsert) ? true : false;
+
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index c37aafed0d..9858aca6ef 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -358,6 +358,7 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	resultRelInfo = makeNode(ResultRelInfo);
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	resultRelInfo->ri_usesMultiInsert = false;
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b42dd6f94..89ae9afaa4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -489,7 +489,14 @@ typedef struct ResultRelInfo
 	/* Additional information specific to partition tuple routing */
 	struct PartitionRoutingInfo *ri_PartitionInfo;
 
-	/* For use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.25.1

#32Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Alexey Kondratov (#30)
2 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Version 8 is split into two patches, following Amit's suggestion.
I also eliminated a naming inconsistency (thanks to Alexey).
Based on master, f481d28232.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v8-0001-Move-multi-insert-decision-logic-into-executor.patchtext/x-patch; charset=UTF-8; name=v8-0001-Move-multi-insert-decision-logic-into-executor.patchDownload
From 21b11f4ec0bec71bc7226014ef15c58dee9002da Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 24 Aug 2020 15:08:37 +0900
Subject: [PATCH 1/2] Move multi-insert decision logic into executor

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod, shared across all potential target
relations (partitions), that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Move that decision logic into InitResultRelInfo, which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized.  That avoids repeated computation
of the same information in some cases, especially for partitions,
and the new arrangement is slightly more readable.
---
 src/backend/commands/copy.c              | 186 ++++++++---------------
 src/backend/commands/tablecmds.c         |   1 +
 src/backend/executor/execMain.c          |  49 ++++++
 src/backend/executor/execPartition.c     |   3 +-
 src/backend/replication/logical/worker.c |   2 +-
 src/include/executor/executor.h          |   1 +
 src/include/nodes/execnodes.h            |   9 +-
 7 files changed, 129 insertions(+), 122 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a511..4e63926cb7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -85,16 +85,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -2715,12 +2705,11 @@ CopyFrom(CopyState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
+	bool		use_multi_insert;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 
@@ -2820,6 +2809,52 @@ CopyFrom(CopyState cstate)
 		ti_options |= TABLE_INSERT_FROZEN;
 	}
 
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately
+	 * for every tuple. However, there are a number of reasons why we might
+	 * not be able to do this.  We check some conditions below while some
+	 * other target relation properties are left for InitResultRelInfo() to
+	 * check, because they must also be checked for partitions which are
+	 * initialized later.
+	 */
+	if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
+	{
+		/*
+		 * Can't support bufferization of copy into foreign tables without any
+		 * defined columns or if there are any volatile default expressions in the
+		 * table. Similarly to the trigger case above, such expressions may query
+		 * the table we're inserting into.
+		 *
+		 * Note: It does not matter if any partitions have any volatile
+		 * default expressions as we use the defaults from the target of the
+		 * COPY command.
+		 */
+		use_multi_insert = false;
+	}
+	else if (contain_volatile_functions(cstate->whereClause))
+	{
+		/*
+		 * Can't support multi-inserts if there are any volatile function
+		 * expressions in WHERE clause.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
+		 */
+		use_multi_insert = false;
+	}
+	else
+	{
+		/*
+		 * Looks okay to try multi-insert, but that may change once we
+		 * check a few more properties in InitResultRelInfo().
+		 *
+		 * For partitioned tables, whether or not to use multi-insert depends
+		 * on the individual partition's properties which are also checked in
+		 * InitResultRelInfo().
+		 */
+		use_multi_insert = true;
+	}
+
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -2830,6 +2865,7 @@ CopyFrom(CopyState cstate)
 					  cstate->rel,
 					  1,		/* must match rel's position in range_table */
 					  NULL,
+					  use_multi_insert,
 					  0);
 	target_resultRelInfo = resultRelInfo;
 
@@ -2854,10 +2890,14 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = estate->es_result_relations;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Init COPY into foreign table. Initialization of copying into foreign
+	 * partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
+		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																resultRelInfo);
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -2886,83 +2926,9 @@ CopyFrom(CopyState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -2970,7 +2936,7 @@ CopyFrom(CopyState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -3013,7 +2979,7 @@ CopyFrom(CopyState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -3021,7 +2987,6 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -3080,24 +3045,14 @@ CopyFrom(CopyState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -3149,7 +3104,7 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -3168,9 +3123,6 @@ CopyFrom(CopyState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -3241,7 +3193,7 @@ CopyFrom(CopyState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -3316,11 +3268,8 @@ CopyFrom(CopyState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -3349,11 +3298,10 @@ CopyFrom(CopyState cstate)
 	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+														target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	ExecCloseIndices(target_resultRelInfo);
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3e57c7f9e1..28015a55cb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1799,6 +1799,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 						  rel,
 						  0,	/* dummy rangetable index */
 						  NULL,
+						  false,
 						  0);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..73f78f287a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -851,6 +851,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 							  resultRelation,
 							  resultRelationIndex,
 							  NULL,
+							  false,
 							  estate->es_instrument);
 			resultRelInfo++;
 		}
@@ -883,6 +884,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 								  resultRelDesc,
 								  resultRelIndex,
 								  NULL,
+								  false,
 								  estate->es_instrument);
 				resultRelInfo++;
 			}
@@ -1278,6 +1280,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 				  Relation resultRelationDesc,
 				  Index resultRelationIndex,
 				  Relation partition_root,
+				  bool use_multi_insert,
 				  int instrument_options)
 {
 	List	   *partition_check = NIL;
@@ -1345,6 +1348,51 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+
+	/*
+	 * If the caller has asked to use "multi-insert" mode, check if the
+	 * relation allows it and if it does set ri_usesMultiInsert to true.
+	 */
+	if (!use_multi_insert)
+	{
+		/* Caller didn't ask for it. */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
+	{
+		/*
+		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+		 * triggers on the table. Such triggers might query the table we're
+		 * inserting into and act differently if the tuples that have already
+		 * been processed and prepared for insertion are not there.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+			 resultRelInfo->ri_TrigDesc != NULL &&
+			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
+	{
+		/*
+		 * For partitioned tables we can't support multi-inserts when there
+		 * are any statement level insert triggers. It might be possible to
+		 * allow partitioned tables with such triggers in the future, but for
+		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * and statement level insert triggers are on the same relation.
+		 */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		/* Foreign tables don't support multi-inserts. */
+		resultRelInfo->ri_usesMultiInsert = false;
+	}
+	else
+	{
+		/* OK, caller can use multi-insert on this relation. */
+		resultRelInfo->ri_usesMultiInsert = true;
+	}
 }
 
 /*
@@ -1434,6 +1482,7 @@ ExecGetTriggerResultRel(EState *estate, Oid relid)
 					  rel,
 					  0,		/* dummy rangetable index */
 					  NULL,
+					  false,
 					  estate->es_instrument);
 	estate->es_trig_target_relations =
 		lappend(estate->es_trig_target_relations, rInfo);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index bd2ea25804..8d01f5098d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -581,6 +581,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  partrel,
 					  node ? node->rootRelation : 1,
 					  rootrel,
+					  rootResultRelInfo->ri_usesMultiInsert,
 					  estate->es_instrument);
 
 	/*
@@ -1142,7 +1143,7 @@ ExecInitPartitionDispatchInfo(EState *estate,
 	{
 		ResultRelInfo *rri = makeNode(ResultRelInfo);
 
-		InitResultRelInfo(rri, rel, 1, proute->partition_root, 0);
+		InitResultRelInfo(rri, rel, 1, proute->partition_root, false, 0);
 		proute->nonleaf_partitions[dispatchidx] = rri;
 	}
 	else
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index c37aafed0d..0de8914b2a 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -357,7 +357,7 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	ExecInitRangeTable(estate, list_make1(rte));
 
 	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, false, 0);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117407..72612bd5a6 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -189,6 +189,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Relation resultRelationDesc,
 							  Index resultRelationIndex,
 							  Relation partition_root,
+							  bool use_multi_insert,
 							  int instrument_options);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecCleanUpTriggerState(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b42dd6f94..89ae9afaa4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -489,7 +489,14 @@ typedef struct ResultRelInfo
 	/* Additional information specific to partition tuple routing */
 	struct PartitionRoutingInfo *ri_PartitionInfo;
 
-	/* For use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.25.1

v8-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchtext/x-patch; charset=UTF-8; name=v8-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchDownload
From b025e713776a662b72b91b5921452d534ab0087f Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Thu, 9 Jul 2020 11:16:56 +0500
Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free the CopyState
used for bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, sends the tuples
one by one using the CopyTo() machinery, and then sends EOF on this
connection.
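
The ExecForeignCopy sequence described above (one COPY command, then the
rows, then end-of-data) can be sketched as follows. To keep the sketch
self-contained, a hypothetical in-memory CopyTransport stands in for the
remote connection; in a real FDW these steps would typically map to libpq
calls such as PQexec(), PQputCopyData() and PQputCopyEnd().
exec_foreign_copy() and the recording helpers are illustrative names, not
part of the FDW API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical transport.  In a real FDW these steps would typically map
 * to libpq: PQexec() for the COPY command, PQputCopyData() per row and
 * PQputCopyEnd() to signal end-of-data.
 */
typedef struct CopyTransport
{
	void		(*begin) (void *ctx, const char *sql);
	void		(*send_row) (void *ctx, const char *row);
	void		(*end) (void *ctx);
	void	   *ctx;
} CopyTransport;

/* Flush one buffer of already-formatted text rows in a single COPY. */
static void
exec_foreign_copy(const CopyTransport *t, const char *copy_sql,
				  const char *const *rows, int nrows)
{
	assert(t != NULL && copy_sql != NULL);

	t->begin(t->ctx, copy_sql);
	for (int i = 0; i < nrows; i++)
		t->send_row(t->ctx, rows[i]);
	t->end(t->ctx);
}

/* Recording transport, used only to observe the call sequence. */
typedef struct Recorder
{
	char		log[512];
} Recorder;

static void
rec_append(Recorder *r, const char *s)
{
	size_t		used = strlen(r->log);

	snprintf(r->log + used, sizeof(r->log) - used, "%s", s);
}

static void
rec_begin(void *ctx, const char *sql)
{
	rec_append(ctx, sql);
	rec_append(ctx, "|");
}

static void
rec_row(void *ctx, const char *row)
{
	rec_append(ctx, row);
	rec_append(ctx, "|");
}

static void
rec_end(void *ctx)
{
	rec_append(ctx, "EOF");
}
```

The design point is one COPY round trip per buffer flush instead of one
remote INSERT per routed tuple.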

The code in deparseAnalyzeSql() that constructs the list of columns
for a given foreign relation is moved into a separate routine,
deparseRelColumnList(), which is also used by deparseCopyFromSql().
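
The column-list refactoring can be illustrated with a simplified sketch.
ColumnDef, append_column_list() and build_copy_from_sql() are
hypothetical; unlike deparseRelColumnList(), the sketch works on plain C
strings, skips identifier quoting (quote_identifier() in the real code)
and omits the trailing space the patch emits after STDIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical column descriptor: remote name plus attisdropped flag. */
typedef struct ColumnDef
{
	const char *name;
	bool		dropped;
} ColumnDef;

/*
 * Append the non-dropped column names to buf (which must already be
 * NUL-terminated), optionally in parentheses.  Returns the number of
 * columns emitted; parentheses are added only if that number is > 0.
 */
static int
append_column_list(char *buf, size_t bufsize, const ColumnDef *cols,
				   int ncols, bool enclose_in_parens)
{
	int			emitted = 0;

	assert(buf != NULL && bufsize > 0);
	for (int i = 0; i < ncols; i++)
	{
		if (cols[i].dropped)
			continue;
		if (emitted > 0)
			strncat(buf, ", ", bufsize - strlen(buf) - 1);
		else if (enclose_in_parens)
			strncat(buf, "(", bufsize - strlen(buf) - 1);
		strncat(buf, cols[i].name, bufsize - strlen(buf) - 1);
		emitted++;
	}
	if (enclose_in_parens && emitted > 0)
		strncat(buf, ")", bufsize - strlen(buf) - 1);
	return emitted;
}

/* Build "COPY rel(col, ...) FROM STDIN" (no quoting, no trailing space). */
static void
build_copy_from_sql(char *buf, size_t bufsize, const char *relname,
					const ColumnDef *cols, int ncols)
{
	snprintf(buf, bufsize, "COPY %s", relname);
	(void) append_column_list(buf, bufsize, cols, ncols, true);
	strncat(buf, " FROM STDIN", bufsize - strlen(buf) - 1);
}
```

As in the patch, a relation whose columns are all dropped gets no empty
"()" list, while the ANALYZE caller can still check the returned count
for zero and emit NULL instead.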

Added regression tests for specific corner cases of the
COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure is extended with a
data_dest_cb callback, which is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with a cstate field to
avoid repeated initialization of the CopyState. For the same reason,
the CopyTo() routine is split into CopyToStart(), CopyTo() and
CopyToFinish().
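
The callback-plus-split arrangement can be sketched like this: the
destination is supplied once at start, each row is formatted and handed
to the callback, and the same state is reused across rows until finish.
MiniCopyState, mini_copy_start(), mini_copy_row(), mini_copy_finish() and
the string sink are hypothetical stand-ins for CopyState, CopyToStart(),
CopyTo() and CopyToFinish(); the row formatter is fixed to an (int, text)
row and performs none of the escaping real COPY text format requires.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical destination callback, modeled on the data_dest_cb idea. */
typedef void (*data_dest_cb_t) (void *arg, const char *data, size_t len);

/* Hypothetical, minimal stand-in for a CopyState used for COPY TO. */
typedef struct MiniCopyState
{
	data_dest_cb_t data_dest_cb;
	void	   *cb_arg;
	int			rows_sent;
} MiniCopyState;

/* Set up once (cf. CopyToStart()): remember where rows should go. */
static void
mini_copy_start(MiniCopyState *cs, data_dest_cb_t cb, void *arg)
{
	assert(cs != NULL && cb != NULL);
	cs->data_dest_cb = cb;
	cs->cb_arg = arg;
	cs->rows_sent = 0;
}

/*
 * Per row (cf. CopyTo()): format an (int, text) row as a COPY text line
 * and hand it to the callback.  No escaping is done here.
 */
static void
mini_copy_row(MiniCopyState *cs, int f1, const char *f2)
{
	char		line[256];
	int			len = snprintf(line, sizeof(line), "%d\t%s\n", f1, f2);

	assert(len >= 0 && (size_t) len < sizeof(line));
	cs->data_dest_cb(cs->cb_arg, line, (size_t) len);
	cs->rows_sent++;
}

/* Finish a batch (cf. CopyToFinish()): report and reset the row count. */
static int
mini_copy_finish(MiniCopyState *cs)
{
	int			n = cs->rows_sent;

	cs->rows_sent = 0;
	return n;
}

/* Example destination: append into a fixed buffer, dropping on overflow. */
typedef struct StringSink
{
	char		buf[256];
	size_t		used;
} StringSink;

static void
sink_append(void *arg, const char *data, size_t len)
{
	StringSink *s = arg;

	if (s->used + len < sizeof(s->buf))
	{
		memcpy(s->buf + s->used, data, len);
		s->used += len;
		s->buf[s->used] = '\0';
	}
}
```

Because the destination lives in the state rather than being re-created
per batch, a caller holding the state (as PgFdwModifyState holds cstate)
can keep feeding rows without repeating the setup.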

The CopyInsertMethod enum is removed; its logic is now implemented by
the ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion: https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  46 +++-
 contrib/postgres_fdw/postgres_fdw.c           | 143 ++++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++++
 doc/src/sgml/fdwhandler.sgml                  |  75 ++++++
 src/backend/commands/copy.c                   | 220 +++++++++++-------
 src/backend/executor/execMain.c               |   8 +-
 src/backend/executor/execPartition.c          |  27 ++-
 src/include/commands/copy.h                   |  11 +
 src/include/foreign/fdwapi.h                  |  15 ++
 11 files changed, 548 insertions(+), 103 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..a37981ff66 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 84bc0ee381..5206814f10 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8084,8 +8084,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8096,6 +8097,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8204,6 +8218,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce7c9..9685e731e0 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -356,6 +358,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -533,6 +542,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2050,6 +2062,137 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+	{
+		PGresult *res = PQgetResult(conn);
+
+		pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+	}
+}
+
+/*
+ *
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool status = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		status = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/* Finish the COPY IN protocol. This must be done after a successful
+		 * copy as well as after an error.
+		 */
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));
+
+		/* After successfully sending EOF, check the command status. */
+		res = PQgetResult(conn);
+		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!status)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index d452d06343..1a56432f0f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2213,6 +2213,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2313,6 +2330,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 72fa127212..81728945ea 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -796,6 +796,81 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the
+     <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, <command>COPY FROM</command> falls back to
+     inserting tuples one at a time via <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4e63926cb7..62dca2abff 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -118,11 +118,14 @@ typedef struct CopyStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
+	TupleDesc	tupDesc;		/* tuple descriptor for manual COPY TO, where
+								 * rows are sent one by one via CopyOneRowTo() */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -349,17 +352,12 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
-						   RawStmt *raw_query, Oid queryRelId, List *attnamelist,
-						   List *options);
+						   TupleDesc srcTupDesc, RawStmt *raw_query,
+						   Oid queryRelId, List *attnamelist, List *options);
 static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
-							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
-static void EndCopyTo(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
-static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
 static int	CopyReadAttributesText(CopyState cstate);
@@ -585,7 +583,8 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1114,8 +1113,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	}
 	else
 	{
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1497,6 +1496,7 @@ static CopyState
 BeginCopy(ParseState *pstate,
 		  bool is_from,
 		  Relation rel,
+		  TupleDesc srcTupDesc,
 		  RawStmt *raw_query,
 		  Oid queryRelId,
 		  List *attnamelist,
@@ -1532,6 +1532,11 @@ BeginCopy(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query && !is_from);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -1858,20 +1863,25 @@ EndCopy(CopyState cstate)
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
-static CopyState
+CopyState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc tupDesc,
 			RawStmt *query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || tupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -1910,8 +1920,9 @@ BeginCopyTo(ParseState *pstate,
 							RelationGetRelationName(rel))));
 	}
 
-	cstate = BeginCopy(pstate, false, rel, query, queryRelId, attnamelist,
-					   options);
+	cstate = BeginCopy(pstate, false, rel, tupDesc, query, queryRelId,
+					   attnamelist, options);
+
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	if (pipe)
@@ -1920,6 +1931,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -2006,7 +2022,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2029,7 +2047,7 @@ DoCopyTo(CopyState cstate)
 /*
  * Clean up storage and release resources for COPY TO.
  */
-static void
+void
 EndCopyTo(CopyState cstate)
 {
 	if (cstate->queryDesc != NULL)
@@ -2045,19 +2063,22 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
+/*
+ * Start a COPY TO operation.  This is split out of CopyTo() so that, in
+ * manual mode, where tuples are sent to the destination one at a time via
+ * CopyOneRowTo(), the start-up work is done only once.
  */
-static uint64
-CopyTo(CopyState cstate)
+void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -2144,6 +2165,32 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2175,24 +2222,13 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
@@ -2485,53 +2521,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2894,10 +2941,16 @@ CopyFrom(CopyState cstate)
 	 * Init COPY into foreign table. Initialization of copying into foreign
 	 * partitions will be done later.
 	 */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
 																resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -3295,10 +3348,16 @@ CopyFrom(CopyState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
 														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
@@ -3350,7 +3409,8 @@ BeginCopyFrom(ParseState *pstate,
 	MemoryContext oldcontext;
 	bool		volatile_defexprs;
 
-	cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+	cstate = BeginCopy(pstate, true, rel, NULL, NULL, InvalidOid, attnamelist,
+																	options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Initialize state variables */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 73f78f287a..8d31cd0f56 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1383,9 +1383,13 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 		 */
 		resultRelInfo->ri_usesMultiInsert = false;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL)
+	else if (resultRelInfo->ri_FdwRoutine != NULL &&
+			 resultRelInfo->ri_FdwRoutine->ExecForeignCopy == NULL)
 	{
-		/* Foreign tables don't support multi-inserts. */
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
 		resultRelInfo->ri_usesMultiInsert = false;
 	}
 	else
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 8d01f5098d..686d20362d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -995,9 +995,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert &&
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
@@ -1199,10 +1204,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..08309149ea 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -39,6 +40,16 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern uint64 CopyFrom(CopyState cstate);
 
+extern CopyState BeginCopyTo(ParseState *pstate, Relation rel,
+							 TupleDesc tupDesc, RawStmt *query,
+							 Oid queryRelId, const char *filename, bool is_program,
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
+extern void EndCopyTo(CopyState cstate);
+extern void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
+extern void CopyToStart(CopyState cstate);
+extern void CopyToFinish(CopyState cstate);
+
 extern DestReceiver *CreateCopyDestReceiver(void);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..e932bdf2f4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+											 ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   int nslots);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* COPY a batch of tuples into a foreign relation */
+	BeginForeignCopy_function BeginForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
-- 
2.25.1
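
In the patch, postgres_fdw creates a COPY TO state with BeginCopyTo(..., pgfdw_copy_dest_cb, ...); in COPY_CALLBACK mode CopySendEndOfRow() appends a newline and hands each finished row to the callback, which forwards it with PQputCopyData(). The standalone C sketch below models only that data path and is not code from the patch: encode_copy_text_row(), collect_cb(), put(), sink and ROW_MAX are hypothetical, and the callback appends to a fixed buffer instead of talking to libpq. The escaping is a simplified subset of the text COPY format (tab delimiter, NULL written as \N, backslash escapes for backslash, tab, newline and carriage return only).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same signature as the copy_data_dest_cb typedef added by the patch. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

#define ROW_MAX 256				/* hypothetical per-row buffer size */

/* Hypothetical sink standing in for PQputCopyData(). */
static char sink[1024];
static size_t sink_len = 0;

void
collect_cb(void *outbuf, int len)
{
	/* Sketch only: silently drop data that would overflow the sink. */
	if (sink_len + (size_t) len >= sizeof(sink))
		return;
	memcpy(sink + sink_len, outbuf, (size_t) len);
	sink_len += (size_t) len;
	sink[sink_len] = '\0';
}

/* Append one byte to the row buffer; returns 0 when the buffer is full. */
static int
put(char *row, size_t *n, char c)
{
	if (*n >= ROW_MAX)
		return 0;
	row[(*n)++] = c;
	return 1;
}

/*
 * Encode one row in (simplified) COPY text format and pass it,
 * newline-terminated, to the callback in a single call.
 * Returns 0 on success, -1 if the row does not fit in ROW_MAX bytes.
 */
int
encode_copy_text_row(const char *const *values, int nvalues,
					 copy_data_dest_cb cb)
{
	char		row[ROW_MAX];
	size_t		n = 0;
	int			ok = 1;

	assert(cb != NULL);
	for (int i = 0; i < nvalues && ok; i++)
	{
		if (i > 0)
			ok = put(row, &n, '\t');	/* column delimiter */
		if (values[i] == NULL)
		{
			/* SQL NULL is written as \N */
			ok = ok && put(row, &n, '\\') && put(row, &n, 'N');
			continue;
		}
		for (const char *p = values[i]; *p != '\0' && ok; p++)
		{
			char		esc = 0;

			switch (*p)
			{
				case '\\': esc = '\\'; break;
				case '\t': esc = 't'; break;
				case '\n': esc = 'n'; break;
				case '\r': esc = 'r'; break;
				default: break;
			}
			if (esc != 0)
				ok = put(row, &n, '\\') && put(row, &n, esc);
			else
				ok = put(row, &n, *p);
		}
	}
	ok = ok && put(row, &n, '\n');	/* end of row */
	if (!ok)
		return -1;
	cb(row, (int) n);
	return 0;
}
```

Delivering one complete, newline-terminated row per callback call mirrors what the patch does in COPY_CALLBACK mode. A real FDW callback must also handle PQputCopyData() failures, which pgfdw_copy_dest_cb() does by reporting an error.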

#33Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Andrey V. Lepikhov (#31)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 2020-09-09 11:45, Andrey V. Lepikhov wrote:

On 9/8/20 8:34 PM, Alexey Kondratov wrote:

On 2020-09-08 17:00, Amit Langote wrote:

<a.kondratov@postgrespro.ru> wrote:

On 2020-09-08 10:34, Amit Langote wrote:
Another ambiguous part of the refactoring was in changing
InitResultRelInfo() arguments:

@@ -1278,6 +1280,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
                                  Relation resultRelationDesc,
                                  Index resultRelationIndex,
                                  Relation partition_root,
+                                 bool use_multi_insert,
                                  int instrument_options)

Why do we need to pass this use_multi_insert flag here? Would it be
better to set resultRelInfo->ri_usesMultiInsert in the
InitResultRelInfo() unconditionally like it is done for
ri_usesFdwDirectModify? And after that it will be up to the caller
whether to use multi-insert or not based on their own circumstances.
Otherwise now we have a flag to indicate that we want to check for
another flag, while this check doesn't look costly.

Hmm, I think having two flags seems confusing and bug prone,
especially if you consider partitions.  For example, if a partition's
ri_usesMultiInsert is true, but CopyFrom()'s local flag is false, then
execPartition.c: ExecInitPartitionInfo() would wrongly perform
BeginForeignCopy() based on only ri_usesMultiInsert, because it
wouldn't know CopyFrom()'s local flag.  Am I missing something?

No, you're right. If someone wants to share a state and use
ResultRelInfo (RRI) for that purpose, then it's fine, but CopyFrom()
may simply override RRI->ri_usesMultiInsert if needed and pass this
RRI further.

This is how it's done for RRI->ri_usesFdwDirectModify.
InitResultRelInfo() initializes it to false and then
ExecInitModifyTable() changes the flag if needed.

Probably this is just a matter of personal choice, but to me the
current implementation with an additional argument to InitResultRelInfo()
doesn't look quite right. Maybe that's because every caller now has to pass
an additional argument (as false) even if it doesn't care about
ri_usesMultiInsert at all. It also adds complexity and
feels like a leaking abstraction.

I didn't quite see what the problem was, but I prepared a patch version
following Alexey's suggestion (see Alternate.patch).

Yes, that's very close to what I've meant.

+	leaf_part_rri->ri_usesMultiInsert = (leaf_part_rri->ri_usesMultiInsert &&
+		rootResultRelInfo->ri_usesMultiInsert) ? true : false;

This could be just:

+	leaf_part_rri->ri_usesMultiInsert = (leaf_part_rri->ri_usesMultiInsert &&
+		rootResultRelInfo->ri_usesMultiInsert);

This does not seem very convenient and will lead to errors in the
future. So, I agree with Amit.

And InitResultRelInfo() may set ri_usesMultiInsert to false by default,
since it's used only by COPY now. Then you won't need this in several
places:

+ resultRelInfo->ri_usesMultiInsert = false;

While the logic of turning multi-insert on with all the validations
required could be factored out of InitResultRelInfo() to a separate
routine.
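A minimal sketch of the arrangement suggested here, assuming a hypothetical, heavily simplified RelInfo struct in place of ResultRelInfo (the struct, field and function names are illustrative only, not PostgreSQL's API): initialization always leaves the multi-insert flag off, and a separate routine decides whether it may be turned on, taking the parent's decision into account so a partition cannot enable the mode behind the COPY target's back.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for ResultRelInfo. */
typedef struct RelInfo
{
    bool before_or_instead_row_trig;  /* has BEFORE/INSTEAD OF row insert trigger */
    bool partitioned_new_table_trig;  /* partitioned table with a NEW TABLE insert trigger */
    bool usesMultiInsert;             /* decided by the caller, never by init */
} RelInfo;

/* Like InitResultRelInfo(): always start with multi-insert disabled. */
static void
init_rel_info(RelInfo *rri, bool before_or_instead, bool new_table_trig)
{
    rri->before_or_instead_row_trig = before_or_instead;
    rri->partitioned_new_table_trig = new_table_trig;
    rri->usesMultiInsert = false;
}

/*
 * The relation-level checks live in a separate routine.  A partition may
 * use multi-insert only if its parent (the COPY target) already does, so
 * the caller's decision cannot be silently overridden by a child.
 */
static bool
check_multi_insert_mode(const RelInfo *rri, const RelInfo *parent)
{
    if (parent != NULL && !parent->usesMultiInsert)
        return false;
    if (rri->before_or_instead_row_trig)
        return false;
    if (rri->partitioned_new_table_trig)
        return false;
    return true;
}
```

In this sketch the COPY code would set the target's flag from check_multi_insert_mode(&root, NULL) combined with its own COPY-specific conditions (volatile defaults, WHERE clause), and partition setup would call check_multi_insert_mode(&leaf, &root), so only one flag per relation exists.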

Anyway, I don't insist at all and think it's fine to stick to the
original v7's logic.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

#34Amit Langote
amitlangote09@gmail.com
In reply to: Alexey Kondratov (#33)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Wed, Sep 9, 2020 at 6:42 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:

On 2020-09-09 11:45, Andrey V. Lepikhov wrote:

This does not seem very convenient and will lead to errors in the
future. So, I agree with Amit.

And InitResultRelInfo() may set ri_usesMultiInsert to false by default,
since it's used only by COPY now. Then you won't need this in several
places:

+ resultRelInfo->ri_usesMultiInsert = false;

While the logic of turning multi-insert on with all the validations
required could be factored out of InitResultRelInfo() to a separate
routine.

Interesting idea. Maybe better to have a separate routine like Alexey says.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#35Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Amit Langote (#34)
4 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 9/9/20 5:51 PM, Amit Langote wrote:

On Wed, Sep 9, 2020 at 6:42 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:

On 2020-09-09 11:45, Andrey V. Lepikhov wrote:

This does not seem very convenient and will lead to errors in the
future. So, I agree with Amit.

And InitResultRelInfo() may set ri_usesMultiInsert to false by default,
since it's used only by COPY now. Then you won't need this in several
places:

+ resultRelInfo->ri_usesMultiInsert = false;

While the logic of turning multi-insert on with all the validations
required could be factored out of InitResultRelInfo() to a separate
routine.

Interesting idea. Maybe better to have a separate routine like Alexey says.

OK. I rewrote patch 0001 following Alexey's suggestion.
Patch 0002 required minor changes (the new version is attached).

I also added an optimization (see patches 0003 and 0004): here we
execute 'COPY .. FROM STDIN' on the foreign server only once, in the
BeginForeignCopy routine. These are proof-of-concept patches.

I also see that error message processing needs to be rewritten. Unlike
INSERT, which is applied to each row, here we find out about COPY errors
only after sending the end-of-copy marker. Currently, implementations 0002
and 0004 produce uninformative error messages in some cases.
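To make the difference between the two approaches concrete, here is a rough model that only counts protocol round trips. The MockConn type and all functions are hypothetical stand-ins for the libpq calls made by postgres_fdw; no real connection is involved.

```c
#include <stdbool.h>

/* Hypothetical mock of a remote connection that only counts protocol events. */
typedef struct MockConn
{
    int  copy_commands;  /* "COPY ... FROM STDIN" commands issued */
    int  rows_sent;      /* rows streamed as COPY data */
    int  copy_ends;      /* end-of-copy markers sent */
    bool in_copy;        /* true while a COPY is in progress */
} MockConn;

static void
start_copy(MockConn *c)
{
    c->copy_commands++;
    c->in_copy = true;
}

static void
put_rows(MockConn *c, int nrows)
{
    if (c->in_copy)
        c->rows_sent += nrows;
}

static void
end_copy(MockConn *c)
{
    /* In the real protocol, the command result (and any remote error) is collected here. */
    c->copy_ends++;
    c->in_copy = false;
}

/* v9-0002 style: every flushed batch is a separate COPY command. */
static void
copy_per_batch(MockConn *c, int nbatches, int batch_size)
{
    for (int i = 0; i < nbatches; i++)
    {
        start_copy(c);
        put_rows(c, batch_size);
        end_copy(c);
    }
}

/* 0003/0004 style: Begin starts the COPY once, Exec streams, End finishes. */
static void
copy_once(MockConn *c, int nbatches, int batch_size)
{
    start_copy(c);                  /* BeginForeignCopy */
    for (int i = 0; i < nbatches; i++)
        put_rows(c, batch_size);    /* ExecForeignCopy */
    end_copy(c);                    /* EndForeignCopy */
}
```

The trade-off described above shows up directly: copy_once() issues one command instead of one per batch, but the single end_copy() is then the only point where a remote failure can be reported, so the error can no longer be tied to a particular batch.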

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v9-0001-Move-multi-insert-decision-logic-into-executor.patch (text/x-patch; charset=UTF-8)
From 2053ac530db87ae4617aa953142c447e0b27e3a2 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 24 Aug 2020 15:08:37 +0900
Subject: [PATCH 1/4] Move multi-insert decision logic into executor

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod shared across all potential target
relations (partitions) that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Move that decision logic into InitResultRelInfo which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized.  That prevents repeated computation
of the same information in some cases, especially for partitions,
and the new arrangement is slightly more readable.
---
 src/backend/commands/copy.c          | 190 ++++++++++-----------------
 src/backend/executor/execMain.c      |   3 +
 src/backend/executor/execPartition.c |  47 +++++++
 src/include/executor/execPartition.h |   2 +
 src/include/nodes/execnodes.h        |   9 +-
 5 files changed, 131 insertions(+), 120 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a511..2119db4213 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -85,16 +85,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -2715,12 +2705,10 @@ CopyFrom(CopyState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 
@@ -2833,6 +2821,58 @@ CopyFrom(CopyState cstate)
 					  0);
 	target_resultRelInfo = resultRelInfo;
 
+	Assert(target_resultRelInfo->ri_usesMultiInsert == false);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately
+	 * for every tuple. However, there are a number of reasons why we might
+	 * not be able to do this.  We check some conditions below while some
+	 * other target relation properties are checked in InitResultRelInfo().
+	 * Partition initialization will use result of this check implicitly as
+	 * the ri_usesMultiInsert value of the parent relation.
+	 */
+	if (!checkMultiInsertMode(target_resultRelInfo, NULL))
+	{
+		/*
+		 * Do nothing. Multi-insert mode can't be allowed if the checks
+		 * above disallow it.
+		 */
+	}
+	else if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
+	{
+		/*
+		 * Can't support buffered COPY into foreign tables without any
+		 * defined columns or if there are any volatile default expressions in the
+		 * table. Similarly to the trigger case above, such expressions may query
+		 * the table we're inserting into.
+		 *
+		 * Note: It does not matter if any partitions have any volatile
+		 * default expressions as we use the defaults from the target of the
+		 * COPY command.
+		 */
+	}
+	else if (contain_volatile_functions(cstate->whereClause))
+	{
+		/*
+		 * Can't support multi-inserts if there are any volatile function
+		 * expressions in WHERE clause.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
+		 */
+	}
+	else
+	{
+		/*
+		 * Looks okay to try multi-insert.
+		 *
+		 * For partitioned tables, whether or not to use multi-insert depends
+		 * on the individual partition's properties which are also checked in
+		 * InitResultRelInfo().
+		 */
+		target_resultRelInfo->ri_usesMultiInsert = true;
+	}
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -2854,10 +2894,14 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = estate->es_result_relations;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Init COPY into foreign table. Initialization of copying into foreign
+	 * partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
+		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																resultRelInfo);
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -2886,83 +2930,9 @@ CopyFrom(CopyState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -2970,7 +2940,7 @@ CopyFrom(CopyState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -3013,7 +2983,7 @@ CopyFrom(CopyState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -3021,7 +2991,6 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -3080,24 +3049,14 @@ CopyFrom(CopyState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -3149,7 +3108,7 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -3168,9 +3127,6 @@ CopyFrom(CopyState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -3241,7 +3197,7 @@ CopyFrom(CopyState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -3316,11 +3272,8 @@ CopyFrom(CopyState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -3349,11 +3302,10 @@ CopyFrom(CopyState cstate)
 	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+														target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	ExecCloseIndices(target_resultRelInfo);
 
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad6f3..12ee7f2b61 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1345,6 +1345,9 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+
+	/* Whether multi-insert mode can be used is decided later, if needed */
+	resultRelInfo->ri_usesMultiInsert = false;
 }
 
 /*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index bd2ea25804..baaa0f61fa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -548,6 +548,46 @@ ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
 	}
 }
 
+bool
+checkMultiInsertMode(const ResultRelInfo *rri, const ResultRelInfo *parent)
+{
+	Assert(rri->ri_usesMultiInsert == false);
+
+	if (parent && !parent->ri_usesMultiInsert)
+		return false;
+
+	/* Check whether the relation allows "multi-insert" mode. */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		/*
+		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+		 * triggers on the table. Such triggers might query the table we're
+		 * inserting into and act differently if the tuples that have already
+		 * been processed and prepared for insertion are not there.
+		 */
+		return false;
+
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		/*
+		 * For partitioned tables we can't support multi-inserts when there
+		 * are any statement level insert triggers. It might be possible to
+		 * allow partitioned tables with such triggers in the future, but for
+		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * and statement level insert triggers are on the same relation.
+		 */
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL)
+		/* Foreign tables don't support multi-inserts. */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
 /*
  * ExecInitPartitionInfo
  *		Lock the partition and initialize ResultRelInfo.  Also setup other
@@ -583,6 +623,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  rootrel,
 					  estate->es_instrument);
 
+	/*
+	 * Use multi-insert mode if the condition checking passes for the
+	 * parent and its child.
+	 */
+	leaf_part_rri->ri_usesMultiInsert =
+						checkMultiInsertMode(leaf_part_rri, rootResultRelInfo);
+
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 6d1b722198..895bcd01c6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -145,6 +145,8 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
 										PartitionTupleRouting *proute,
 										TupleTableSlot *slot,
 										EState *estate);
+extern bool checkMultiInsertMode(const ResultRelInfo *rri,
+								 const ResultRelInfo *parent);
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 									PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b42dd6f94..89ae9afaa4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -489,7 +489,14 @@ typedef struct ResultRelInfo
 	/* Additional information specific to partition tuple routing */
 	struct PartitionRoutingInfo *ri_PartitionInfo;
 
-	/* For use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.25.1
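The 0002 patch below builds the remote command with deparseCopyFromSql() and streams rows to it through the data_dest_cb callback. A rough sketch of what the remote server ends up receiving, assuming plain column names that need no quoting (the real code uses quote_identifier()) and the default text format with a tab delimiter; the Col struct and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical column descriptor; "dropped" mirrors attisdropped. */
typedef struct Col
{
    const char *name;
    bool        dropped;
} Col;

/*
 * Build "COPY <rel>(<c1>, <c2>) FROM STDIN", skipping dropped columns.
 * With no live columns, the list and its parentheses are omitted.
 */
static void
build_copy_from_sql(char *buf, size_t cap, const char *rel,
                    const Col *cols, int ncols)
{
    size_t len = (size_t) snprintf(buf, cap, "COPY %s", rel);
    bool   first = true;

    for (int i = 0; i < ncols && len < cap; i++)
    {
        if (cols[i].dropped)
            continue;
        len += (size_t) snprintf(buf + len, cap - len, "%s%s",
                                 first ? "(" : ", ", cols[i].name);
        first = false;
    }
    if (!first && len < cap)
        len += (size_t) snprintf(buf + len, cap - len, ")");
    if (len < cap)
        snprintf(buf + len, cap - len, " FROM STDIN");
}

/*
 * Write one row in COPY text format: tab-separated fields, \N for NULL,
 * backslash escapes for special characters, terminated by a newline.
 * "out" must hold at least 2 * (total field length) + 3 * nfields + 2 bytes.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
static size_t
encode_copy_text_row(char *out, const char *const *fields, int nfields)
{
    size_t n = 0;

    for (int i = 0; i < nfields; i++)
    {
        if (i > 0)
            out[n++] = '\t';
        if (fields[i] == NULL)
        {
            out[n++] = '\\';
            out[n++] = 'N';
            continue;
        }
        for (const char *p = fields[i]; *p; p++)
        {
            char esc = 0;

            switch (*p)
            {
                case '\\': esc = '\\'; break;
                case '\b': esc = 'b'; break;
                case '\f': esc = 'f'; break;
                case '\n': esc = 'n'; break;
                case '\r': esc = 'r'; break;
                case '\t': esc = 't'; break;
                case '\v': esc = 'v'; break;
                default: break;
            }
            if (esc)
            {
                out[n++] = '\\';
                out[n++] = esc;
            }
            else
                out[n++] = *p;
        }
    }
    out[n++] = '\n';
    out[n] = '\0';
    return n;
}
```

For the rem2 test table this yields "COPY public.loc2(f1, f2) FROM STDIN" (the patch additionally appends a trailing space) followed by rows such as "-1\txyzzy\n". When every column has been dropped, the column list disappears entirely, which is the zero-column case that deparseRelColumnList() handles by emitting no parentheses.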

v9-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=UTF-8)
From 3db62efd9b62581cab35189e83fccd9b6b7aebfc Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Thu, 10 Sep 2020 14:21:00 +0500
Subject: [PATCH 2/4] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, iteratively sends
tuples through the CopyTo() machinery, and then sends EOF on this connection.

The code in deparseAnalyzeSql() that constructed the list of columns
for a given foreign relation is split out into deparseRelColumnList(),
which is reused by deparseCopyFromSql().

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure is extended
with a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field.
It is needed to avoid repeated initialization of CopyState. For the
same reason, the CopyTo() routine is split into the set of routines
CopyToStart()/CopyTo()/CopyToFinish().

The CopyInsertMethod enum is removed. This logic is implemented by the
ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion: https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  46 +++-
 contrib/postgres_fdw/postgres_fdw.c           | 143 +++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++++
 doc/src/sgml/fdwhandler.sgml                  |  75 ++++++
 src/backend/commands/copy.c                   | 225 +++++++++++-------
 src/backend/executor/execPartition.c          |  34 ++-
 src/include/commands/copy.h                   |  11 +
 src/include/foreign/fdwapi.h                  |  15 ++
 10 files changed, 549 insertions(+), 106 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..a37981ff66 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse a COPY ... FROM STDIN command into the given buf.
+ * The column list is always spelled out explicitly in the command.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 84bc0ee381..5206814f10 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8084,8 +8084,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8096,6 +8097,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8204,6 +8218,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce7c9..9685e731e0 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -356,6 +358,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -533,6 +542,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2050,6 +2062,137 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+	{
+		PGresult *res = PQgetResult(conn);
+
+		pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+	}
+}
+
+/*
+ *
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool status = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		status = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/* Finish the COPY IN protocol.  This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));
+
+		/* After successfully sending an EOF signal, check command status. */
+		res = PQgetResult(conn);
+		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!status)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index d452d06343..1a56432f0f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2213,6 +2213,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2313,6 +2330,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 72fa127212..81728945ea 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -796,6 +796,81 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table.  This routine is
+     called right before the first call of the <function>ExecForeignCopy</function>
+     routine for the foreign table.  It should perform any initialization needed
+     prior to the actual <command>COPY FROM</command> operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.  This routine is called
+     each time the multi-insert buffer for the foreign table is flushed.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, multi-insert is not used for the foreign table and
+     <command>COPY FROM</command> inserts tuples one at a time instead.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2119db4213..02a034fb37 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -118,11 +118,14 @@ typedef struct CopyStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
+	TupleDesc	tupDesc;		/* caller-supplied descriptor for manual COPY TO,
+								 * where tuples are passed in one at a time */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -349,17 +352,12 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
-						   RawStmt *raw_query, Oid queryRelId, List *attnamelist,
-						   List *options);
+						   TupleDesc srcTupDesc, RawStmt *raw_query,
+						   Oid queryRelId, List *attnamelist, List *options);
 static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
-							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
-static void EndCopyTo(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
-static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
 static int	CopyReadAttributesText(CopyState cstate);
@@ -585,7 +583,8 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1114,8 +1113,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	}
 	else
 	{
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1497,6 +1496,7 @@ static CopyState
 BeginCopy(ParseState *pstate,
 		  bool is_from,
 		  Relation rel,
+		  TupleDesc srcTupDesc,
 		  RawStmt *raw_query,
 		  Oid queryRelId,
 		  List *attnamelist,
@@ -1532,6 +1532,11 @@ BeginCopy(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query && !is_from);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -1858,20 +1863,25 @@ EndCopy(CopyState cstate)
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
-static CopyState
+CopyState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc tupDesc,
 			RawStmt *query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || tupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -1910,8 +1920,9 @@ BeginCopyTo(ParseState *pstate,
 							RelationGetRelationName(rel))));
 	}
 
-	cstate = BeginCopy(pstate, false, rel, query, queryRelId, attnamelist,
-					   options);
+	cstate = BeginCopy(pstate, false, rel, tupDesc, query, queryRelId,
+					   attnamelist, options);
+
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	if (pipe)
@@ -1920,6 +1931,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -2006,7 +2022,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2029,7 +2047,7 @@ DoCopyTo(CopyState cstate)
 /*
  * Clean up storage and release resources for COPY TO.
  */
-static void
+void
 EndCopyTo(CopyState cstate)
 {
 	if (cstate->queryDesc != NULL)
@@ -2045,19 +2063,22 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
+/*
+ * Start a COPY TO operation.  Split out of CopyTo() so that, in manual mode,
+ * where tuples are sent to the destination one by one via CopyOneRowTo(),
+ * the setup and any header output are done exactly once.
  */
-static uint64
-CopyTo(CopyState cstate)
+void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -2144,6 +2165,32 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2175,24 +2222,13 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
@@ -2485,53 +2521,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2895,13 +2942,18 @@ CopyFrom(CopyState cstate)
 	mtstate->resultRelInfo = estate->es_result_relations;
 
 	/*
-	 * Init COPY into foreign table. Initialization of copying into foreign
-	 * partitions will be done later.
+	 * Init COPY into foreign table.
 	 */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-																resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
+
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -3299,10 +3351,16 @@ CopyFrom(CopyState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
 														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
@@ -3354,7 +3412,8 @@ BeginCopyFrom(ParseState *pstate,
 	MemoryContext oldcontext;
 	bool		volatile_defexprs;
 
-	cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+	cstate = BeginCopy(pstate, true, rel, NULL, NULL, InvalidOid, attnamelist,
+																	options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Initialize state variables */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index baaa0f61fa..581498cf6c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -580,8 +580,12 @@ checkMultiInsertMode(const ResultRelInfo *rri, const ResultRelInfo *parent)
 		 */
 		return false;
 
-	if (rri->ri_FdwRoutine != NULL)
-		/* Foreign tables don't support multi-inserts. */
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
 		return false;
 
 	/* OK, caller can use multi-insert on this relation. */
@@ -1041,9 +1045,13 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
@@ -1245,10 +1253,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..08309149ea 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -39,6 +40,16 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern uint64 CopyFrom(CopyState cstate);
 
+extern CopyState BeginCopyTo(ParseState *pstate, Relation rel,
+							 TupleDesc tupDesc, RawStmt *query,
+							 Oid queryRelId, const char *filename, bool is_program,
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
+extern void EndCopyTo(CopyState cstate);
+extern void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
+extern void CopyToStart(CopyState cstate);
+extern void CopyToFinish(CopyState cstate);
+
 extern DestReceiver *CreateCopyDestReceiver(void);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..e932bdf2f4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+											 ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   int nslots);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Functions for COPY FROM of tuple batches into a foreign relation */
+	BeginForeignCopy_function BeginForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
-- 
2.25.1

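A note on the COPY_CALLBACK path added above: CopyOneRowTo() builds each row in the COPY text format, and CopySendEndOfRow() appends the newline and passes the buffer to data_dest_cb; postgres_fdw's pgfdw_copy_dest_cb then forwards it to PQputCopyData(). Since copy_data_dest_cb carries no context argument, the callback reaches its connection through the file-level copy_fmstate variable. The following is a minimal standalone sketch of that flow, not code from copy.c: the names row_dest_cb, encode_row, Buf and collect are hypothetical, the escaping is simplified to backslash, tab, newline and carriage return, and the callback takes an extra void *arg as one possible way to avoid the global.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical destination callback.  The patch's copy_data_dest_cb is
 * void (*)(void *outbuf, int len); this variant adds a context pointer so
 * the receiver does not need a global such as copy_fmstate.
 */
typedef void (*row_dest_cb) (void *arg, const char *buf, int len);

/* Minimal growable byte buffer, standing in for StringInfo. */
typedef struct Buf
{
	char	   *data;
	int			len;
	int			cap;
} Buf;

static void
buf_putc(Buf *b, char c)
{
	/* Keep room for the new byte plus a terminating NUL. */
	if (b->len + 2 > b->cap)
	{
		int			newcap = b->cap ? b->cap * 2 : 64;
		char	   *p = realloc(b->data, newcap);

		if (p == NULL)
			abort();
		b->data = p;
		b->cap = newcap;
	}
	b->data[b->len++] = c;
	b->data[b->len] = '\0';
}

/*
 * Encode one row in simplified COPY text format (tab-separated, \N for
 * NULL, newline-terminated) and hand the finished line to dest.
 */
static void
encode_row(const char *const *values, int nvalues, Buf *line,
		   row_dest_cb dest, void *arg)
{
	line->len = 0;				/* reuse the same buffer for every row */
	for (int i = 0; i < nvalues; i++)
	{
		if (i > 0)
			buf_putc(line, '\t');
		if (values[i] == NULL)
		{
			buf_putc(line, '\\');
			buf_putc(line, 'N');
			continue;
		}
		for (const char *p = values[i]; *p; p++)
		{
			char		esc = 0;

			switch (*p)
			{
				case '\\': esc = '\\'; break;
				case '\t': esc = 't'; break;
				case '\n': esc = 'n'; break;
				case '\r': esc = 'r'; break;
			}
			if (esc)
			{
				buf_putc(line, '\\');
				buf_putc(line, esc);
			}
			else
				buf_putc(line, *p);
		}
	}
	buf_putc(line, '\n');		/* end of row, as CopySendEndOfRow adds */
	dest(arg, line->data, line->len);
}

/* Example receiver: accumulates everything it is sent and counts calls. */
typedef struct Collected
{
	Buf			out;
	int			calls;
} Collected;

static void
collect(void *arg, const char *buf, int len)
{
	Collected  *c = arg;

	assert(len > 0);			/* every row ends with a newline */
	for (int i = 0; i < len; i++)
		buf_putc(&c->out, buf[i]);
	c->calls++;
}
```

With this encoding, a row of zero columns should come out as a bare newline, which is the case exercised by the dropped-column tests above that feed empty lines to copy rem2 from stdin.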
v9-0003-Add-separated-connections-into-the-postgres_fdw.patch (text/x-patch; charset=UTF-8)
From 1fc2ed184dd218809dd8bf3d2399bc05eddeb9f2 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Tue, 8 Sep 2020 14:30:03 +0500
Subject: [PATCH 3/4] Add separated connections into the postgres_fdw.

Foreign COPY, and possibly other callers, may need an FDW connection
that is not shared with anyone else.
---
 contrib/postgres_fdw/connection.c   | 26 +++++++++++++------
 contrib/postgres_fdw/postgres_fdw.c | 39 ++++++++++++++++-------------
 contrib/postgres_fdw/postgres_fdw.h |  3 ++-
 3 files changed, 43 insertions(+), 25 deletions(-)

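The cache key introduced below pairs the user mapping OID with a connection id: cid 0 names the shared connection, and every request with separate = true takes a fresh id from SeparatedConnNum. The existing comment in GetConnection() assumes the key struct has no pad bytes, presumably because keys are hashed and compared as raw bytes. A minimal standalone sketch of the scheme, assuming a local uint32_t stand-in for Oid and hypothetical helpers make_key() and keys_equal():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;			/* stand-in for PostgreSQL's Oid */

typedef struct ConnCacheKey
{
	Oid			user;			/* user mapping OID */
	int			cid;			/* 0: shared connection; > 0: separate one */
} ConnCacheKey;

/* Hashing and comparing keys as raw bytes is only safe without padding. */
_Static_assert(sizeof(ConnCacheKey) == sizeof(Oid) + sizeof(int),
			   "ConnCacheKey must not contain pad bytes");

static int	SeparatedConnNum = 0;

/* Hypothetical helper mirroring the key construction in GetConnection(). */
static ConnCacheKey
make_key(Oid umid, bool separate)
{
	ConnCacheKey key;

	memset(&key, 0, sizeof(key));	/* defensive; there is no padding */
	key.user = umid;
	key.cid = separate ? ++SeparatedConnNum : 0;
	return key;
}

/* Hypothetical bytewise comparison, as a blob-keyed hash table would do. */
static bool
keys_equal(const ConnCacheKey *a, const ConnCacheKey *b)
{
	return memcmp(a, b, sizeof(ConnCacheKey)) == 0;
}
```

Repeated shared requests map to the same entry, while separate requests never collide with each other or with the shared one. In the patch, separated entries are then disconnected, removed from the hash, and the counter decremented in pgfdw_xact_callback at transaction end.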
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 08daf26fdf..048c641e85 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -44,7 +44,11 @@
  * ourselves, so that rolling back a subtransaction will kill the right
  * queries and not the wrong ones.
  */
-typedef Oid ConnCacheKey;
+typedef struct ConnCacheKey
+{
+	Oid user;
+	int cid;
+} ConnCacheKey;
 
 typedef struct ConnCacheEntry
 {
@@ -65,6 +69,7 @@ typedef struct ConnCacheEntry
  * Connection cache (initialized on first use)
  */
 static HTAB *ConnectionHash = NULL;
+static int SeparatedConnNum = 0;
 
 /* for assigning cursor numbers and prepared statement numbers */
 static unsigned int cursor_number = 0;
@@ -105,9 +110,9 @@ static bool UserMappingPasswordRequired(UserMapping *user);
  * (not even on error), we need this flag to cue manual cleanup.
  */
 PGconn *
-GetConnection(UserMapping *user, bool will_prep_stmt)
+GetConnection(UserMapping *user, bool will_prep_stmt, bool separate)
 {
-	bool		found;
+	bool		found = false;
 	ConnCacheEntry *entry;
 	ConnCacheKey key;
 
@@ -141,7 +146,8 @@ GetConnection(UserMapping *user, bool will_prep_stmt)
 	xact_got_connection = true;
 
 	/* Create hash key for the entry.  Assume no pad bytes in key struct */
-	key = user->umid;
+	key.user = user->umid;
+	key.cid = separate ? ++SeparatedConnNum : 0;
 
 	/*
 	 * Find or create cached entry for requested connection.
@@ -870,10 +876,16 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 		 */
 		if (PQstatus(entry->conn) != CONNECTION_OK ||
 			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-			entry->changing_xact_state)
+			entry->changing_xact_state || entry->key.cid > 0)
 		{
 			elog(DEBUG3, "discarding connection %p", entry->conn);
 			disconnect_pg_server(entry);
+
+			if (entry->key.cid > 0)
+			{
+				hash_search(ConnectionHash, &entry->key, HASH_REMOVE, NULL);
+				SeparatedConnNum--;
+			}
 		}
 	}
 
@@ -1057,9 +1069,9 @@ pgfdw_reject_incomplete_xact_state_change(ConnCacheEntry *entry)
 
 	/* find server name to be shown in the message below */
 	tup = SearchSysCache1(USERMAPPINGOID,
-						  ObjectIdGetDatum(entry->key));
+						  ObjectIdGetDatum(entry->key.user));
 	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for user mapping %u", entry->key);
+		elog(ERROR, "cache lookup failed for user mapping %u", entry->key.user);
 	umform = (Form_pg_user_mapping) GETSTRUCT(tup);
 	server = GetForeignServer(umform->umserver);
 	ReleaseSysCache(tup);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9685e731e0..8bca71e3f5 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -438,7 +438,8 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   char *query,
 											   List *target_attrs,
 											   bool has_returning,
-											   List *retrieved_attrs);
+											   List *retrieved_attrs,
+											   bool separate_conn);
 static TupleTableSlot *execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
@@ -450,7 +451,7 @@ static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 TupleTableSlot *slot);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
-static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void finish_foreign_modify(PgFdwModifyState *fmstate, bool separate_conn);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -1445,7 +1446,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	fsstate->conn = GetConnection(user, false);
+	fsstate->conn = GetConnection(user, false, false);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
@@ -1840,7 +1841,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									query,
 									target_attrs,
 									has_returning,
-									retrieved_attrs);
+									retrieved_attrs,
+									false);
 
 	resultRelInfo->ri_FdwState = fmstate;
 }
@@ -1916,7 +1918,7 @@ postgresEndForeignModify(EState *estate,
 		return;
 
 	/* Destroy the execution state */
-	finish_foreign_modify(fmstate);
+	finish_foreign_modify(fmstate, false);
 }
 
 /*
@@ -2022,7 +2024,8 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									sql.data,
 									targetAttrs,
 									retrieved_attrs != NIL,
-									retrieved_attrs);
+									retrieved_attrs,
+									false);
 
 	/*
 	 * If the given resultRelInfo already has PgFdwModifyState set, it means
@@ -2059,7 +2062,7 @@ postgresEndForeignInsert(EState *estate,
 		fmstate = fmstate->aux_fmstate;
 
 	/* Destroy the execution state */
-	finish_foreign_modify(fmstate);
+	finish_foreign_modify(fmstate, false);
 }
 
 static PgFdwModifyState *copy_fmstate = NULL;
@@ -2103,7 +2106,8 @@ postgresBeginForeignCopy(ModifyTableState *mtstate,
 									sql.data,
 									NIL,
 									false,
-									NIL);
+									NIL,
+									true);
 
 	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
 								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
@@ -2126,7 +2130,7 @@ postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
 	CopyToFinish(fmstate->cstate);
 	pfree(fmstate->cstate);
 	fmstate->cstate = NULL;
-	finish_foreign_modify(fmstate);
+	finish_foreign_modify(fmstate, true);
 }
 
 /*
@@ -2514,7 +2518,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
-	dmstate->conn = GetConnection(user, false);
+	dmstate->conn = GetConnection(user, false, false);
 
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
@@ -2888,7 +2892,7 @@ estimate_path_cost_size(PlannerInfo *root,
 								false, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->user, false);
+		conn = GetConnection(fpinfo->user, false, false);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -3680,7 +3684,8 @@ create_foreign_modify(EState *estate,
 					  char *query,
 					  List *target_attrs,
 					  bool has_returning,
-					  List *retrieved_attrs)
+					  List *retrieved_attrs,
+					  bool separate_conn)
 {
 	PgFdwModifyState *fmstate;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
@@ -3708,7 +3713,7 @@ create_foreign_modify(EState *estate,
 	user = GetUserMapping(userid, table->serverid);
 
 	/* Open connection; report that we'll create a prepared statement. */
-	fmstate->conn = GetConnection(user, true);
+	fmstate->conn = GetConnection(user, true, separate_conn);
 	fmstate->p_name = NULL;		/* prepared statement not made yet */
 
 	/* Set up remote query information. */
@@ -4014,7 +4019,7 @@ store_returning_result(PgFdwModifyState *fmstate,
  *		Release resources for a foreign insert/update/delete operation
  */
 static void
-finish_foreign_modify(PgFdwModifyState *fmstate)
+finish_foreign_modify(PgFdwModifyState *fmstate, bool separate_conn)
 {
 	Assert(fmstate != NULL);
 
@@ -4583,7 +4588,7 @@ postgresAnalyzeForeignTable(Relation relation,
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user, false, false);
 
 	/*
 	 * Construct command to get page count for relation.
@@ -4669,7 +4674,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	table = GetForeignTable(RelationGetRelid(relation));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
-	conn = GetConnection(user, false);
+	conn = GetConnection(user, false, false);
 
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
@@ -4897,7 +4902,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 	 */
 	server = GetForeignServer(serverOid);
 	mapping = GetUserMapping(GetUserId(), server->serverid);
-	conn = GetConnection(mapping, false);
+	conn = GetConnection(mapping, false, false);
 
 	/* Don't attempt to import collation if remote server hasn't got it */
 	if (PQserverVersion(conn) < 90100)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 8fc5ff018f..95cf6487a2 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -129,7 +129,8 @@ extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
 /* in connection.c */
-extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);
+extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt,
+							 bool separate);
 extern void ReleaseConnection(PGconn *conn);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
-- 
2.25.1

v9-0004-Optimized-version-of-the-Fast-COPY-FROM-feature.patch
From ff8f0686abb2e37468d6ce71968a51ada9919674 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Thu, 10 Sep 2020 14:37:18 +0500
Subject: [PATCH 4/4] Optimized version of the 'Fast COPY FROM' feature.

Execute the remote 'COPY .. FROM STDIN' query only once for each foreign
partition (table), in the BeginForeignCopy() routine.

TODO:
1. Error reporting needs to be reworked. Unlike the per-row INSERT
approach, an error can only be detected after the end-of-copy message
has been sent.
2. All possible ways in which an error may occur during the COPY FROM
operation need to be examined.
---
 contrib/postgres_fdw/connection.c             | 15 ++++
 .../postgres_fdw/expected/postgres_fdw.out    |  4 +-
 contrib/postgres_fdw/postgres_fdw.c           | 81 +++++++++----------
 src/backend/commands/copy.c                   | 51 ++++++------
 src/include/foreign/fdwapi.h                  |  8 +-
 5 files changed, 86 insertions(+), 73 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 048c641e85..8409cea40b 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -827,6 +827,21 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 					/* Assume we might have lost track of prepared statements */
 					entry->have_error = true;
 
+					if (entry->conn && PQstatus(entry->conn) == CONNECTION_OK)
+					{
+						/* Process special case of the unfinished COPY command */
+						res = PQgetResult(entry->conn);
+						if (PQresultStatus(res) == PGRES_COPY_IN &&
+							(PQputCopyEnd(entry->conn,  _("canceled by server")) <= 0 ||
+							PQflush(entry->conn)))
+						{
+							ereport(ERROR,
+									(errmsg("error returned by PQputCopyEnd: %s",
+											PQerrorMessage(entry->conn))));
+						}
+						PQclear(res);
+					}
+
 					/*
 					 * If a command has been submitted to the remote server by
 					 * using an asynchronous execution function, the command
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 5206814f10..ef9b903b58 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8086,7 +8086,7 @@ ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
 remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
-COPY rem2, line 2
+COPY rem2, line 2: ""
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8223,7 +8223,7 @@ alter table loc2 drop column f2;
 copy rem2 from stdin;
 ERROR:  column "f1" of relation "loc2" does not exist
 CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
-COPY rem2, line 3
+COPY rem2, line 0
 alter table loc2 add column f1 int;
 alter table loc2 add column f2 int;
 select * from rem2;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 8bca71e3f5..6006c359f9 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -359,12 +359,12 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
 static void postgresBeginForeignCopy(ModifyTableState *mtstate,
-									   ResultRelInfo *resultRelInfo);
-static void postgresEndForeignCopy(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+								   ResultRelInfo *resultRelInfo);
 static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
-									  TupleTableSlot **slots,
-									  int nslots);
+									TupleTableSlot **slots,
+									int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -2093,6 +2093,7 @@ postgresBeginForeignCopy(ModifyTableState *mtstate,
 	StringInfoData sql;
 	RangeTblEntry *rte;
 	Relation rel = resultRelInfo->ri_RelationDesc;
+	PGresult *res;
 
 	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
 	initStringInfo(&sql);
@@ -2114,6 +2115,14 @@ postgresBeginForeignCopy(ModifyTableState *mtstate,
 								  NIL, NIL);
 	CopyToStart(fmstate->cstate);
 	resultRelInfo->ri_FdwState = fmstate;
+
+	/*
+	 * Start COPY operation. We may do so because we got a separate connection.
+	 */
+	res = PQexec(fmstate->conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+	PQclear(res);
 }
 
 /*
@@ -2124,6 +2133,28 @@ static void
 postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+
+	/* Finish COPY IN protocol. It is needed to do after successful copy or
+	 * after an error.
+	 */
+	if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn))
+		ereport(ERROR,
+				(errmsg("error returned by PQputCopyEnd: %s",
+						PQerrorMessage(conn))));
+
+	/* After successfully sending an EOF signal, check command status. */
+	res = PQgetResult(conn);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+	PQclear(res);
+	/* Do this to ensure we've pumped libpq back to idle state */
+	if (PQgetResult(conn) != NULL)
+		ereport(ERROR,
+				(errmsg("unexpected extra results during COPY of table: %s",
+						PQerrorMessage(conn))));
 
 	/* Check correct use of CopyIn FDW API. */
 	Assert(fmstate->cstate != NULL);
@@ -2143,58 +2174,26 @@ postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
 						  TupleTableSlot **slots, int nslots)
 {
 	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
-	PGresult *res;
-	PGconn *conn = fmstate->conn;
-	bool status = false;
 	int i;
 
-	/* Check correct use of CopyIn FDW API. */
+	/* Check correct use of Copy FDW API. */
 	Assert(fmstate->cstate != NULL);
 	Assert(copy_fmstate == NULL);
 
-	res = PQexec(conn, fmstate->query);
-	if (PQresultStatus(res) != PGRES_COPY_IN)
-		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
-	PQclear(res);
-
 	PG_TRY();
 	{
 		copy_fmstate = fmstate;
 		for (i = 0; i < nslots; i++)
 			CopyOneRowTo(fmstate->cstate, slots[i]);
-
-		status = true;
 	}
-	PG_FINALLY();
+	PG_CATCH();
 	{
 		copy_fmstate = NULL; /* Detect problems */
-
-		/* Finish COPY IN protocol. It is needed to do after successful copy or
-		 * after an error.
-		 */
-		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
-			PQflush(conn))
-			ereport(ERROR,
-					(errmsg("error returned by PQputCopyEnd: %s",
-							PQerrorMessage(conn))));
-
-		/* After successfully  sending an EOF signal, check command status. */
-		res = PQgetResult(conn);
-		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
-			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
-
-		PQclear(res);
-		/* Do this to ensure we've pumped libpq back to idle state */
-		if (PQgetResult(conn) != NULL)
-			ereport(ERROR,
-					(errmsg("unexpected extra results during COPY of table: %s",
-							PQerrorMessage(conn))));
-
-		if (!status)
-			PG_RE_THROW();
+		PG_RE_THROW();
 	}
 	PG_END_TRY();
+
+	copy_fmstate = NULL;
 }
 
 /*
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 02a034fb37..02487c9742 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2941,20 +2941,6 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = estate->es_result_relations;
 
-	/*
-	 * Init COPY into foreign table.
-	 */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL)
-	{
-		if (target_resultRelInfo->ri_usesMultiInsert)
-			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
-																  resultRelInfo);
-		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-																	resultRelInfo);
-	}
-
-
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -3021,6 +3007,19 @@ CopyFrom(CopyState cstate)
 	errcallback.previous = error_context_stack;
 	error_context_stack = &errcallback;
 
+	/*
+	 * Init COPY into foreign table.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
+
 	for (;;)
 	{
 		TupleTableSlot *myslot;
@@ -3327,6 +3326,18 @@ CopyFrom(CopyState cstate)
 	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
+	/* Allow the FDW to shut down */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
+
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
@@ -3350,18 +3361,6 @@ CopyFrom(CopyState cstate)
 
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
-	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL)
-	{
-		if (target_resultRelInfo->ri_usesMultiInsert &&
-			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
-			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
-														target_resultRelInfo);
-		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-														target_resultRelInfo);
-	}
-
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index e932bdf2f4..d807f872ba 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -105,14 +105,14 @@ typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
 typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
-											 ResultRelInfo *rinfo);
+										   ResultRelInfo *rinfo);
 
 typedef void (*EndForeignCopy_function) (EState *estate,
-										   ResultRelInfo *rinfo);
+										 ResultRelInfo *rinfo);
 
 typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
-													   TupleTableSlot **slots,
-													   int nslots);
+										  TupleTableSlot **slots,
+										  int nslots);
 
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
-- 
2.25.1
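The change made by 0004 (one remote COPY per foreign partition, opened in BeginForeignCopy() and closed in EndForeignCopy(), instead of one remote COPY per flushed batch inside ExecForeignCopy()) and the error-reporting caveat from its TODO can be illustrated with a toy model. This is only a sketch under stated assumptions: there is no libpq here, and MockConn, start_copy(), send_row(), end_copy() and the "negative values are rejected" rule are hypothetical stand-ins for the remote server, loosely mimicking the loc2_f1positive check in the regression test.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_SIZE 2

/* Hypothetical stand-in for one remote connection. */
typedef struct MockConn
{
    int  copy_commands;   /* "COPY ... FROM STDIN" commands issued */
    int  rows_sent;       /* rows pushed through the COPY data stream */
    int  first_bad_row;   /* 1-based index of the first rejected row, 0 if none */
    bool in_copy;         /* true between start_copy() and end_copy() */
} MockConn;

static void
start_copy(MockConn *c)
{
    assert(!c->in_copy);
    c->copy_commands++;
    c->in_copy = true;
}

static void
send_row(MockConn *c, int value)
{
    assert(c->in_copy);
    c->rows_sent++;
    /* Hypothetical remote rule: negative values violate a CHECK constraint. */
    if (value < 0 && c->first_bad_row == 0)
        c->first_bad_row = c->rows_sent;
}

/* The remote verdict only becomes visible after end-of-copy. */
static bool
end_copy(MockConn *c)
{
    assert(c->in_copy);
    c->in_copy = false;
    return c->first_bad_row == 0;
}

/* Earlier approach: one remote COPY per flushed batch. */
static bool
copy_per_batch(MockConn *c, const int batches[][BATCH_SIZE], int nbatches)
{
    for (int b = 0; b < nbatches; b++)
    {
        start_copy(c);
        for (int i = 0; i < BATCH_SIZE; i++)
            send_row(c, batches[b][i]);
        if (!end_copy(c))
            return false;       /* error seen at the end of this batch */
    }
    return true;
}

/* 0004 approach: one remote COPY per foreign partition. */
static bool
copy_per_partition(MockConn *c, const int batches[][BATCH_SIZE], int nbatches)
{
    start_copy(c);                          /* BeginForeignCopy */
    for (int b = 0; b < nbatches; b++)      /* ExecForeignCopy, once per batch */
        for (int i = 0; i < BATCH_SIZE; i++)
            send_row(c, batches[b][i]);
    return end_copy(c);                     /* EndForeignCopy */
}
```

With batches {1, 2}, {3, -1}, {5, 6}, the per-batch driver issues two COPY commands and stops after the second batch, while the per-partition driver issues a single COPY but only learns about the bad fourth row from end_copy(), after all six rows have been sent. That is the trade-off behind TODO item 1.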

#36Amit Langote
amitlangote09@gmail.com
In reply to: Andrey V. Lepikhov (#35)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Thu, Sep 10, 2020 at 6:57 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 9/9/20 5:51 PM, Amit Langote wrote:

On Wed, Sep 9, 2020 at 6:42 PM Alexey Kondratov <a.kondratov@postgrespro.ru> wrote:

And InitResultRelInfo() may set ri_usesMultiInsert to false by default,
since it's used only by COPY now. Then you won't need this in several
places:

+ resultRelInfo->ri_usesMultiInsert = false;

Meanwhile, the logic for turning multi-insert on, with all the required
validations, could be factored out of InitResultRelInfo() into a separate
routine.

Interesting idea. Maybe better to have a separate routine like Alexey says.

OK. I rewrote patch 0001 following Alexey's suggestion.

Thank you. Some mostly cosmetic suggestions on that:

+bool
+checkMultiInsertMode(const ResultRelInfo *rri, const ResultRelInfo *parent)

I think we should put this definition in executor.c and export it in
executor.h, not execPartition.c/h. Also, better to match the naming
style of surrounding executor routines, say,
ExecRelationAllowsMultiInsert? I'm not sure we need the 'parent'
parameter, but as it's pretty specific to the partition case, maybe
partition_root is a better name.

+   if (!checkMultiInsertMode(target_resultRelInfo, NULL))
+   {
+       /*
+        * Do nothing. Can't allow multi-insert mode if previous conditions
+        * checking disallow this.
+        */
+   }

Personally, I find this notation with empty blocks a bit strange.
Maybe it's easier to read this instead:

if (!cstate->volatile_defexprs &&
!contain_volatile_functions(cstate->whereClause) &&
ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
target_resultRelInfo->ri_usesMultiInsert = true;
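The suggested rule splits the decision in two: COPY-level conditions (volatile defaults, a volatile WHERE clause) checked once for the COPY target, and relation-level conditions checked by ExecRelationAllowsMultiInsert() for the target and again for each partition, with a partition inheriting a "no" from its root. A minimal self-contained model of that logic follows; RelInfo and its boolean fields are hypothetical flattenings of the ResultRelInfo/TriggerDesc fields used in the attached delta patch, not real PostgreSQL types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattening of the ResultRelInfo/TriggerDesc fields consulted. */
typedef struct RelInfo
{
    bool before_or_instead_row_trigger; /* trig_insert_before_row || trig_insert_instead_row */
    bool is_partitioned;                /* relkind == RELKIND_PARTITIONED_TABLE */
    bool insert_new_table_trigger;      /* trig_insert_new_table */
    bool is_foreign;                    /* ri_FdwRoutine != NULL */
    bool uses_multi_insert;             /* ri_usesMultiInsert */
} RelInfo;

/* Relation-level checks, modeled on ExecRelationAllowsMultiInsert(). */
static bool
relation_allows_multi_insert(const RelInfo *rri, const RelInfo *partition_root)
{
    assert(!rri->uses_multi_insert);

    if (partition_root != NULL && !partition_root->uses_multi_insert)
        return false;           /* a "no" on the root applies to partitions */
    if (rri->before_or_instead_row_trigger)
        return false;
    if (rri->is_partitioned && rri->insert_new_table_trigger)
        return false;
    if (rri->is_foreign)
        return false;           /* no foreign multi-insert at this stage */
    return true;
}

/* COPY-level checks for the target, as in the suggested CopyFrom() condition. */
static void
decide_copy_target(RelInfo *target, bool volatile_defexprs, bool volatile_where)
{
    if (!volatile_defexprs && !volatile_where &&
        relation_allows_multi_insert(target, NULL))
        target->uses_multi_insert = true;
}

/* Per-partition decision, as done when a partition is initialized. */
static void
decide_partition(RelInfo *part, const RelInfo *root)
{
    part->uses_multi_insert = relation_allows_multi_insert(part, root);
}
```

Keeping the volatility checks out of the relation-level function matches the note in the delta patch that partitions' own default expressions do not matter, because the defaults come from the COPY target.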

Also, I don't really understand why we need
list_length(cstate->attnumlist) > 0 to use multi-insert on foreign
tables but apparently we do. The next patch should add that condition
here along with a brief note on that in the comment.

-   if (resultRelInfo->ri_FdwRoutine != NULL &&
-       resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-       resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-                                                        resultRelInfo);
+   /*
+    * Init COPY into foreign table. Initialization of copying into foreign
+    * partitions will be done later.
+    */
+   if (target_resultRelInfo->ri_FdwRoutine != NULL &&
+       target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+       target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+                                                               resultRelInfo);
@@ -3349,11 +3302,10 @@ CopyFrom(CopyState cstate)
    if (target_resultRelInfo->ri_FdwRoutine != NULL &&
        target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
        target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-                                                               target_resultRelInfo);
+                                                       target_resultRelInfo);

These two hunks seem unnecessary; I think I introduced them into this
patch when breaking it out of the main one.

Please check the attached delta patch which contains the above changes.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments:

0001-delta.patch
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2119db4..41a7067 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2826,52 +2826,24 @@ CopyFrom(CopyState cstate)
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
 	 * insertion, and insert them in bulk, for example, with one
-	 * table_multi_insert() call than call table_tuple_insert() separately
-	 * for every tuple. However, there are a number of reasons why we might
-	 * not be able to do this.  We check some conditions below while some
-	 * other target relation properties are checked in InitResultRelInfo().
-	 * Partition initialization will use result of this check implicitly as
-	 * the ri_usesMultiInsert value of the parent relation.
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecRelationAllowsMultiInsert().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
 	 */
-	if (!checkMultiInsertMode(target_resultRelInfo, NULL))
-	{
-		/*
-		 * Do nothing. Can't allow multi-insert mode if previous conditions
-		 * checking disallow this.
-		 */
-	}
-	else if (cstate->volatile_defexprs || list_length(cstate->attnumlist) == 0)
-	{
-		/*
-		 * Can't support bufferization of copy into foreign tables without any
-		 * defined columns or if there are any volatile default expressions in the
-		 * table. Similarly to the trigger case above, such expressions may query
-		 * the table we're inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-	}
-	else
-	{
-		/*
-		 * Looks okay to try multi-insert.
-		 *
-		 * For partitioned tables, whether or not to use multi-insert depends
-		 * on the individual parition's properties which are also checked in
-		 * InitResultRelInfo().
-		 */
+	if (!cstate->volatile_defexprs &&
+		!contain_volatile_functions(cstate->whereClause) &&
+		ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
 		target_resultRelInfo->ri_usesMultiInsert = true;
-	}
 
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
@@ -2898,10 +2870,10 @@ CopyFrom(CopyState cstate)
 	 * Init COPY into foreign table. Initialization of copying into foreign
 	 * partitions will be done later.
 	 */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-																resultRelInfo);
+	if (resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+														 resultRelInfo);
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -3302,7 +3274,7 @@ CopyFrom(CopyState cstate)
 	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-														target_resultRelInfo);
+															  target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 12ee7f2..4999474 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1351,6 +1351,55 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 }
 
 /*
+ * ExecRelationAllowsMultiInsert
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
+							  const ResultRelInfo *partition_root)
+{
+	Assert(rri->ri_usesMultiInsert == false);
+
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (partition_root && !partition_root->ri_usesMultiInsert)
+		return false;
+
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	/* Foreign tables don't support multi-inserts. */
+	if (rri->ri_FdwRoutine != NULL)
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
+/*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
  *
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index baaa0f6..d752fe3 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -548,46 +548,6 @@ ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
 	}
 }
 
-bool
-checkMultiInsertMode(const ResultRelInfo *rri, const ResultRelInfo *parent)
-{
-	Assert(rri->ri_usesMultiInsert == false);
-
-	if (parent && !parent->ri_usesMultiInsert)
-		return false;
-
-	/* Check if the relation allows to use "multi-insert" mode. */
-	if (rri->ri_TrigDesc != NULL &&
-		(rri->ri_TrigDesc->trig_insert_before_row ||
-		 rri->ri_TrigDesc->trig_insert_instead_row))
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		return false;
-
-	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
-		rri->ri_TrigDesc != NULL &&
-		rri->ri_TrigDesc->trig_insert_new_table)
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		return false;
-
-	if (rri->ri_FdwRoutine != NULL)
-		/* Foreign tables don't support multi-inserts. */
-		return false;
-
-	/* OK, caller can use multi-insert on this relation. */
-	return true;
-}
-
 /*
  * ExecInitPartitionInfo
  *		Lock the partition and initialize ResultRelInfo.  Also setup other
@@ -628,7 +588,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	 * parent and its child.
 	 */
 	leaf_part_rri->ri_usesMultiInsert =
-						checkMultiInsertMode(leaf_part_rri, rootResultRelInfo);
+		ExecRelationAllowsMultiInsert(leaf_part_rri, rootResultRelInfo);
 
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117..1146685 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -190,6 +190,8 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  Relation partition_root,
 							  int instrument_options);
+extern bool ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
+							  const ResultRelInfo *partition_root);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecCleanUpTriggerState(EState *estate);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
#37Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Amit Langote (#36)
2 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 16.09.2020 12:10, Amit Langote wrote:

On Thu, Sep 10, 2020 at 6:57 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 9/9/20 5:51 PM, Amit Langote wrote:
OK. I rewrote patch 0001 following Alexey's suggestion.

Thank you. Some mostly cosmetic suggestions on that:

+bool
+checkMultiInsertMode(const ResultRelInfo *rri, const ResultRelInfo *parent)

I think we should put this definition in executor.c and export it in
executor.h, not execPartition.c/h. Also, better to match the naming
style of surrounding executor routines, say,
ExecRelationAllowsMultiInsert? I'm not sure we need the 'parent'
parameter, but as it's pretty specific to the partition case, maybe
partition_root is a better name.

Agreed

+   if (!checkMultiInsertMode(target_resultRelInfo, NULL))
+   {
+       /*
+        * Do nothing. Can't allow multi-insert mode if previous conditions
+        * checking disallow this.
+        */
+   }

Personally, I find this notation with empty blocks a bit strange.
Maybe it's easier to read this instead:

if (!cstate->volatile_defexprs &&
!contain_volatile_functions(cstate->whereClause) &&
ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
target_resultRelInfo->ri_usesMultiInsert = true;

Agreed

Also, I don't really understand why we need
list_length(cstate->attnumlist) > 0 to use multi-insert on foreign
tables but apparently we do. The next patch should add that condition
here along with a brief note on that in the comment.

This is a limitation of the COPY command: its column list can't be
empty, i.e. "()" with no columns in the parentheses is not accepted.
However, foreign tables without columns can exist. You can see this
problem if you apply the 0002 patch on top of your delta patch.
Ashutosh noticed this problem in [1] and covered it with a regression
test. I included this condition (with comments) in the 0002 patch.
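To make the limitation concrete: the remote command in the regression output has the form "COPY public.loc2(f1, f2) FROM STDIN", and COPY's grammar requires at least one column inside the parentheses, so a deparser that always emits a column list has no valid statement to produce for a zero-column foreign table. The helper below is a hypothetical sketch, not deparseCopyFromSql() from the patch, and it skips identifier quoting (the real deparser uses quote_identifier()); it simply refuses to build the statement when there are no columns, which is the case the list_length(cstate->attnumlist) > 0 guard avoids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "COPY <rel>(<col>, ...) FROM STDIN" into buf.
 * Returns false if there are no columns (COPY has no empty "()" column
 * list) or if buf is too small. Identifiers are not quoted in this sketch.
 */
static bool
build_copy_from_sql(char *buf, size_t bufsize, const char *rel,
                    const char *const *cols, int ncols)
{
    size_t len;
    int    n;

    assert(buf != NULL && bufsize > 0);
    if (ncols <= 0)
        return false;

    n = snprintf(buf, bufsize, "COPY %s(", rel);
    if (n < 0 || (size_t) n >= bufsize)
        return false;
    len = (size_t) n;

    for (int i = 0; i < ncols; i++)
    {
        n = snprintf(buf + len, bufsize - len, "%s%s",
                     i > 0 ? ", " : "", cols[i]);
        if (n < 0 || (size_t) n >= bufsize - len)
            return false;
        len += (size_t) n;
    }

    n = snprintf(buf + len, bufsize - len, ") FROM STDIN");
    return n >= 0 && (size_t) n < bufsize - len;
}
```

For the columns {"f1", "f2"} of public.loc2 this yields "COPY public.loc2(f1, f2) FROM STDIN"; with zero columns it returns false, and the caller would fall back to the per-row insert path.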

-   if (resultRelInfo->ri_FdwRoutine != NULL &&
-       resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-       resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-                                                        resultRelInfo);
+   /*
+    * Init COPY into foreign table. Initialization of copying into foreign
+    * partitions will be done later.
+    */
+   if (target_resultRelInfo->ri_FdwRoutine != NULL &&
+       target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+       target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+                                                               resultRelInfo);
@@ -3349,11 +3302,10 @@ CopyFrom(CopyState cstate)
   if (target_resultRelInfo->ri_FdwRoutine != NULL &&
       target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
       target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-                                                               target_resultRelInfo);
+                                                       target_resultRelInfo);

These two hunks seem unnecessary; I think I introduced them into this
patch when breaking it out of the main one.

Please check the attached delta patch which contains the above changes.

I applied your delta patch to the 0001 patch and fixed the 0002 patch
accordingly.
Patches 0003 and 0004 are experimental, and I will not support them
until their applicability has been discussed.

[1]: /messages/by-id/CAExHW5uAtyAVL-iuu1Hsd0fycqS5UHoHCLfauYDLQwRucwC9Og@mail.gmail.com

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v10-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch
From 05e0e9cf9de7a3893dd1692d8ea0131cc1332433 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Sun, 20 Sep 2020 11:44:30 +0300
Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free the CopyState
used for bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, sends the tuples
one by one using the CopyTo() machinery, and then sends EOF on the
connection.
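A minimal sketch of the per-batch protocol sequence described above, in plain C so that it compiles on its own. MockConn, mock_exec_copy(), mock_put_copy_data(), mock_put_copy_end() and exec_foreign_copy_batch() are hypothetical stand-ins for a libpq connection and for PQexec()/PQputCopyData()/PQputCopyEnd(). The sketch only records what would be sent, and "[CopyDone]" is a placeholder for the protocol's end-of-copy message, not literal text on the wire.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the state of a libpq connection. */
typedef enum { CONN_IDLE, CONN_COPY_IN } ConnState;

typedef struct MockConn
{
    ConnState state;
    char      wire[1024];   /* everything "sent" to the server, in order */
    int       rows_sent;
} MockConn;

/* Append text to the transcript, truncating rather than overflowing. */
static void
mock_send(MockConn *conn, const char *text)
{
    strncat(conn->wire, text, sizeof(conn->wire) - strlen(conn->wire) - 1);
}

/* Like PQexec() on COPY ... FROM STDIN: the connection enters COPY IN mode. */
static int
mock_exec_copy(MockConn *conn, const char *query)
{
    if (conn->state != CONN_IDLE)
        return 0;
    mock_send(conn, query);
    mock_send(conn, "\n");
    conn->state = CONN_COPY_IN;
    return 1;
}

/* Like PQputCopyData(): one text-format row, already newline-terminated. */
static int
mock_put_copy_data(MockConn *conn, const char *row)
{
    if (conn->state != CONN_COPY_IN)
        return 0;
    mock_send(conn, row);
    conn->rows_sent++;
    return 1;
}

/* Like PQputCopyEnd(): end the COPY, leaving the connection idle again. */
static int
mock_put_copy_end(MockConn *conn)
{
    if (conn->state != CONN_COPY_IN)
        return 0;
    mock_send(conn, "[CopyDone]\n");
    conn->state = CONN_IDLE;
    return 1;
}

/*
 * One ExecForeignCopy()-style call: start a COPY for this batch, stream
 * every row, then end the COPY.  Returns the number of rows sent, or -1
 * if the (mock) protocol state was wrong.
 */
static int
exec_foreign_copy_batch(MockConn *conn, const char *copy_sql,
                        const char *const *rows, int nrows)
{
    assert(conn->state == CONN_IDLE);   /* sanity check, like the patch's Assert()s */
    if (!mock_exec_copy(conn, copy_sql))
        return -1;
    for (int i = 0; i < nrows; i++)
        if (!mock_put_copy_data(conn, rows[i]))
            return -1;
    if (!mock_put_copy_end(conn))
        return -1;
    return nrows;
}
```

Each batch opens and closes its own COPY, so the connection is idle again between flushes. That is what allows later batches to reuse the same connection.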

The code that constructed the list of columns for a given foreign
relation in deparseAnalyzeSql() is moved into a separate routine,
deparseRelColumnList(), which is also reused by deparseCopyFromSql().
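The column-list refactoring can be illustrated with a small self-contained sketch. Column, column_list(), copy_from_sql() and analyze_sql() are hypothetical simplifications of the tuple descriptor and of deparseRelColumnList(), deparseCopyFromSql() and deparseAnalyzeSql(). Identifier quoting (quote_identifier()) and the column_name option lookup are left out, and a fixed-size buffer stands in for StringInfo.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one tuple-descriptor entry. */
typedef struct Column
{
    const char *remote_name;   /* column_name option, or the local name */
    bool        dropped;       /* like attisdropped */
} Column;

/* Append text to a fixed-size, NUL-terminated buffer (stand-in for StringInfo). */
static void
append(char *buf, size_t cap, const char *text)
{
    assert(strlen(buf) < cap);
    strncat(buf, text, cap - strlen(buf) - 1);
}

/*
 * Same shape as deparseRelColumnList(): list the live columns in
 * tuple-descriptor order, skip dropped ones, and optionally wrap the list
 * in parentheses.  Returns how many columns were emitted.
 */
static int
column_list(char *buf, size_t cap, const Column *cols, int ncols,
            bool enclose_in_parens)
{
    int emitted = 0;

    for (int i = 0; i < ncols; i++)
    {
        if (cols[i].dropped)
            continue;
        if (emitted > 0)
            append(buf, cap, ", ");
        else if (enclose_in_parens)
            append(buf, cap, "(");
        append(buf, cap, cols[i].remote_name);
        emitted++;
    }
    if (enclose_in_parens && emitted > 0)
        append(buf, cap, ")");
    return emitted;
}

/* Shape of deparseCopyFromSql(): "COPY rel(cols) FROM STDIN ". */
static void
copy_from_sql(char *buf, size_t cap, const char *relname,
              const Column *cols, int ncols)
{
    buf[0] = '\0';
    append(buf, cap, "COPY ");
    append(buf, cap, relname);
    (void) column_list(buf, cap, cols, ncols, true);
    append(buf, cap, " FROM STDIN ");
}

/* Shape of deparseAnalyzeSql(): "SELECT cols FROM rel", NULL if no columns. */
static void
analyze_sql(char *buf, size_t cap, const char *relname,
            const Column *cols, int ncols)
{
    buf[0] = '\0';
    append(buf, cap, "SELECT ");
    if (column_list(buf, cap, cols, ncols, false) == 0)
        append(buf, cap, "NULL");
    append(buf, cap, " FROM ");
    append(buf, cap, relname);
}
```

The zero-column case shows why each caller decides how to handle an empty list. SELECT needs a NULL placeholder, while COPY simply omits the parenthesized list.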

Added regression tests for specific corner cases of the
COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure is extended with a
data_dest_cb callback, which is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field,
which avoids repeated initialization of the CopyState. For the same
reason, the CopyTo() routine is split into CopyToStart(), CopyTo()
and CopyToFinish().
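Here is a cut-down sketch of the callback destination and of the start/row/finish split. It assumes text format only. CopySketch, copy_start(), copy_one_row(), copy_finish() and collect() are hypothetical. They only mirror the roles of CopyToStart(), of CopyOneRowTo() together with CopySendEndOfRow() in COPY_CALLBACK mode, and of CopyToFinish(). The escaping covers backslash, tab, newline and carriage return, and NULL is written as \N, as in the default text format.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same shape as the callback the patch passes as data_dest_cb. */
typedef void (*dest_cb)(void *data, int len);

/* Hypothetical, cut-down CopyState for text-format COPY TO a callback. */
typedef struct CopySketch
{
    dest_cb cb;
    bool    started;
    char    line[256];      /* per-row buffer (silently truncates when full) */
    int     len;
} CopySketch;

static void
send_char(CopySketch *cs, char c)
{
    if (cs->len < (int) sizeof(cs->line))
        cs->line[cs->len++] = c;
}

/* Text-format escaping for the characters COPY must not emit raw. */
static void
send_value(CopySketch *cs, const char *v)
{
    if (v == NULL)
    {
        send_char(cs, '\\');
        send_char(cs, 'N');
        return;
    }
    for (; *v; v++)
    {
        switch (*v)
        {
            case '\\': send_char(cs, '\\'); send_char(cs, '\\'); break;
            case '\t': send_char(cs, '\\'); send_char(cs, 't'); break;
            case '\n': send_char(cs, '\\'); send_char(cs, 'n'); break;
            case '\r': send_char(cs, '\\'); send_char(cs, 'r'); break;
            default:   send_char(cs, *v); break;
        }
    }
}

/* One-time setup, in the role of CopyToStart(). */
static void
copy_start(CopySketch *cs, dest_cb cb)
{
    cs->cb = cb;
    cs->started = true;
    cs->len = 0;
}

/* Per-row work, in the role of CopyOneRowTo() + CopySendEndOfRow(). */
static void
copy_one_row(CopySketch *cs, const char *const *values, int nvalues)
{
    assert(cs->started);
    for (int i = 0; i < nvalues; i++)
    {
        if (i > 0)
            send_char(cs, '\t');
        send_value(cs, values[i]);
    }
    send_char(cs, '\n');            /* the COPY_CALLBACK case adds the newline */
    cs->cb(cs->line, cs->len);      /* hand the finished row to the destination */
    cs->len = 0;
}

/* One-time teardown, in the role of CopyToFinish(). */
static void
copy_finish(CopySketch *cs)
{
    assert(cs->started);
    cs->started = false;
}

/* Example destination: accumulate every row into one string. */
static char collected[1024];

static void
collect(void *data, int len)
{
    size_t used = strlen(collected);

    if (used + (size_t) len < sizeof(collected))
    {
        memcpy(collected + used, data, (size_t) len);
        collected[used + len] = '\0';
    }
}
```

Setup and teardown run once per CopyState. copy_one_row() can be called for any number of rows in between. The FDW relies on this property when it keeps the CopyState in PgFdwModifyState across batches.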

The CopyInsertMethod enum is removed; this logic is now implemented by
the ri_usesMultiInsert field of the ResultRelInfo structure.
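The decision and dispatch logic can be sketched as follows. FdwSketch, RelSketch, allows_multi_insert() and begin_foreign() are hypothetical. allows_multi_insert() keeps only the checks this patch touches: no volatile defaults, a non-empty column list, and ExecForeignCopy being present. It omits the trigger, WHERE-clause and other checks that CopyFrom() and ExecRelationAllowsMultiInsert() also perform. Unlike the posted code, the sketch tests BeginForeignCopy for NULL, which matches the behavior described in the documentation hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down FdwRoutine: only the hooks relevant to COPY FROM. */
typedef struct FdwSketch
{
    void (*begin_copy)(void);       /* BeginForeignCopy */
    void (*exec_copy)(int nslots);  /* ExecForeignCopy */
    void (*begin_insert)(void);     /* BeginForeignInsert */
} FdwSketch;

typedef struct RelSketch
{
    const FdwSketch *fdw;           /* NULL for a local table */
    bool uses_multi_insert;         /* ri_usesMultiInsert */
} RelSketch;

/*
 * Multi-insert is allowed only without volatile defaults, with at least
 * one column, and (for a foreign table) if the FDW provides ExecForeignCopy.
 */
static bool
allows_multi_insert(const RelSketch *rel, bool volatile_defexprs, int ncolumns)
{
    if (volatile_defexprs || ncolumns == 0)
        return false;
    if (rel->fdw != NULL && rel->fdw->exec_copy == NULL)
        return false;
    return true;
}

/* Dispatch at the start of COPY FROM: bulk COPY hooks or row-by-row insert. */
static void
begin_foreign(const RelSketch *rel)
{
    if (rel->fdw == NULL)
        return;
    if (rel->uses_multi_insert)
    {
        if (rel->fdw->begin_copy != NULL)
            rel->fdw->begin_copy();
    }
    else if (rel->fdw->begin_insert != NULL)
        rel->fdw->begin_insert();
}

/* Counters so a caller can see which hook ran. */
static int copy_begins, insert_begins;
static void count_copy_begin(void) { copy_begins++; }
static void count_insert_begin(void) { insert_begins++; }
static void noop_exec_copy(int nslots) { (void) nslots; }
```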

Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  46 +++-
 contrib/postgres_fdw/postgres_fdw.c           | 143 +++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++++
 doc/src/sgml/fdwhandler.sgml                  |  75 ++++++
 src/backend/commands/copy.c                   | 228 +++++++++++-------
 src/backend/executor/execMain.c               |   8 +-
 src/backend/executor/execPartition.c          |  26 +-
 src/include/commands/copy.h                   |  11 +
 src/include/foreign/fdwapi.h                  |  15 ++
 11 files changed, 552 insertions(+), 106 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d44df19fe..fa7740163d 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse a remote "COPY ... FROM STDIN" statement into the given buf,
+ * with an explicit column list matching the local tuple descriptor.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 10e23d02ed..26d989591d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8076,8 +8076,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8088,6 +8089,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8196,6 +8210,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce7c9..9685e731e0 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -356,6 +358,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -533,6 +542,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2050,6 +2062,137 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+	{
+		PGresult *res = PQgetResult(conn);
+
+		pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+	}
+}
+
+/*
+ *
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool status = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		status = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/*
+		 * Finish the COPY IN protocol, whether the copy succeeded or failed.
+		 */
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));
+
+		/* After successfully sending an EOF signal, check the command status. */
+		res = PQgetResult(conn);
+		if ((!status && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(status && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!status)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
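postgresExecForeignCopy() relies on PG_TRY()/PG_FINALLY() so that the remote COPY is always ended before any error is re-thrown. If the rows could not be sent, the COPY is ended with an error message. Below is a minimal sketch of that control flow using plain setjmp()/longjmp(); PostgreSQL's macros are built on sigsetjmp()/siglongjmp(). current_handler, throw_error(), put_copy_end(), send_batch() and run_batch() are hypothetical, and "COPY failed" is a placeholder message.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical single "current error handler", like PG_exception_stack. */
static jmp_buf *current_handler;

static void
throw_error(void)
{
    assert(current_handler != NULL);
    longjmp(*current_handler, 1);
}

/* What was "sent" to end the COPY: "" on success, else the failure reason. */
static char end_message[64];
static int  copy_ends;

static void
put_copy_end(const char *errormsg)
{
    copy_ends++;
    strcpy(end_message, errormsg ? errormsg : "");
}

/*
 * Send nrows rows; row number fail_at raises an "error".  Either way the
 * COPY is ended exactly once, and a failure is then re-thrown to the outer
 * handler (the PG_FINALLY() + PG_RE_THROW() shape).
 */
static void
send_batch(int nrows, int fail_at)
{
    jmp_buf          local;
    jmp_buf *volatile saved = current_handler;
    volatile bool    status = false;    /* volatile: read after a possible longjmp() */

    if (setjmp(local) == 0)
    {
        current_handler = &local;
        for (int i = 0; i < nrows; i++)
            if (i == fail_at)
                throw_error();
        status = true;
    }
    /* "finally": reached on both the normal path and the error path */
    current_handler = saved;
    put_copy_end(status ? NULL : "COPY failed");
    if (!status)
        throw_error();                  /* re-throw, like PG_RE_THROW() */
}

/* Outer caller: returns true if send_batch() completed without error. */
static bool
run_batch(int nrows, int fail_at)
{
    jmp_buf  outer;
    jmp_buf *saved = current_handler;

    if (setjmp(outer) != 0)
    {
        current_handler = saved;
        return false;
    }
    current_handler = &outer;
    send_batch(nrows, fail_at);
    current_handler = saved;
    return true;
}
```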
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 78156d10b4..45e5e2042c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2212,6 +2212,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2312,6 +2329,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 72fa127212..81728945ea 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -796,6 +796,81 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+               ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                TupleTableSlot **slots,
+                int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; each slot will
+     match the row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the
+     <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, <command>COPY FROM</command> does not use bulk copying
+     for the foreign table and inserts rows into it one at a time instead.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5a36a86c60..4deee7ffc3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -118,11 +118,14 @@ typedef struct CopyStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
+	TupleDesc	tupDesc;		/* tuple descriptor for manual COPY TO, where
+								 * tuples are passed in one at a time */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -349,17 +352,12 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
-						   RawStmt *raw_query, Oid queryRelId, List *attnamelist,
-						   List *options);
+						   TupleDesc srcTupDesc, RawStmt *raw_query,
+						   Oid queryRelId, List *attnamelist, List *options);
 static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
-							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
-static void EndCopyTo(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
-static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
 static int	CopyReadAttributesText(CopyState cstate);
@@ -585,7 +583,8 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1114,8 +1113,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	}
 	else
 	{
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1497,6 +1496,7 @@ static CopyState
 BeginCopy(ParseState *pstate,
 		  bool is_from,
 		  Relation rel,
+		  TupleDesc srcTupDesc,
 		  RawStmt *raw_query,
 		  Oid queryRelId,
 		  List *attnamelist,
@@ -1532,6 +1532,11 @@ BeginCopy(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query && !is_from);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -1858,20 +1863,25 @@ EndCopy(CopyState cstate)
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
-static CopyState
+CopyState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc tupDesc,
 			RawStmt *query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || tupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -1910,8 +1920,9 @@ BeginCopyTo(ParseState *pstate,
 							RelationGetRelationName(rel))));
 	}
 
-	cstate = BeginCopy(pstate, false, rel, query, queryRelId, attnamelist,
-					   options);
+	cstate = BeginCopy(pstate, false, rel, tupDesc, query, queryRelId,
+					   attnamelist, options);
+
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	if (pipe)
@@ -1920,6 +1931,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -2006,7 +2022,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2029,7 +2047,7 @@ DoCopyTo(CopyState cstate)
 /*
  * Clean up storage and release resources for COPY TO.
  */
-static void
+void
 EndCopyTo(CopyState cstate)
 {
 	if (cstate->queryDesc != NULL)
@@ -2045,19 +2063,22 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
+/*
+ * Start a COPY TO operation.  This is split out of CopyTo() so that in manual
+ * mode, where tuples are sent to the destination one at a time by
+ * CopyOneRowTo(), the setup is done only once.
  */
-static uint64
-CopyTo(CopyState cstate)
+void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -2144,6 +2165,32 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2175,24 +2222,13 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
@@ -2485,53 +2521,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+										  NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2839,8 +2886,11 @@ CopyFrom(CopyState cstate)
 	 * checked by calling ExecRelationAllowsMultiInsert().  It does not matter
 	 * whether partitions have any volatile default expressions as we use the
 	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
 	 */
 	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
 		!contain_volatile_functions(cstate->whereClause) &&
 		ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
 		target_resultRelInfo->ri_usesMultiInsert = true;
@@ -2868,12 +2918,17 @@ CopyFrom(CopyState cstate)
 
 	/*
 	 * Init COPY into foreign table. Initialization of copying into foreign
-	 * partitions will be done later.
+	 * partitions will be done later.
 	 */
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -3271,10 +3326,16 @@ CopyFrom(CopyState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+ 														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
@@ -3326,7 +3387,8 @@ BeginCopyFrom(ParseState *pstate,
 	MemoryContext oldcontext;
 	bool		volatile_defexprs;
 
-	cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+	cstate = BeginCopy(pstate, true, rel, NULL, NULL, InvalidOid, attnamelist,
+																	options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Initialize state variables */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 97a483b179..1397e77197 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1372,8 +1372,12 @@ ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
 		rri->ri_TrigDesc->trig_insert_new_table)
 		return false;
 
-	/* Foreign tables don't support multi-inserts. */
-	if (rri->ri_FdwRoutine != NULL)
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
 		return false;
 
 	/* OK, caller can use multi-insert on this relation. */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 121484374f..c56b0000b8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1001,9 +1001,13 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
@@ -1205,10 +1209,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..08309149ea 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -39,6 +40,16 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern uint64 CopyFrom(CopyState cstate);
 
+extern CopyState BeginCopyTo(ParseState *pstate, Relation rel,
+							 TupleDesc tupDesc, RawStmt *query,
+							 Oid queryRelId, const char *filename, bool is_program,
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
+extern void EndCopyTo(CopyState cstate);
+extern void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
+extern void CopyToStart(CopyState cstate);
+extern void CopyToFinish(CopyState cstate);
+
 extern DestReceiver *CreateCopyDestReceiver(void);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..e932bdf2f4 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+											 ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   int nslots);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* COPY a bulk of tuples into a foreign relation */
+	BeginForeignCopy_function BeginForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
-- 
2.17.1

v10-0001-Move-multi-insert-decision-logic-into-executor.patch (text/x-patch; charset=UTF-8)
From 627f5566d2f6fcd93c1f6ce3fa9b37986b3bc7c3 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Sun, 20 Sep 2020 10:37:24 +0300
Subject: [PATCH 1/2] Move multi-insert decision logic into executor

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod shared across all potential target
relations (partitions) that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Move that decision logic into InitResultRelInfo which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized.  That prevents repeated computation
of the same information in some cases, especially for partitions,
and the new arrangement is slightly more readable.
---
 src/backend/commands/copy.c          | 152 +++++++--------------------
 src/backend/executor/execMain.c      |  52 +++++++++
 src/backend/executor/execPartition.c |   7 ++
 src/include/executor/execPartition.h |   2 +
 src/include/executor/executor.h      |   2 +
 src/include/nodes/execnodes.h        |   9 +-
 6 files changed, 109 insertions(+), 115 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2047557e52..5a36a86c60 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -85,16 +85,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -2715,12 +2705,10 @@ CopyFrom(CopyState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 
@@ -2833,6 +2821,30 @@ CopyFrom(CopyState cstate)
 					  0);
 	target_resultRelInfo = resultRelInfo;
 
+	Assert(target_resultRelInfo->ri_usesMultiInsert == false);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecRelationAllowsMultiInsert().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 */
+	if (!cstate->volatile_defexprs &&
+		!contain_volatile_functions(cstate->whereClause) &&
+		ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
+		target_resultRelInfo->ri_usesMultiInsert = true;
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -2854,6 +2866,10 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = estate->es_result_relations;
 
+	/*
+	 * Init COPY into foreign table. Initialization of copying into foreign
+	 * partitions will be done later.
+	 */
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
@@ -2886,83 +2902,9 @@ CopyFrom(CopyState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -2970,7 +2912,7 @@ CopyFrom(CopyState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -3013,7 +2955,7 @@ CopyFrom(CopyState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -3021,7 +2963,6 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -3080,24 +3021,14 @@ CopyFrom(CopyState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -3149,7 +3080,7 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -3168,9 +3099,6 @@ CopyFrom(CopyState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -3241,7 +3169,7 @@ CopyFrom(CopyState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -3316,11 +3244,8 @@ CopyFrom(CopyState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -3352,8 +3277,7 @@ CopyFrom(CopyState cstate)
 															  target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	ExecCloseIndices(target_resultRelInfo);
 
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2e27e26ba4..97a483b179 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1326,6 +1326,58 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+
+	/* Define multi-insert mode possibility later if needed */
+	resultRelInfo->ri_usesMultiInsert = false;
+}
+
+/*
+ * ExecRelationAllowsMultiInsert
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
+							  const ResultRelInfo *partition_root)
+{
+	Assert(rri->ri_usesMultiInsert == false);
+
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (partition_root && !partition_root->ri_usesMultiInsert)
+		return false;
+
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	/* Foreign tables don't support multi-inserts. */
+	if (rri->ri_FdwRoutine != NULL)
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
 }
 
 /*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 33d2c6f63d..121484374f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -583,6 +583,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  rootrel,
 					  estate->es_instrument);
 
+	/*
+	 * Use multi-insert mode if the condition checking passes for the
+	 * parent and its child.
+	 */
+	leaf_part_rri->ri_usesMultiInsert =
+		ExecRelationAllowsMultiInsert(leaf_part_rri, rootResultRelInfo);
+
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 6d1b722198..895bcd01c6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -145,6 +145,8 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
 										PartitionTupleRouting *proute,
 										TupleTableSlot *slot,
 										EState *estate);
+extern bool checkMultiInsertMode(const ResultRelInfo *rri,
+								 const ResultRelInfo *parent);
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 									PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117407..11466854d1 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -190,6 +190,8 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  Relation partition_root,
 							  int instrument_options);
+extern bool ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
+							  const ResultRelInfo *partition_root);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecCleanUpTriggerState(EState *estate);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a5ab1aed14..cbd77d56af 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -486,7 +486,14 @@ typedef struct ResultRelInfo
 	/* info for partition tuple routing (NULL if not set up yet) */
 	struct PartitionRoutingInfo *ri_PartitionInfo;
 
-	/* for use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.17.1

#38 Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Amit Langote (#36)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

This patch now looks ready for use, and I'm taking a close
look at the error reporting. Here we have a difference in behavior
between local and foreign tables:

The regression test in postgres_fdw.sql:
copy rem2 from stdin;
-1 xyzzy
\.

reports error (1):
=================
ERROR: new row for relation "loc2" violates check constraint...
DETAIL: Failing row contains (-1, xyzzy).
CONTEXT: COPY loc2, line 1: "-1 xyzzy"
remote SQL command: COPY public.loc2(f1, f2) FROM STDIN
COPY rem2, line 2

But local COPY into loc2 reports another error (2):
===================================================
copy loc2 from stdin;
ERROR: new row for relation "loc2" violates check constraint...
DETAIL: Failing row contains (-1, xyzzy).
CONTEXT: COPY loc2, line 1: "-1 xyzzy"

Report (2) is shorter and more specific.
Report (1) contains meaningless information.

Maybe we need to improve the error report, for example like this:
ERROR: Failed COPY into foreign table "rem2":
new row for relation "loc2" violates check constraint...
DETAIL: Failing row contains (-1, xyzzy).
remote SQL command: COPY public.loc2(f1, f2) FROM STDIN
COPY rem2, line 1

The problem here is that we run into the error only after the remote
COPY FROM command completes, and we need to translate the line number
reported by the foreign server into the line number of the overall COPY
command.
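A minimal sketch of that translation, assuming the local COPY records the input line number of every row it places in a batch before the batch is sent, and assuming the remote server counts lines from 1 within each batch. `BufferedBatch`, `batch_add()` and `remote_line_to_local()` are hypothetical names for illustration only; they are not part of the patch or of postgres_fdw.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical batch bookkeeping; the patch's real buffer is CopyMultiInsertBuffer. */
#define MAX_BUFFERED_TUPLES 1000

typedef struct BufferedBatch
{
	int			nused;			/* rows currently buffered */
	uint64_t	linenos[MAX_BUFFERED_TUPLES];	/* input line of each row */
} BufferedBatch;

/* Remember which line of the local COPY input produced the next buffered row. */
static void
batch_add(BufferedBatch *batch, uint64_t cur_lineno)
{
	assert(batch->nused < MAX_BUFFERED_TUPLES);
	batch->linenos[batch->nused++] = cur_lineno;
}

/*
 * Map a 1-based line number reported by the remote COPY (counted within the
 * rows sent for this batch) back to the line of the local COPY input.
 * Returns 0 if the remote line does not correspond to a buffered row.
 */
static uint64_t
remote_line_to_local(const BufferedBatch *batch, int remote_lineno)
{
	if (remote_lineno < 1 || remote_lineno > batch->nused)
		return 0;
	return batch->linenos[remote_lineno - 1];
}
```

Because tuples are routed to several partitions, each partition's buffer would need its own array, and the remote line number would still have to be extracted from the remote error context; the sketch does not attempt that part.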

--
regards,
Andrey Lepikhov
Postgres Professional

#39 tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Andrey Lepikhov (#38)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

Hello Andrey-san,

Thank you for tackling an interesting feature. Below are my review comments.

(1)
-	/* for use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;

It's better to place the new bool member next to an existing bool member, so that the structure doesn't get larger.
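To illustrate the padding argument, here is a reduced sketch with hypothetical stand-in structs, loosely modelled on the tail of ResultRelInfo (the real struct has many more members). On common ABIs a lone bool between two pointers costs a full pointer-sized slot, while a bool placed next to an existing bool can reuse padding that is already there; exact sizes are platform-dependent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: new bool placed between two pointers. */
typedef struct ScatteredLayout
{
	void	   *fdw_state;
	bool		uses_fdw_direct_modify;	/* existing bool, followed by padding */
	void	   *partition_info;
	bool		uses_multi_insert;	/* new bool, needs its own padded slot */
	void	   *copy_multi_insert_buffer;
} ScatteredLayout;				/* typically 40 bytes on LP64 */

/* Hypothetical layout: new bool placed next to the existing bool. */
typedef struct GroupedLayout
{
	void	   *fdw_state;
	bool		uses_fdw_direct_modify;	/* existing bool */
	bool		uses_multi_insert;	/* new bool reuses existing padding */
	void	   *partition_info;
	void	   *copy_multi_insert_buffer;
} GroupedLayout;				/* typically 32 bytes on LP64 */

static_assert(sizeof(GroupedLayout) <= sizeof(ScatteredLayout),
			  "grouping the bools must not make the struct larger");
```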

(2)
+ Assert(rri->ri_usesMultiInsert == false);

As the above assertion represents, I'm afraid the semantics of ExecRelationAllowsMultiInsert() and ResultRelInfo->ri_usesMultiInsert are unclear. In CopyFrom(), ri_usesMultiInsert is set by also considering the COPY-specific conditions:

+	if (!cstate->volatile_defexprs &&
+		!contain_volatile_functions(cstate->whereClause) &&
+		ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
+		target_resultRelInfo->ri_usesMultiInsert = true;

On the other hand, in ExecInitPartitionInfo() below, ri_usesMultiInsert is set purely based on the relation's characteristics.

+	leaf_part_rri->ri_usesMultiInsert =
+		ExecRelationAllowsMultiInsert(leaf_part_rri, rootResultRelInfo);

In addition to these differences, I think it's a bit confusing that the function itself doesn't record the check result in ri_usesMultiInsert.

It would probably be easier to understand if ri_usesMultiInsert were not added and the function just encapsulated the check logic based solely on the relation's characteristics and returned the result. Then the argument would be just one ResultRelInfo, and the caller (e.g. COPY) would combine the function result with other specific conditions.
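A minimal sketch of the design suggested above, using hypothetical, simplified stand-ins for the executor structures (`RelInfo`, `TriggerFlags` and both function names are illustrations only). The relation-level check takes a single relation and records nothing; the COPY-specific caller combines its answer with the volatile-default and WHERE-clause conditions. The checks are a reduced version of those in the patch's ExecRelationAllowsMultiInsert(), with the partitioned-table trigger check omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for TriggerDesc and ResultRelInfo. */
typedef struct TriggerFlags
{
	bool		insert_before_row;
	bool		insert_instead_row;
} TriggerFlags;

typedef struct RelInfo
{
	const TriggerFlags *trigdesc;	/* NULL if the relation has no triggers */
	const void *fdwroutine;		/* NULL for a local table */
} RelInfo;

/* Checks only properties of the relation itself; stores no result anywhere. */
static bool
relation_allows_multi_insert(const RelInfo *rel)
{
	/* BEFORE/INSTEAD OF row triggers might query the table being filled. */
	if (rel->trigdesc != NULL &&
		(rel->trigdesc->insert_before_row || rel->trigdesc->insert_instead_row))
		return false;

	/* Foreign tables take the single-row path in this reduced version. */
	if (rel->fdwroutine != NULL)
		return false;

	return true;
}

/* The COPY-specific caller adds its own, statement-level conditions. */
static bool
copy_can_use_multi_insert(const RelInfo *rel, bool volatile_defexprs,
						  bool volatile_where_clause)
{
	return !volatile_defexprs &&
		!volatile_where_clause &&
		relation_allows_multi_insert(rel);
}
```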

(3)
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+											 ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   int nslots);

To align with other function groups, it's better to place the functions in the order Begin, Exec, End.

(4)
+	/* COPY a bulk of tuples into a foreign relation */
+	BeginForeignCopy_function BeginForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;

To align with the other functions' comments, the comment should be:
/* Support functions for COPY */

(5)
+<programlisting>
+TupleTableSlot *
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a bulk of tuples into the foreign table.
+     <literal>estate</literal> is global execution state for the query.

The return type is void.

(6)
+ <literal>nslots</literal> cis a number of tuples in the <literal>slots</literal>

cis -> is

(7)
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will fail
+     with an error message.
+    </para>

"attempts to insert into" should be "attempts to run COPY on", because it's used for COPY.
Furthermore, if ExecForeignCopy is NULL, COPY should use ExecForeignInsert() instead, right? Otherwise, existing FDWs could no longer be used with COPY.
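One way such a fallback could look, sketched with hypothetical simplified types (`Slot`, `FdwCallbacks`, `flush_foreign_rows()` and the demo counters are illustrations, not the patch's API): buffered rows go through the batch callback when the FDW provides one, and otherwise through one per-row insert call each, so an FDW without the new callback keeps working with COPY.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified slot and FDW callback table. */
typedef struct Slot
{
	int			value;
} Slot;

typedef struct FdwCallbacks
{
	/* Optional: send a whole batch at once (e.g. via a remote COPY). */
	void		(*exec_foreign_copy) (Slot **slots, int nslots);
	/* Existing per-row path. */
	Slot	   *(*exec_foreign_insert) (Slot *slot);
} FdwCallbacks;

/*
 * Send buffered rows to a foreign relation, preferring the batch callback
 * and falling back to one insert call per row when it is not provided.
 */
static void
flush_foreign_rows(const FdwCallbacks *fdw, Slot **slots, int nslots)
{
	if (fdw->exec_foreign_copy != NULL)
	{
		fdw->exec_foreign_copy(slots, nslots);
		return;
	}

	assert(fdw->exec_foreign_insert != NULL);
	for (int i = 0; i < nslots; i++)
		(void) fdw->exec_foreign_insert(slots[i]);
}

/* Demo callbacks that only count how they were invoked. */
static int	copy_calls = 0;
static int	insert_calls = 0;

static void
demo_copy(Slot **slots, int nslots)
{
	(void) slots;
	(void) nslots;
	copy_calls++;
}

static Slot *
demo_insert(Slot *slot)
{
	insert_calls++;
	return slot;
}
```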

(8)
+ bool pipe = (filename == NULL) && (data_dest_cb == NULL);

The pipe variable in BeginCopyTo() above was changed so that it no longer matches pipe in DoCopyTo(), which only checks filename. Should pipe in DoCopyTo() also be changed? If not, using the same variable name for different conditions is confusing.

(9)
-	 * partitions will be done later.
+-	 * partitions will be done later.

Is this an unintended addition of '-'?

(10)
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}

BeginForeignCopy() should be called only if it's defined, because BeginForeignCopy() is an optional function.

(11) 
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,

The extra empty line seems unintended.

(12)
@@ -585,7 +583,8 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);

As in the COPY_FILE case, shouldn't the line terminator be sent only in text format, and be changed to \r\n on Windows? I'm asking because I'm a bit unsure in what situations COPY_CALLBACK could be used. I thought the binary format and the \r\n line terminator could be necessary depending on the FDW implementation.
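A sketch of what the question suggests, assuming the callback receives fully formed rows: the terminator is appended only for text/CSV output and, mirroring what copy.c does when writing to a file, is "\r\n" when built on Windows. `append_end_of_row()` is a hypothetical helper; whether a stream sent to a remote COPY should ever use "\r\n" is exactly the open question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: append the end-of-row marker to a row buffer before
 * handing it to a data_dest_cb-style callback.  Binary COPY rows get no
 * terminator.  Returns the new length of the data in buf.
 */
static size_t
append_end_of_row(char *buf, size_t len, size_t cap, bool binary)
{
#ifndef WIN32
	const char *eol = "\n";
#else
	const char *eol = "\r\n";
#endif
	size_t		eol_len = strlen(eol);

	if (binary)
		return len;				/* binary format: rows are self-delimiting */

	assert(len + eol_len <= cap);	/* caller must leave room for the EOL */
	memcpy(buf + len, eol, eol_len);
	return len + eol_len;
}
```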

(13)
@@ -1001,9 +1001,13 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}

BeginForeignCopy() should be called only if it's defined, because BeginForeignCopy() is an optional function.

(14)
@@ -1205,10 +1209,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
ResultRelInfo *resultRelInfo = proute->partitions[i];

 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}

EndForeignCopy() is an optional function, isn't it? That is, it should be called only if it's defined.
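The pattern that points (10), (13) and (14) ask for can be sketched as follows, again with hypothetical simplified types rather than the real FdwRoutine (`FdwRoutineSketch`, `fdw_begin()`, `fdw_end()` and the counters are illustrations only): each optional callback is checked for NULL before it is invoked, and the COPY callbacks are used only when the relation uses multi-insert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-in for ResultRelInfo; never dereferenced here. */
typedef struct RelState RelState;

typedef struct FdwRoutineSketch
{
	void		(*begin_foreign_insert) (RelState *rel);
	void		(*end_foreign_insert) (RelState *rel);
	void		(*begin_foreign_copy) (RelState *rel);
	void		(*end_foreign_copy) (RelState *rel);
} FdwRoutineSketch;

/* Let the FDW initialize itself; every callback is optional. */
static void
fdw_begin(const FdwRoutineSketch *fdw, RelState *rel, bool uses_multi_insert)
{
	if (fdw == NULL)
		return;					/* local table: no FDW to notify */

	if (uses_multi_insert)
	{
		if (fdw->begin_foreign_copy != NULL)
			fdw->begin_foreign_copy(rel);
	}
	else if (fdw->begin_foreign_insert != NULL)
		fdw->begin_foreign_insert(rel);
}

/* Let the FDW shut down; mirrors fdw_begin(). */
static void
fdw_end(const FdwRoutineSketch *fdw, RelState *rel, bool uses_multi_insert)
{
	if (fdw == NULL)
		return;

	if (uses_multi_insert)
	{
		if (fdw->end_foreign_copy != NULL)
			fdw->end_foreign_copy(rel);
	}
	else if (fdw->end_foreign_insert != NULL)
		fdw->end_foreign_insert(rel);
}

/* Demo callbacks that only count invocations. */
static int	end_copy_calls = 0;
static int	begin_insert_calls = 0;

static void
count_end_copy(RelState *rel)
{
	(void) rel;
	end_copy_calls++;
}

static void
count_begin_insert(RelState *rel)
{
	(void) rel;
	begin_insert_calls++;
}
```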

(15)
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+	{
+		PGresult *res = PQgetResult(conn);
+
+		pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+	}
+}

The following page says "Use PQerrorMessage to retrieve details if the return value is -1." So it would be correct not to use a PGresult here and to pass NULL as the second argument to pgfdw_report_error().

https://www.postgresql.org/docs/devel/libpq-copy.html

(16)
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		status = true;
+	}

I'm afraid it's not intuitive what "status is true" means. I think copy_data_sent or copy_send_success would be better for the variable name.

(17)
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));

As in the places that call PQsendQuery(), it seems preferable to call pgfdw_report_error() here too.

Regards
Takayuki Tsunakawa

#40 Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: tsunakawa.takay@fujitsu.com (#39)
2 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 19.10.2020 09:12, tsunakawa.takay@fujitsu.com wrote:

Hello Andrey-san,

Thank you for tackling an interesting feature. Below are my review comments.

(1)
-	/* for use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
} ResultRelInfo;

It's better to place the new bool member next to an existing bool member, so that the structure doesn't get larger.

Here the variable position was chosen according to its logical
meaning. I don't see a big problem with the size of this structure.
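To illustrate the padding concern in (1), here is a small standalone sketch. The two structs are hypothetical stand-ins, not the real ResultRelInfo; only the member order differs. Exact sizes are ABI-dependent, but on common LP64 platforms the interleaved layout takes 32 bytes and the grouped one 24, because each lone bool is padded up to pointer alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout with a bool between pointers (as in the patch). */
typedef struct ScatteredLayout
{
	bool		flag_a;			/* padded up to pointer alignment */
	void	   *ptr_a;
	bool		flag_b;			/* padded again */
	void	   *ptr_b;
} ScatteredLayout;

/* Same members, with the bools placed next to each other. */
typedef struct GroupedLayout
{
	bool		flag_a;			/* both bools share one padded slot */
	bool		flag_b;
	void	   *ptr_a;
	void	   *ptr_b;
} GroupedLayout;

/* Bytes saved by grouping the bools (8 on typical LP64 platforms). */
static size_t
layout_savings(void)
{
	assert(sizeof(GroupedLayout) <= sizeof(ScatteredLayout));
	return sizeof(ScatteredLayout) - sizeof(GroupedLayout);
}
```

Whether those bytes matter for ResultRelInfo is the judgment call discussed above; the sketch only shows where they go.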

(2)
+ Assert(rri->ri_usesMultiInsert == false);

As the above assertion shows, I'm afraid the semantics of ExecRelationAllowsMultiInsert() and ResultRelInfo->ri_usesMultiInsert are unclear. In CopyFrom(), ri_usesMultiInsert is set by also considering the COPY-specific conditions:

+	if (!cstate->volatile_defexprs &&
+		!contain_volatile_functions(cstate->whereClause) &&
+		ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
+		target_resultRelInfo->ri_usesMultiInsert = true;

On the other hand, in ExecInitPartitionInfo() below, ri_usesMultiInsert is set purely based on the relation's characteristics.

+	leaf_part_rri->ri_usesMultiInsert =
+		ExecRelationAllowsMultiInsert(leaf_part_rri, rootResultRelInfo);

In addition to these differences, I think it's a bit confusing that the function itself doesn't record the check result in ri_usesMultiInsert.

It would probably be easier to understand if ri_usesMultiInsert were not added, and the function just encapsulated the check logic based solely on the relation's characteristics and returned the result. Then the only argument would be one ResultRelInfo, and the caller (e.g. COPY) would combine the function result with its own specific conditions.

I can't fully agree with this suggestion. We do it this way because in
the future someone may call this code from another subsystem for other
purposes, and we want all the relation-related restrictions to be
contained in one routine. CopyState-related restrictions are used only
in copy.c and are kept out of this function.
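The alternative the reviewer describes in (2), a check that depends only on the relation and records nothing, with COPY adding its own conditions, might look roughly like the sketch below. All type, field and function names are hypothetical illustrations, and the relation conditions listed are a simplified assumption rather than the patch's actual rule set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the relation properties the check inspects. */
typedef struct RelProps
{
	bool		has_before_row_trigger;
	bool		has_instead_row_trigger;
	bool		is_foreign;
	bool		fdw_has_exec_copy;	/* FDW provides a batch COPY callback */
} RelProps;

/* Hypothetical COPY-only conditions, known only to copy.c. */
typedef struct CopyProps
{
	bool		volatile_defexprs;
	bool		volatile_where;
} CopyProps;

/* Pure check: depends only on the relation and stores nothing. */
static bool
rel_allows_multi_insert(const RelProps *rel)
{
	assert(rel != NULL);
	if (rel->has_before_row_trigger || rel->has_instead_row_trigger)
		return false;
	if (rel->is_foreign && !rel->fdw_has_exec_copy)
		return false;
	return true;
}

/* The COPY caller combines the relation check with its own conditions. */
static bool
copy_uses_multi_insert(const RelProps *rel, const CopyProps *copy)
{
	assert(copy != NULL);
	return !copy->volatile_defexprs &&
		!copy->volatile_where &&
		rel_allows_multi_insert(rel);
}
```

In this shape the relation-related restrictions still live in one routine; the difference from the patch is only that the caller, not the check, decides whether to store the result.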

(3)
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+											 ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   int nslots);

To align with other function groups, it's better to place the functions in order of Begin, Exec, and End.

Ok, thanks.

(4)
+	/* COPY a bulk of tuples into a foreign relation */
+	BeginForeignCopy_function BeginForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;

To align with the other functions' comment, the comment should be:
/* Support functions for COPY */

Agreed

(5)
+<programlisting>
+TupleTableSlot *
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a bulk of tuples into the foreign table.
+     <literal>estate</literal> is global execution state for the query.

The return type is void.

Agreed

(6)
+ <literal>nslots</literal> cis a number of tuples in the <literal>slots</literal>

cis -> is

Ok

(7)
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will fail
+     with an error message.
+    </para>

"attempts to insert into" should be "attempts to run COPY on", because it's used for COPY.
Furthermore, if ExecForeignCopy is NULL, COPY should use ExecForeignInsert() instead, right? Otherwise, existing FDWs would become unable to be used for COPY.

Thanks
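For the fallback asked about in (7), the flush path could prefer a batch callback and otherwise loop over the per-row one, so that FDWs implementing only ExecForeignInsert keep working with COPY. The sketch below is hypothetical: FdwCallbacks, Row and the toy counting callbacks stand in for the real FdwRoutine and TupleTableSlot machinery.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Row
{
	int			value;			/* hypothetical stand-in for a tuple */
} Row;

typedef struct FdwCallbacks
{
	/* Optional batch callback (hypothetical analogue of ExecForeignCopy). */
	void		(*exec_copy) (void *state, Row **rows, int nrows);
	/* Per-row callback (hypothetical analogue of ExecForeignInsert). */
	Row		   *(*exec_insert) (void *state, Row *row);
} FdwCallbacks;

/* Flush buffered rows: one batch call if supported, else one call per row. */
static void
flush_foreign_buffer(const FdwCallbacks *fdw, void *state,
					 Row **rows, int nrows)
{
	assert(fdw->exec_copy != NULL || fdw->exec_insert != NULL);

	if (fdw->exec_copy != NULL)
	{
		fdw->exec_copy(state, rows, nrows);
		return;
	}
	for (int i = 0; i < nrows; i++)
		fdw->exec_insert(state, rows[i]);
}

/* Toy callbacks that only count how they were invoked. */
typedef struct CallCounts
{
	int			batch_calls;
	int			row_calls;
} CallCounts;

static void
toy_exec_copy(void *state, Row **rows, int nrows)
{
	(void) rows;
	(void) nrows;
	((CallCounts *) state)->batch_calls++;
}

static Row *
toy_exec_insert(void *state, Row *row)
{
	((CallCounts *) state)->row_calls++;
	return row;
}
```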

(8)
+ bool pipe = (filename == NULL) && (data_dest_cb == NULL);

The pipe variable in BeginCopyTo() above was changed so that it no longer matches pipe in DoCopyTo(), which only refers to filename. Should pipe in DoCopyTo() also be changed? If not, using the same variable name for different conditions is confusing.

Ok

(9)
-	 * partitions will be done later.
+-	 * partitions will be done later.

Is this an unintended addition of '-'?

Ok

(10)
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}

BeginForeignCopy() should be called only if it's defined, because BeginForeignCopy() is an optional function.

Maybe

(11)
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,

The extra empty line seems unintended.

Ok

(12)
@@ -585,7 +583,8 @@ CopySendEndOfRow(CopyState cstate)
(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
break;
case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			CopySendChar(cstate, '\n');
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);

As in the COPY_FILENAME case, shouldn't the line terminator be sent only in text format, and be changed to \r\n on Windows? I'm asking because I'm a bit unsure in what situations COPY_CALLBACK could be used. I thought the binary format and the \r\n line terminator might be necessary depending on the FDW implementation.

Ok. I don't want to allow binary format in callback mode right now; it
is outside the scope of this patch. Maybe it will be done later.
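A minimal sketch of what (12) asks for in callback mode: append the text-format line terminator, choosing \r\n on Windows as the file destination does, then hand the finished row to the callback. The types and names are hypothetical. Unlike the patch, whose callback takes only (buf, len) and finds its connection through the static copy_fmstate, this sketch passes an explicit context pointer; that is a design choice of the sketch, not something the patch does. It tests the compiler's _WIN32 macro rather than PostgreSQL's WIN32 so that it builds standalone.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical destination callback with an explicit context pointer. */
typedef void (*row_dest_cb) (void *arg, const char *buf, int len);

typedef struct RowBuffer
{
	char		data[256];
	int			len;
	bool		binary;			/* binary format is not allowed here */
	row_dest_cb dest;
	void	   *dest_arg;
} RowBuffer;

static void
row_buffer_append(RowBuffer *b, const char *s)
{
	size_t		n = strlen(s);

	assert(b->len + n < sizeof(b->data));
	memcpy(b->data + b->len, s, n);
	b->len += (int) n;
}

/* End of row in callback mode: add the terminator, flush, reset. */
static void
row_buffer_end_of_row(RowBuffer *b)
{
	assert(!b->binary);
#ifndef _WIN32
	row_buffer_append(b, "\n");
#else
	row_buffer_append(b, "\r\n");
#endif
	b->dest(b->dest_arg, b->data, b->len);
	b->len = 0;
}

/* Toy destination that accumulates everything it receives. */
typedef struct Captured
{
	char		text[256];
	int			len;
} Captured;

static void
capture_cb(void *arg, const char *buf, int len)
{
	Captured   *c = arg;

	assert(c->len + len < (int) sizeof(c->text));
	memcpy(c->text + c->len, buf, len);
	c->len += len;
	c->text[c->len] = '\0';
}
```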

(13)
@@ -1001,9 +1001,13 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
* If the partition is a foreign table, let the FDW init itself for
* routing tuples to the partition.
*/
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}

BeginForeignCopy() should be called only if it's defined, because BeginForeignCopy() is an optional function.

Ok
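The pattern requested in (10) and (13), calling each begin callback only when the FDW defines it, can be sketched as follows. FdwRoutineSketch and the toy callback are hypothetical stand-ins for FdwRoutine, BeginForeignCopy and BeginForeignInsert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct FdwRoutineSketch
{
	void		(*begin_copy) (void *state);	/* optional */
	void		(*begin_insert) (void *state);	/* optional */
} FdwRoutineSketch;

/*
 * Let the FDW initialize itself for tuple routing, calling the begin
 * callback that matches the chosen mode only if it is defined.
 * Returns true if a callback was invoked.
 */
static bool
begin_foreign_routing(const FdwRoutineSketch *fdw, bool uses_multi_insert,
					  void *state)
{
	if (fdw == NULL)
		return false;			/* not a foreign table */

	if (uses_multi_insert)
	{
		if (fdw->begin_copy != NULL)
		{
			fdw->begin_copy(state);
			return true;
		}
	}
	else if (fdw->begin_insert != NULL)
	{
		fdw->begin_insert(state);
		return true;
	}
	return false;
}

/* Toy callback: counts invocations through an int pointer. */
static void
toy_begin(void *state)
{
	(*(int *) state)++;
}
```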

(14)
@@ -1205,10 +1209,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
ResultRelInfo *resultRelInfo = proute->partitions[i];

/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}

EndForeignCopy() is an optional function, isn't it? That is, it's called if it's defined.

ri_usesMultiInsert must guarantee that multi-insert is used, and we
use only assertions to enforce this.
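One way to make the assertion in the end path a genuine invariant, rather than an assumption about every FDW, is to derive the multi-insert flag from the callbacks that actually exist when the flag is set. This is a hypothetical sketch of that idea, not code from the patch; CopyCallbacks and the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct CopyCallbacks
{
	void		(*exec_copy) (void *state);		/* batch send */
	void		(*end_copy) (void *state);		/* multi-insert shutdown */
	void		(*end_insert) (void *state);	/* per-row shutdown, optional */
} CopyCallbacks;

/* Decide the mode once, at init time, from the callbacks present. */
static bool
decide_uses_multi_insert(const CopyCallbacks *fdw, bool rel_allows)
{
	return rel_allows && fdw->exec_copy != NULL && fdw->end_copy != NULL;
}

/*
 * Shut the FDW down.  In multi-insert mode end_copy exists by
 * construction, so an assertion is enough there.
 */
static void
end_foreign_routing(const CopyCallbacks *fdw, bool uses_multi_insert,
					void *state)
{
	if (uses_multi_insert)
	{
		assert(fdw->end_copy != NULL);
		fdw->end_copy(state);
	}
	else if (fdw->end_insert != NULL)
		fdw->end_insert(state);
}

/* Toy callbacks for illustration. */
static void
toy_end(void *state)
{
	(*(int *) state)++;
}

static void
toy_exec(void *state)
{
	(void) state;
}
```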

(15)
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+	{
+		PGresult *res = PQgetResult(conn);
+
+		pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+	}
+}

The following page says "Use PQerrorMessage to retrieve details if the return value is -1." So, it's correct to not use PGresult here and pass NULL as the second argument to pgfdw_report_error().

https://www.postgresql.org/docs/devel/libpq-copy.html

Ok

(16)
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		status = true;
+	}

I'm afraid it's not intuitive what "status is true" means. I think copy_data_sent or copy_send_success would be better for the variable name.

Agreed. Renamed to 'OK', in accordance with psql/copy.c.

(17)
+		if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			ereport(ERROR,
+					(errmsg("error returned by PQputCopyEnd: %s",
+							PQerrorMessage(conn))));

As in the places that call PQsendQuery(), it seems preferable to call pgfdw_report_error() here too.

Agreed
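For reference, the COPY IN sequence discussed in (15) to (17) is: once the COPY ... FROM STDIN command has returned PGRES_COPY_IN, send data messages, then always end the COPY, passing an error message if anything failed so that the server aborts the command. The sketch below models this with an in-memory MockConn instead of libpq, so it runs without a server. mock_put_copy_data() and mock_put_copy_end() are hypothetical and only mimic the return conventions of PQputCopyData() and PQputCopyEnd() (1 on success, -1 on error, details via the connection's error message as with PQerrorMessage(); the 0 "would block" case of nonblocking mode is not modelled). The final PQgetResult() status check is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a connection in COPY IN state. */
typedef struct MockConn
{
	bool		in_copy;		/* server is expecting COPY data */
	int			fail_after;		/* fail once this many messages sent; -1 = never */
	int			messages;		/* data messages accepted so far */
	bool		aborted;		/* COPY was ended with an error message */
	char		errmsg[64];		/* analogue of PQerrorMessage() */
} MockConn;

static int
mock_put_copy_data(MockConn *conn, const char *buf, int len)
{
	(void) buf;
	(void) len;
	if (!conn->in_copy)
		return -1;
	if (conn->fail_after >= 0 && conn->messages >= conn->fail_after)
	{
		snprintf(conn->errmsg, sizeof(conn->errmsg), "connection lost");
		return -1;
	}
	conn->messages++;
	return 1;
}

static int
mock_put_copy_end(MockConn *conn, const char *errormsg)
{
	if (!conn->in_copy)
		return -1;
	conn->in_copy = false;
	conn->aborted = (errormsg != NULL);
	return 1;
}

/* Send one batch of rows; the COPY is always ended, even after a failure. */
static bool
send_copy_batch(MockConn *conn, const char **rows, int nrows)
{
	bool		ok = true;

	conn->in_copy = true;		/* as if COPY ... FROM STDIN gave PGRES_COPY_IN */
	for (int i = 0; i < nrows && ok; i++)
	{
		if (mock_put_copy_data(conn, rows[i], (int) strlen(rows[i])) <= 0)
			ok = false;			/* details are in conn->errmsg */
	}
	if (mock_put_copy_end(conn, ok ? NULL : "batch send failed") <= 0)
		ok = false;
	return ok;
}
```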

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v11-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchtext/x-patch; charset=UTF-8; name=v11-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchDownload
From 1d7e4fb1f7990a5d5db53e8ddf6809f048c17f34 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Mon, 19 Oct 2020 16:24:55 +0500
Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert is
possible and the foreign table has a non-zero number of columns.

The FDW API was extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of a bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, iteratively sends
tuples using the CopyTo() machinery, and then sends EOF over this connection.

The code that constructed the list of columns for a given foreign relation
in the deparseAnalyzeSql() routine is split out into deparseRelColumnList().
It is reused in deparseCopyFromSql().

Added TAP-tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure was extended
with a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field.
It is needed to avoid repeated initialization of CopyState. Also for this
reason, the CopyTo() routine was split into the set of routines
CopyToStart()/CopyTo()/CopyToFinish().

The CopyInsertMethod enum was removed. This logic is now implemented by the
ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  46 +++-
 contrib/postgres_fdw/postgres_fdw.c           | 137 ++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++++
 doc/src/sgml/fdwhandler.sgml                  |  73 ++++++
 src/backend/commands/copy.c                   | 236 +++++++++++-------
 src/backend/executor/execMain.c               |   8 +-
 src/backend/executor/execPartition.c          |  29 ++-
 src/include/commands/copy.h                   |  11 +
 src/include/foreign/fdwapi.h                  |  15 ++
 11 files changed, 554 insertions(+), 107 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d44df19fe..fa7740163d 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..be8db5ac63 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8076,8 +8076,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8088,6 +8089,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8196,6 +8210,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaacc51..1657a20d9b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -356,6 +358,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -534,6 +543,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2051,6 +2063,131 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ *
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/* Finish the COPY IN protocol. This must be done after a successful copy
+		 * as well as after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending an EOF signal, check command OK. */
+		res = PQgetResult(conn);
+		if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(OK && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!OK)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c5417b..22dcd12f02 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2212,6 +2212,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2312,6 +2329,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..a9a7402440 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -796,6 +796,79 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; it will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 83ce196a45..26d79ad051 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -118,11 +118,14 @@ typedef struct CopyStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
+	TupleDesc	tupDesc;		/* COPY TO will be used for manual tuple copying
+								  * into the destination */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -349,17 +352,12 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
-						   RawStmt *raw_query, Oid queryRelId, List *attnamelist,
-						   List *options);
+						   TupleDesc srcTupDesc, RawStmt *raw_query,
+						   Oid queryRelId, List *attnamelist, List *options);
 static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
-							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
-static void EndCopyTo(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
-static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
 static int	CopyReadAttributesText(CopyState cstate);
@@ -585,7 +583,13 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			Assert(!cstate->binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1114,8 +1118,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	}
 	else
 	{
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1501,6 +1505,7 @@ static CopyState
 BeginCopy(ParseState *pstate,
 		  bool is_from,
 		  Relation rel,
+		  TupleDesc srcTupDesc,
 		  RawStmt *raw_query,
 		  Oid queryRelId,
 		  List *attnamelist,
@@ -1536,6 +1541,11 @@ BeginCopy(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query && !is_from);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -1862,20 +1872,25 @@ EndCopy(CopyState cstate)
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
-static CopyState
+CopyState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc tupDesc,
 			RawStmt *query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || tupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -1914,8 +1929,9 @@ BeginCopyTo(ParseState *pstate,
 							RelationGetRelationName(rel))));
 	}
 
-	cstate = BeginCopy(pstate, false, rel, query, queryRelId, attnamelist,
-					   options);
+	cstate = BeginCopy(pstate, false, rel, tupDesc, query, queryRelId,
+					   attnamelist, options);
+
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	if (pipe)
@@ -1924,6 +1940,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -2001,7 +2022,7 @@ BeginCopyTo(ParseState *pstate,
 static uint64
 DoCopyTo(CopyState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -2010,7 +2031,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2033,7 +2056,7 @@ DoCopyTo(CopyState cstate)
 /*
  * Clean up storage and release resources for COPY TO.
  */
-static void
+void
 EndCopyTo(CopyState cstate)
 {
 	if (cstate->queryDesc != NULL)
@@ -2049,19 +2072,22 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
+/*
+ * Start COPY TO operation.  This is a separate routine so that the setup is
+ * done only once in manual mode, where tuples are sent to the destination
+ * one at a time by calling CopyOneRowTo().
  */
-static uint64
-CopyTo(CopyState cstate)
+void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -2148,6 +2174,32 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2179,24 +2231,13 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
@@ -2486,54 +2527,63 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo, buffer->slots[i],
+										  estate, false, NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2838,8 +2888,11 @@ CopyFrom(CopyState cstate)
 	 * checked by calling ExecRelationAllowsMultiInsert().  It does not matter
 	 * whether partitions have any volatile default expressions as we use the
 	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
 	 */
 	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
 		!contain_volatile_functions(cstate->whereClause))
 		target_resultRelInfo->ri_usesMultiInsert =
 					ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL);
@@ -2863,10 +2916,18 @@ CopyFrom(CopyState cstate)
 	 * Init COPY into foreign table. Initialization of copying into foreign
 	 * partitions will be done later.
 	 */
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			Assert(target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -3261,10 +3322,16 @@ CopyFrom(CopyState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
@@ -3315,7 +3382,8 @@ BeginCopyFrom(ParseState *pstate,
 	MemoryContext oldcontext;
 	bool		volatile_defexprs;
 
-	cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+	cstate = BeginCopy(pstate, true, rel, NULL, NULL, InvalidOid, attnamelist,
+																	options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Initialize state variables */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0ad98ff0e7..9d465cebaf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1291,8 +1291,12 @@ ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
 		rri->ri_TrigDesc->trig_insert_new_table)
 		return false;
 
-	/* Foreign tables don't support multi-inserts. */
-	if (rri->ri_FdwRoutine != NULL)
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
 		return false;
 
 	/* OK, caller can use multi-insert on this relation. */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 121484374f..fae21356c7 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1001,9 +1001,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
@@ -1205,10 +1212,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..08309149ea 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -39,6 +40,16 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern uint64 CopyFrom(CopyState cstate);
 
+extern CopyState BeginCopyTo(ParseState *pstate, Relation rel,
+							 TupleDesc tupDesc, RawStmt *query,
+							 Oid queryRelId, const char *filename, bool is_program,
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
+extern void EndCopyTo(CopyState cstate);
+extern void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
+extern void CopyToStart(CopyState cstate);
+extern void CopyToFinish(CopyState cstate);
+
 extern DestReceiver *CreateCopyDestReceiver(void);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..52b213f5aa 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
-- 
2.17.1

v11-0001-Move-multi-insert-decision-logic-into-executor.patch
From fd074709c606671c130b1951362b9ddfca74543c Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Mon, 19 Oct 2020 14:02:51 +0500
Subject: [PATCH 1/2] Move multi-insert decision logic into executor

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod shared across all potential target
relations (partitions) that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Move that decision logic into InitResultRelInfo which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized.  That prevents repeated computation
of the same information in some cases, especially for partitions,
and the new arrangement is slightly more readable.
---
 src/backend/commands/copy.c          | 152 +++++++--------------------
 src/backend/executor/execMain.c      |  52 +++++++++
 src/backend/executor/execPartition.c |   7 ++
 src/include/executor/execPartition.h |   2 +
 src/include/executor/executor.h      |   2 +
 src/include/nodes/execnodes.h        |   9 +-
 6 files changed, 109 insertions(+), 115 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 531bd7c73a..83ce196a45 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -85,16 +85,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -2717,12 +2707,10 @@ CopyFrom(CopyState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -2832,6 +2820,30 @@ CopyFrom(CopyState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(target_resultRelInfo->ri_usesMultiInsert == false);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecRelationAllowsMultiInsert().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 */
+	if (!cstate->volatile_defexprs &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -2847,6 +2859,10 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = resultRelInfo;
 
+	/*
+	 * Init COPY into foreign table. Initialization of copying into foreign
+	 * partitions will be done later.
+	 */
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
@@ -2879,83 +2895,9 @@ CopyFrom(CopyState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -2963,7 +2905,7 @@ CopyFrom(CopyState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -3006,7 +2948,7 @@ CopyFrom(CopyState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -3014,7 +2956,6 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -3073,24 +3014,14 @@ CopyFrom(CopyState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -3137,7 +3068,7 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -3156,9 +3087,6 @@ CopyFrom(CopyState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -3230,7 +3158,7 @@ CopyFrom(CopyState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -3306,11 +3234,8 @@ CopyFrom(CopyState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -3342,8 +3267,7 @@ CopyFrom(CopyState cstate)
 															  target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 293f53d07c..0ad98ff0e7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1245,6 +1245,58 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionRoot = partition_root;
 	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+
+	/* Define multi-insert mode possibility later if needed */
+	resultRelInfo->ri_usesMultiInsert = false;
+}
+
+/*
+ * ExecRelationAllowsMultiInsert
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
+							  const ResultRelInfo *partition_root)
+{
+	Assert(rri->ri_usesMultiInsert == false);
+
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (partition_root && !partition_root->ri_usesMultiInsert)
+		return false;
+
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	/* Foreign tables don't support multi-inserts. */
+	if (rri->ri_FdwRoutine != NULL)
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
 }
 
 /*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 33d2c6f63d..121484374f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -583,6 +583,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  rootrel,
 					  estate->es_instrument);
 
+	/*
+	 * Use multi-insert mode if the condition checking passes for the
+	 * parent and its child.
+	 */
+	leaf_part_rri->ri_usesMultiInsert =
+		ExecRelationAllowsMultiInsert(leaf_part_rri, rootResultRelInfo);
+
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 6d1b722198..895bcd01c6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -145,6 +145,8 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
 										PartitionTupleRouting *proute,
 										TupleTableSlot *slot,
 										EState *estate);
+extern bool checkMultiInsertMode(const ResultRelInfo *rri,
+								 const ResultRelInfo *parent);
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 									PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index b7978cd22e..849f207691 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -190,6 +190,8 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  Relation partition_root,
 							  int instrument_options);
+extern bool ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
+										  const ResultRelInfo *partition_root);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b7e9e5d539..b1d22ae7ea 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -486,7 +486,14 @@ typedef struct ResultRelInfo
 	/* info for partition tuple routing (NULL if not set up yet) */
 	struct PartitionRoutingInfo *ri_PartitionInfo;
 
-	/* for use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.17.1

#41 tsunakawa.takay@fujitsu.com
In reply to: Andrey Lepikhov (#40)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey-san,

Thanks for the revision. The patch looks good except for the following two items.

(18)
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			Assert(target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		}

(14)
@@ -1205,10 +1209,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															 resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}

EndForeignCopy() is an optional function, isn't it? That is, it's called if it's defined.

ri_usesMultiInsert must guarantee that we will use multi-insertions. And we
use only assertions to control this.

The code appears to require both BeginForeignCopy and EndForeignCopy, while the following documentation says they are optional. Which is correct? (I suppose the latter is correct just like other existing Begin/End functions are optional.)

+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
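If the callbacks are meant to be optional, as the quoted documentation says, the caller can honour that contract with a plain NULL check rather than an Assert, the same way BeginForeignInsert/EndForeignInsert are treated today. The following is a minimal, self-contained sketch of that calling convention only; DemoFdwRoutine, demo_begin_copy and demo_end_copy are hypothetical names, not PostgreSQL's FdwRoutine API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-in for FdwRoutine: only the two
 * COPY-related callbacks discussed in the review are modelled here.
 */
typedef struct DemoFdwRoutine
{
	void		(*BeginForeignCopy) (void *state);	/* optional, may be NULL */
	void		(*EndForeignCopy) (void *state);	/* optional, may be NULL */
} DemoFdwRoutine;

/* Invoke BeginForeignCopy only when the FDW provides it. */
static void
demo_begin_copy(const DemoFdwRoutine *routine, void *state)
{
	if (routine->BeginForeignCopy != NULL)
		routine->BeginForeignCopy(state);
}

/* Invoke EndForeignCopy only when the FDW provides it. */
static void
demo_end_copy(const DemoFdwRoutine *routine, void *state)
{
	if (routine->EndForeignCopy != NULL)
		routine->EndForeignCopy(state);
}

/* Example callback: counts how many times it has been invoked. */
static void
count_call(void *state)
{
	(*(int *) state)++;
}
```

With this convention an FDW that leaves both pointers NULL simply gets "no action", which matches the documentation text instead of tripping an assertion.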

(2)
+ Assert(rri->ri_usesMultiInsert == false);

As the above assertion suggests, I'm afraid the semantics of
ExecRelationAllowsMultiInsert() and ResultRelInfo->ri_usesMultiInsert are
unclear. In CopyFrom(), ri_usesMultiInsert is set by also considering the
COPY-specific conditions:

+	if (!cstate->volatile_defexprs &&
+		!contain_volatile_functions(cstate->whereClause) &&
+		ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
+		target_resultRelInfo->ri_usesMultiInsert = true;

On the other hand, in ExecInitPartitionInfo() below, ri_usesMultiInsert is set
purely based on the relation's characteristics.

+	leaf_part_rri->ri_usesMultiInsert =
+		ExecRelationAllowsMultiInsert(leaf_part_rri, rootResultRelInfo);

In addition to these differences, I think it's a bit confusing that the function
itself doesn't record the check result in ri_usesMultiInsert.

It would probably be easier to understand if we did not add ri_usesMultiInsert
and the function just encapsulated the check logic based solely on the relation's
characteristics and returned the result. So, the argument is just one
ResultRelInfo. The caller (e.g. COPY) combines the function result with other
specific conditions.

I can't fully agree with this suggestion. We do it this way because in the future
someone may call this code from another subsystem for other purposes, and we want
all the relation-related restrictions to be contained in one routine. CopyState-related
restrictions are used in copy.c only and are kept out of this function.

I'm sorry if I'm misinterpreting you, but I think the following simply serves its role sufficiently and cleanly without using ri_usesMultiInsert.

bool
ExecRelationAllowsMultiInsert(ResultRelInfo *rri)
{
	check if the relation allows multi-insert based on its characteristics;
	return true or false;
}

I'm concerned that if one subsystem sets ri_usesMultiInsert to true based on its additional specific conditions, it might lead to another subsystem's misjudgment. For example, when subsystem A and B want to do different things respectively:

[Subsystem A]
if (ExecRelationAllowsMultiInsert(rri) && {A's conditions})
	rri->ri_usesMultiInsert = true;
...
if (rri->ri_usesMultiInsert)
	do A's business;

[Subsystem B]
if (rri->ri_usesMultiInsert)
	do B's business;

Here, what if subsystems A and B don't want each other's specific conditions to hold? For example, suppose B should do B's business only when A's specific conditions do not hold. If A sets rri->ri_usesMultiInsert to true and passes rri to B, then B wrongly does B's business even though A's specific conditions hold.

(I think this is due to some form of violation of encapsulation.)
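The alternative proposed above can be sketched as a pure predicate that inspects only the relation and writes nothing back, with each caller combining it with its own conditions in a local decision. This is a simplified model under stated assumptions: DemoRelInfo, demo_relation_allows_multi_insert and demo_copy_uses_multi_insert are hypothetical names, and the two relation properties stand in for the trigger and foreign-table checks in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the relation properties that matter here. */
typedef struct DemoRelInfo
{
	bool		has_before_or_instead_row_trig;
	bool		is_foreign;
} DemoRelInfo;

/*
 * Pure predicate: looks only at the relation itself and records nothing,
 * so no caller can observe another caller's extra conditions.
 */
static bool
demo_relation_allows_multi_insert(const DemoRelInfo *rel)
{
	return !rel->has_before_or_instead_row_trig && !rel->is_foreign;
}

/*
 * Subsystem A (e.g. COPY) combines the predicate with its own conditions
 * and keeps the result to itself instead of storing it in the shared struct.
 */
static bool
demo_copy_uses_multi_insert(const DemoRelInfo *rel, bool volatile_defexprs,
							bool volatile_where)
{
	return demo_relation_allows_multi_insert(rel) &&
		!volatile_defexprs && !volatile_where;
}
```

Because the shared struct is never modified, subsystem B calling demo_relation_allows_multi_insert() sees only the relation's own answer, regardless of what A decided.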

Regards
Takayuki Tsunakawa

#42Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#41)
2 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi,

I needed to look at this patch while working on something related, and I
found it got broken by 6973533650c a couple days ago. So here's a fixed
version, to keep cfbot happy. I haven't done any serious review yet.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v12-0001-Move-multi-insert-decision-logic-into-execu-20201110.patch (text/x-patch; charset=UTF-8)
From 75ad201a09238c8fb69a22a85b2cde72838c2c84 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Tue, 10 Nov 2020 18:54:05 +0100
Subject: [PATCH 1/2] Move multi-insert decision logic into executor

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod shared across all potential target
relations (partitions) that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Move that decision logic into InitResultRelInfo, which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized.  That prevents repeated computation
of the same information in some cases, especially for partitions,
and the new arrangement is slightly more readable.
---
 src/backend/commands/copy.c          | 152 +++++++--------------------
 src/backend/executor/execMain.c      |  52 +++++++++
 src/backend/executor/execPartition.c |   7 ++
 src/include/executor/execPartition.h |   2 +
 src/include/executor/executor.h      |   2 +
 src/include/nodes/execnodes.h        |   9 +-
 6 files changed, 109 insertions(+), 115 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 115860a9d4..d882396d6f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -85,16 +85,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -2717,12 +2707,10 @@ CopyFrom(CopyState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -2832,6 +2820,30 @@ CopyFrom(CopyState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(target_resultRelInfo->ri_usesMultiInsert == false);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecRelationAllowsMultiInsert().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 */
+	if (!cstate->volatile_defexprs &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -2847,6 +2859,10 @@ CopyFrom(CopyState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = resultRelInfo;
 
+	/*
+	 * Init COPY into foreign table. Initialization of copying into foreign
+	 * partitions will be done later.
+	 */
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
@@ -2879,83 +2895,9 @@ CopyFrom(CopyState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -2963,7 +2905,7 @@ CopyFrom(CopyState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -3006,7 +2948,7 @@ CopyFrom(CopyState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -3014,7 +2956,6 @@ CopyFrom(CopyState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -3073,24 +3014,14 @@ CopyFrom(CopyState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -3120,7 +3051,7 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -3139,9 +3070,6 @@ CopyFrom(CopyState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -3213,7 +3141,7 @@ CopyFrom(CopyState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -3289,11 +3217,8 @@ CopyFrom(CopyState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -3325,8 +3250,7 @@ CopyFrom(CopyState cstate)
 															  target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7179f589f9..0c728315fa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1248,6 +1248,58 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+
+	/* Define multi-insert mode possibility later if needed */
+	resultRelInfo->ri_usesMultiInsert = false;
+}
+
+/*
+ * ExecRelationAllowsMultiInsert
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
+							  const ResultRelInfo *partition_root)
+{
+	Assert(rri->ri_usesMultiInsert == false);
+
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (partition_root && !partition_root->ri_usesMultiInsert)
+		return false;
+
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	/* Foreign tables don't support multi-inserts. */
+	if (rri->ri_FdwRoutine != NULL)
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
 }
 
 /*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 86594bd056..2ce4afc9ad 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -587,6 +587,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  rootrel,
 					  estate->es_instrument);
 
+	/*
+	 * Use multi-insert mode if the condition checking passes for the
+	 * parent and its child.
+	 */
+	leaf_part_rri->ri_usesMultiInsert =
+		ExecRelationAllowsMultiInsert(leaf_part_rri, rootResultRelInfo);
+
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 473c4cd84f..87b6693fe6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -118,6 +118,8 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
 										PartitionTupleRouting *proute,
 										TupleTableSlot *slot,
 										EState *estate);
+extern bool checkMultiInsertMode(const ResultRelInfo *rri,
+								 const ResultRelInfo *parent);
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 									PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 0c48d2a519..7e4feb4c4e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,8 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  Relation partition_root,
 							  int instrument_options);
+extern bool ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
+										  const ResultRelInfo *partition_root);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c0a7d68d6..f625f50e5f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -498,7 +498,14 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copy.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copy.c.
+	 *
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.26.2

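Patch 0001 above computes the multi-insert decision once per relation: the COPY target's flag folds in the COPY-specific conditions (volatile defaults, volatile WHERE clause), and each partition's flag is derived from the root's flag plus the partition's own properties when the partition is first initialized in ExecInitPartitionInfo(). The following is a self-contained model of that data flow, not executor code; DemoResultRel, demo_route_to_partition and the evaluation counter are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a ResultRelInfo-like entry; not PostgreSQL's struct. */
typedef struct DemoResultRel
{
	bool		initialized;		/* set when the entry is first routed to */
	bool		own_props_ok;		/* e.g. no BEFORE/INSTEAD row trigger, not foreign */
	bool		uses_multi_insert;	/* cached decision */
} DemoResultRel;

/* Counts how many times the decision is actually evaluated. */
static int	demo_decisions_computed = 0;

/*
 * A partition may use multi-insert only if its root may and its own
 * properties allow it; the decision is made once, on first initialization,
 * and the cached value is returned afterwards.
 */
static bool
demo_route_to_partition(DemoResultRel *part, const DemoResultRel *root)
{
	if (!part->initialized)
	{
		part->uses_multi_insert = root->uses_multi_insert && part->own_props_ok;
		part->initialized = true;
		demo_decisions_computed++;
	}
	return part->uses_multi_insert;
}
```

Routing many tuples to the same partition therefore re-reads the cached flag instead of re-evaluating the trigger and FDW checks each time.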
v12-0002-Fast-COPY-FROM-into-the-foreign-or-sharded--20201110.patch (text/x-patch; charset=UTF-8)
From 32838f257c706d0efdee009f6fc297e20c7a6ee9 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Tue, 10 Nov 2020 18:55:51 +0100
Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, iteratively
sends tuples using the CopyTo() machinery, and then sends EOF on
this connection.

The code that constructed the list of columns for a given foreign
relation in the deparseAnalyzeSql() routine is moved into
deparseRelColumnList(), which is reused in deparseCopyFromSql().

Added tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure is extended
with a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field.
It is needed to avoid repeated initialization of CopyState. Also for this
reason, the CopyTo() routine was split into the set of routines
CopyToStart()/CopyTo()/CopyToFinish().

The CopyInsertMethod enum is removed. This logic is now implemented by
the ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++++-
 .../postgres_fdw/expected/postgres_fdw.out    |  46 +++-
 contrib/postgres_fdw/postgres_fdw.c           | 137 ++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++++
 doc/src/sgml/fdwhandler.sgml                  |  73 ++++++
 src/backend/commands/copy.c                   | 236 +++++++++++-------
 src/backend/executor/execMain.c               |   8 +-
 src/backend/executor/execPartition.c          |  29 ++-
 src/include/commands/copy.h                   |  11 +
 src/include/foreign/fdwapi.h                  |  15 ++
 11 files changed, 554 insertions(+), 107 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d44df19fe..fa7740163d 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1758,6 +1760,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2061,6 +2077,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2069,10 +2109,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2081,6 +2119,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2100,18 +2141,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..be8db5ac63 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8076,8 +8076,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8088,6 +8089,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8196,6 +8210,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaacc51..1657a20d9b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -356,6 +358,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -534,6 +543,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2051,6 +2063,131 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ *
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/* Finish the COPY IN protocol. This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending the EOF signal, check the command result. */
+		res = PQgetResult(conn);
+		if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(OK && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!OK)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c5417b..22dcd12f02 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2212,6 +2212,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2312,6 +2329,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..a9a7402440 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -796,6 +796,79 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     batches of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; it will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in <literal>slots</literal>.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d882396d6f..21f7613dfe 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -118,11 +118,14 @@ typedef struct CopyStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
+	TupleDesc	tupDesc;		/* COPY TO will be used for manual tuple copying
+								  * into the destination */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	copy_data_source_cb data_source_cb; /* function for reading data */
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	bool		binary;			/* binary format? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -349,17 +352,12 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
-						   RawStmt *raw_query, Oid queryRelId, List *attnamelist,
-						   List *options);
+						   TupleDesc srcTupDesc, RawStmt *raw_query,
+						   Oid queryRelId, List *attnamelist, List *options);
 static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
-							 Oid queryRelId, const char *filename, bool is_program,
-							 List *attnamelist, List *options);
-static void EndCopyTo(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
 static uint64 CopyTo(CopyState cstate);
-static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
 static int	CopyReadAttributesText(CopyState cstate);
@@ -585,7 +583,13 @@ CopySendEndOfRow(CopyState cstate)
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
 		case COPY_CALLBACK:
-			Assert(false);		/* Not yet supported. */
+			Assert(!cstate->binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
 
@@ -1114,8 +1118,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	}
 	else
 	{
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
@@ -1501,6 +1505,7 @@ static CopyState
 BeginCopy(ParseState *pstate,
 		  bool is_from,
 		  Relation rel,
+		  TupleDesc srcTupDesc,
 		  RawStmt *raw_query,
 		  Oid queryRelId,
 		  List *attnamelist,
@@ -1536,6 +1541,11 @@ BeginCopy(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query && !is_from);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -1862,20 +1872,25 @@ EndCopy(CopyState cstate)
 /*
  * Setup CopyState to read tuples from a table or a query for COPY TO.
  */
-static CopyState
+CopyState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc tupDesc,
 			RawStmt *query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || tupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -1914,8 +1929,9 @@ BeginCopyTo(ParseState *pstate,
 							RelationGetRelationName(rel))));
 	}
 
-	cstate = BeginCopy(pstate, false, rel, query, queryRelId, attnamelist,
-					   options);
+	cstate = BeginCopy(pstate, false, rel, tupDesc, query, queryRelId,
+					   attnamelist, options);
+
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	if (pipe)
@@ -1924,6 +1940,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -2001,7 +2022,7 @@ BeginCopyTo(ParseState *pstate,
 static uint64
 DoCopyTo(CopyState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -2010,7 +2031,9 @@ DoCopyTo(CopyState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -2033,7 +2056,7 @@ DoCopyTo(CopyState cstate)
 /*
  * Clean up storage and release resources for COPY TO.
  */
-static void
+void
 EndCopyTo(CopyState cstate)
 {
 	if (cstate->queryDesc != NULL)
@@ -2049,19 +2072,22 @@ EndCopyTo(CopyState cstate)
 	EndCopy(cstate);
 }
 
-/*
- * Copy from relation or query TO file.
+/* Start COPY TO operation.
+ * Split out into a separate routine to avoid duplicate work in manual mode,
+ * where tuples are sent to the destination one at a time by calling
+ * CopyOneRowTo().
  */
-static uint64
-CopyTo(CopyState cstate)
+void
+CopyToStart(CopyState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -2148,6 +2174,32 @@ CopyTo(CopyState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyState cstate)
+{
+	if (cstate->binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -2179,24 +2231,13 @@ CopyTo(CopyState cstate)
 		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, true);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
-
-	if (cstate->binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
@@ -2486,54 +2527,63 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo, buffer->slots[i],
+										  estate, false, NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There are no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -2838,8 +2888,11 @@ CopyFrom(CopyState cstate)
 	 * checked by calling ExecRelationAllowsMultiInsert().  It does not matter
 	 * whether partitions have any volatile default expressions as we use the
 	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-empty list of input attributes.
+	 * Therefore, the length of the attribute list is checked here.
 	 */
 	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
 		!contain_volatile_functions(cstate->whereClause))
 		target_resultRelInfo->ri_usesMultiInsert =
 					ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL);
@@ -2863,10 +2916,18 @@ CopyFrom(CopyState cstate)
 	 * Init COPY into foreign table. Initialization of copying into foreign
 	 * partitions will be done later.
 	 */
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			Assert(target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -3244,10 +3305,16 @@ CopyFrom(CopyState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
@@ -3298,7 +3365,8 @@ BeginCopyFrom(ParseState *pstate,
 	MemoryContext oldcontext;
 	bool		volatile_defexprs;
 
-	cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+	cstate = BeginCopy(pstate, true, rel, NULL, NULL, InvalidOid, attnamelist,
+																	options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Initialize state variables */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0c728315fa..cc758cd03a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1294,8 +1294,12 @@ ExecRelationAllowsMultiInsert(const ResultRelInfo *rri,
 		rri->ri_TrigDesc->trig_insert_new_table)
 		return false;
 
-	/* Foreign tables don't support multi-inserts. */
-	if (rri->ri_FdwRoutine != NULL)
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
 		return false;
 
 	/* OK, caller can use multi-insert on this relation. */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 2ce4afc9ad..49812e9a9d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -997,9 +997,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
@@ -1200,10 +1207,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..08309149ea 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -22,6 +22,7 @@
 /* CopyStateData is private in commands/copy.c */
 typedef struct CopyStateData *CopyState;
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -39,6 +40,16 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern uint64 CopyFrom(CopyState cstate);
 
+extern CopyState BeginCopyTo(ParseState *pstate, Relation rel,
+							 TupleDesc tupDesc, RawStmt *query,
+							 Oid queryRelId, const char *filename, bool is_program,
+							 copy_data_dest_cb data_dest_cb, List *attnamelist,
+							 List *options);
+extern void EndCopyTo(CopyState cstate);
+extern void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
+extern void CopyToStart(CopyState cstate);
+extern void CopyToFinish(CopyState cstate);
+
 extern DestReceiver *CreateCopyDestReceiver(void);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..52b213f5aa 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
-- 
2.26.2
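To make the data path of postgresExecForeignCopy() concrete: for each buffered slot, CopyOneRowTo() renders the row in COPY text format, and CopySendEndOfRow(), in the new COPY_CALLBACK mode, appends the line terminator and hands the bytes to data_dest_cb, which postgres_fdw implements as a thin wrapper around PQputCopyData(). Below is a minimal, self-contained C sketch of that flow under simplifying assumptions: MockRow, send_batch() and collect_dest_cb() are hypothetical names, rows have a fixed (int, text) shape, the terminator is always '\n', and COPY escaping and NULL handling are omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the copy_data_dest_cb typedef the patch adds to copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical stand-in for a TupleTableSlot holding an (int, text) row. */
typedef struct MockRow
{
	int			f1;
	const char *f2;
} MockRow;

/*
 * Hypothetical sketch of the per-batch loop: each row is rendered in COPY
 * text format (tab-delimited, '\n'-terminated) and passed to the destination
 * callback once per row.  Escaping and NULL ("\N") handling are omitted.
 */
static int
send_batch(const MockRow *rows, int nrows, copy_data_dest_cb dest)
{
	char		line[256];
	int			i;

	for (i = 0; i < nrows; i++)
	{
		int			len = snprintf(line, sizeof(line), "%d\t%s\n",
								   rows[i].f1, rows[i].f2);

		/* Real COPY output grows a StringInfo; this sketch just bounds it. */
		assert(len > 0 && len < (int) sizeof(line));
		dest(line, len);
	}
	return nrows;
}

/* Test double for the libpq side: accumulate the COPY stream in memory. */
static char copy_stream[1024];
static int	copy_stream_len = 0;
static int	dest_calls = 0;

static void
collect_dest_cb(void *buf, int len)
{
	assert(copy_stream_len + len < (int) sizeof(copy_stream));
	memcpy(copy_stream + copy_stream_len, buf, (size_t) len);
	copy_stream_len += len;
	copy_stream[copy_stream_len] = '\0';
	dest_calls++;
}
```

In the patch, each ExecForeignCopy() call wraps one such batch in its own remote COPY ... FROM STDIN: PQexec() starts it, the rows go out through PQputCopyData(), and PQputCopyEnd() followed by PQgetResult() finishes it.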

#43tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#42)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey-san,

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

I needed to look at this patch while working on something related, and I found it
got broken by 6973533650c a couple days ago. So here's a fixed version, to keep
cfbot happy. I haven't done any serious review yet.

Could I or my colleague continue this patch in a few days? It looks like it has stalled for over a month.

Regards
Takayuki Tsunakawa

#44Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: tsunakawa.takay@fujitsu.com (#43)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 11/23/20 7:49 AM, tsunakawa.takay@fujitsu.com wrote:

Hi Andrey-san,

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

I needed to look at this patch while working on something related, and I found it
got broken by 6973533650c a couple days ago. So here's a fixed version, to keep
cfbot happy. I haven't done any serious review yet.

Could I or my colleague continue this patch in a few days? It looks like it has stalled for over a month.

I haven't found any problems with this patch that need to be corrected.
I think it is waiting for action from the committers' side.

--
regards,
Andrey Lepikhov
Postgres Professional

#45Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#44)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Mon, Nov 23, 2020 at 5:39 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 11/23/20 7:49 AM, tsunakawa.takay@fujitsu.com wrote:

Could I or my colleague continue this patch in a few days? It looks like it has stalled for over a month.

I haven't found any problems with this patch that need to be corrected.
I think it is waiting for action from the committers' side.

I'm planning to review this patch. I think it would be better for
another pair of eyes to take a look at it, though.

Best regards,
Etsuro Fujita

#46tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Etsuro Fujita (#45)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

Andrey-san, Fujita-san,

From: Etsuro Fujita <etsuro.fujita@gmail.com>

On Mon, Nov 23, 2020 at 5:39 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 11/23/20 7:49 AM, tsunakawa.takay@fujitsu.com wrote:

Could I or my colleague continue this patch in a few days? It looks like it has stalled for over a month.

I haven't found any problems with this patch that need to be corrected.
I think it is waiting for action from the committers' side.

I'm planning to review this patch. I think it would be better for
another pair of eyes to take a look at it, though.

There are the following two issues left untouched.

/messages/by-id/TYAPR01MB2990DC396B338C98F27C8ED3FE1F0@TYAPR01MB2990.jpnprd01.prod.outlook.com

Regards
Takayuki Tsunakawa

#47Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: tsunakawa.takay@fujitsu.com (#46)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 11/24/20 9:27 AM, tsunakawa.takay@fujitsu.com wrote:

Andrey-san, Fujita-san,

From: Etsuro Fujita <etsuro.fujita@gmail.com>

On Mon, Nov 23, 2020 at 5:39 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 11/23/20 7:49 AM, tsunakawa.takay@fujitsu.com wrote:

Could I or my colleague continue this patch in a few days? It looks like it has stalled for over a month.

I haven't found any problems with this patch that need to be corrected.
I think it is waiting for action from the committers' side.

I'm planning to review this patch. I think it would be better for
another pair of eyes to take a look at it, though.

There are the following two issues left untouched.

/messages/by-id/TYAPR01MB2990DC396B338C98F27C8ED3FE1F0@TYAPR01MB2990.jpnprd01.prod.outlook.com

I disagree with your opinion about changing the interface of the
ExecRelationAllowsMultiInsert routine. If you insist on the need for
this change, we need another opinion.

--
regards,
Andrey Lepikhov
Postgres Professional

#48Amit Langote
amitlangote09@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#41)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi,

On Tue, Oct 20, 2020 at 11:31 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

(2)
+ Assert(rri->ri_usesMultiInsert == false);

As the above assertion shows, I'm afraid the semantics of
ExecRelationAllowsMultiInsert() and ResultRelInfo->ri_usesMultiInsert are
unclear. In CopyFrom(), ri_usesMultiInsert is set by also considering the
COPY-specific conditions:

+   if (!cstate->volatile_defexprs &&
+           !contain_volatile_functions(cstate->whereClause) &&
+           ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
+           target_resultRelInfo->ri_usesMultiInsert = true;

On the other hand, in ExecInitPartitionInfo() below, ri_usesMultiInsert is set
purely based on the relation's characteristics.

+   leaf_part_rri->ri_usesMultiInsert =
+           ExecRelationAllowsMultiInsert(leaf_part_rri,
+                                         rootResultRelInfo);

In addition to these differences, I think it's a bit confusing that the function
itself doesn't record the check result in ri_usesMultiInsert.

It's probably easier to understand if we don't add ri_usesMultiInsert, and the
function just encapsulates the check logic based solely on the relation
characteristics and returns the result. So, the argument is just one
ResultRelInfo. The caller (e.g. COPY) combines the function result with other
specific conditions.

I can't fully agree with this suggestion. We do so because in the future anyone
can call this code from another subsystem for other purposes. And we want
all the relation-related restrictions to be contained in one routine. CopyState-related
restrictions are used in copy.c only and are taken out of this function.

I'm sorry if I'm misinterpreting you, but I think the following simply serves its role sufficiently and cleanly without using ri_usesMultiInsert.

bool
ExecRelationAllowsMultiInsert(ResultRelInfo *rri)
{
check if the relation allows multi-insert based on its characteristics;
return true or false;
}
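To make the "pure predicate" suggestion concrete, here is a minimal sketch under simplifying assumptions. MockTrigDesc, MockResultRelInfo, relation_allows_multi_insert() and copy_can_use_multi_insert() are hypothetical stand-ins, not PostgreSQL's real types or functions. The predicate looks only at the relation and records nothing; the COPY-like caller combines its result with its own command-specific conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for TriggerDesc / ResultRelInfo. */
typedef struct MockTrigDesc
{
	bool		trig_insert_before_row;
	bool		trig_insert_instead_row;
} MockTrigDesc;

typedef struct MockResultRelInfo
{
	const MockTrigDesc *ri_TrigDesc;	/* NULL when the relation has no triggers */
	bool		is_foreign;		/* stands in for ri_FdwRoutine != NULL */
} MockResultRelInfo;

/*
 * Pure predicate: inspects only the relation's own characteristics and
 * records nothing in *rri.
 */
static bool
relation_allows_multi_insert(const MockResultRelInfo *rri)
{
	assert(rri != NULL);

	/* BEFORE/INSTEAD OF row triggers might query the target table. */
	if (rri->ri_TrigDesc != NULL &&
		(rri->ri_TrigDesc->trig_insert_before_row ||
		 rri->ri_TrigDesc->trig_insert_instead_row))
		return false;

	/* Foreign tables take the single-row path in this sketch. */
	if (rri->is_foreign)
		return false;

	return true;
}

/*
 * A COPY-like caller combines the predicate with its own, command-specific
 * conditions (volatile defaults, volatile WHERE clause).
 */
static bool
copy_can_use_multi_insert(const MockResultRelInfo *rri,
						  bool volatile_defexprs, bool volatile_where)
{
	return !volatile_defexprs && !volatile_where &&
		relation_allows_multi_insert(rri);
}
```

Because nothing is stored, a second caller with different conditions cannot be misled by the first caller's decision.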

I'm concerned that if one subsystem sets ri_usesMultiInsert to true based on its additional specific conditions, it might lead to another subsystem's misjudgment. For example, when subsystem A and B want to do different things respectively:

[Subsystem A]
if (ExecRelationAllowsMultiInsert(rri) && {A's conditions})
rri->ri_usesMultiInsert = true;
...
if (rri->ri_usesMultiInsert)
do A's business;

[Subsystem B]
if (rri->ri_usesMultiInsert)
do B's business;

Here, what if subsystems A and B don't want each other's specific conditions to hold true? That is, A wants to do A's business only if B's specific conditions don't hold true. If A sets rri->ri_usesMultiInsert to true and passes rri to B, then B wrongly does B's business merely because A's specific conditions were true, even if B's own conditions do not hold.

(I think this is due to some form of violation of encapsulation.)

Sorry about chiming in late, but I think Tsunakawa-san raises some
valid concerns.

The first, IIUC, is whether we need the ri_usesMultiInsert flag at all. I
think yes, because computing that information repeatedly for every row
seems wasteful, especially for a bulk operation, and even more so if
we're going to call a function when doing so.

Second is whether the interface for setting ri_usesMultiInsert
encourages situations where different modules could possibly engage in
conflicting behaviors. I can't think of a real-life example of that
with the current implementation, but maybe the interface provided in
the patch makes it harder to ensure that that remains true in the
future. Tsunakawa-san, have you encountered an example of this, maybe
when trying to integrate this patch with some other?

Anyway, one thing we could do is rename
ExecRelationAllowsMultiInsert() to ExecSetRelationUsesMultiInsert(),
that is, to make it actually set ri_usesMultiInsert and have places
like CopyFrom() call it if (and only if) its local logic allows
multi-insert to be used. So, ri_usesMultiInsert starts out set to
false and if a module wants to use multi-insert for a given target
relation, it calls ExecSetRelationUsesMultiInsert() to turn the flag
on. Also, given the confusion regarding how execPartition.c
manipulates the flag, maybe change ExecFindPartition() to accept a
Boolean parameter multi_insert, which it will pass down to
ExecInitPartitionInfo(), which in turn will call
ExecSetRelationUsesMultiInsert() for a given partition. Of course, if
the logic in ExecSetRelationUsesMultiInsert() determines that
multi-insert can't be used, for the reasons listed in the function,
then the caller will have to live with that decision.
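A rough sketch of the proposed shape, assuming heavily simplified mock types. set_relation_uses_multi_insert() and init_partition() are hypothetical analogues of ExecSetRelationUsesMultiInsert() and ExecInitPartitionInfo(), not the real signatures. The flag starts out false, the setter is called only when the caller's own logic wants multi-insert, and the setter may still decline based on the relation's properties or the root's decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for ResultRelInfo. */
typedef struct MockResultRelInfo
{
	bool		has_before_or_instead_row_trig;
	bool		is_foreign;
	bool		ri_usesMultiInsert;	/* starts out false; only the setter turns it on */
} MockResultRelInfo;

/*
 * Analogue of the proposed ExecSetRelationUsesMultiInsert(): called only
 * after the caller's own logic has decided multi-insert is wanted; it may
 * still refuse, and the caller lives with that decision.
 */
static void
set_relation_uses_multi_insert(MockResultRelInfo *rri,
							   const MockResultRelInfo *partition_root)
{
	assert(!rri->ri_usesMultiInsert);

	/* A partition can't use multi-insert if its root isn't using it. */
	if (partition_root != NULL && !partition_root->ri_usesMultiInsert)
		return;
	if (rri->has_before_or_instead_row_trig || rri->is_foreign)
		return;
	rri->ri_usesMultiInsert = true;
}

/*
 * Analogue of ExecInitPartitionInfo() receiving a multi_insert argument
 * passed down from ExecFindPartition().
 */
static void
init_partition(MockResultRelInfo *leaf, const MockResultRelInfo *root,
			   bool multi_insert)
{
	leaf->ri_usesMultiInsert = false;
	if (multi_insert)
		set_relation_uses_multi_insert(leaf, root);
}
```

The design choice here is that only one function ever writes the flag, and only at the caller's explicit request.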

Any other ideas on how to make this work and look better?

--
Amit Langote
EDB: http://www.enterprisedb.com

[1]: /messages/by-id/d3fbf3bc93b7bcd99ff7fa9ee41e0e20@postgrespro.ru

#49tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Amit Langote (#48)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Amit Langote <amitlangote09@gmail.com>

Second is whether the interface for setting ri_usesMultiInsert
encourages situations where different modules could possibly engage in
conflicting behaviors. I can't think of a real-life example of that
with the current implementation, but maybe the interface provided in
the patch makes it harder to ensure that that remains true in the
future. Tsunakawa-san, have you encountered an example of this, maybe
when trying to integrate this patch with some other?

Thanks. No, I pointed it out purely from the standpoint of program modularity (based on structured programming?).

Anyway, one thing we could do is rename
ExecRelationAllowsMultiInsert() to ExecSetRelationUsesMultiInsert(),
that is, to make it actually set ri_usesMultiInsert and have places
like CopyFrom() call it if (and only if) its local logic allows
multi-insert to be used. So, ri_usesMultiInsert starts out set to
false and if a module wants to use multi-insert for a given target
relation, it calls ExecSetRelationUsesMultiInsert() to turn the flag
on. Also, given the confusion regarding how execPartition.c

I think separating the setting and inspection of the property into different functions will be good, at least.

manipulates the flag, maybe change ExecFindPartition() to accept a
Boolean parameter multi_insert, which it will pass down to
ExecInitPartitionInfo(), which in turn will call
ExecSetRelationUsesMultiInsert() for a given partition. Of course, if
the logic in ExecSetRelationUsesMultiInsert() determines that
multi-insert can't be used, for the reasons listed in the function,
then the caller will have to live with that decision.

I can't say for sure, but it looks strange to me, because I can't find a good description of the multi_insert argument for ExecFindPartition(). If we add multi_insert, I'm afraid we may want to add further arguments for other properties in the future like "Hey, get me the partition that has triggers.", "Next, pass me a partition that uses a foreign table.", etc. I think the current ExecFindPartition() is good -- "Get me a partition that accepts this row."

I wonder if ri_usesMultiInsert is really necessary. Would it cut down enough costs in the intended use case(s), say the heavyweight COPY FROM?

Regards
Takayuki Tsunakawa

#50Amit Langote
amitlangote09@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#49)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Thu, Nov 26, 2020 at 11:42 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Langote <amitlangote09@gmail.com>

Anyway, one thing we could do is rename
ExecRelationAllowsMultiInsert() to ExecSetRelationUsesMultiInsert(),
that is, to make it actually set ri_usesMultiInsert and have places
like CopyFrom() call it if (and only if) its local logic allows
multi-insert to be used. So, ri_usesMultiInsert starts out set to
false and if a module wants to use multi-insert for a given target
relation, it calls ExecSetRelationUsesMultiInsert() to turn the flag
on. Also, given the confusion regarding how execPartition.c

I think separating the setting and inspection of the property into different functions will be good, at least.

manipulates the flag, maybe change ExecFindPartition() to accept a
Boolean parameter multi_insert, which it will pass down to
ExecInitPartitionInfo(), which in turn will call
ExecSetRelationUsesMultiInsert() for a given partition. Of course, if
the logic in ExecSetRelationUsesMultiInsert() determines that
multi-insert can't be used, for the reasons listed in the function,
then the caller will have to live with that decision.

I can't say for sure, but it looks strange to me, because I can't find a good description of the multi_insert argument for ExecFindPartition(). If we add multi_insert, I'm afraid we may want to add further arguments for other properties in the future like "Hey, get me the partition that has triggers.", "Next, pass me a partition that uses a foreign table.", etc. I think the current ExecFindPartition() is good -- "Get me a partition that accepts this row."

I wonder if ri_usesMultiInsert is really necessary. Would it cut down enough costs in the intended use case(s), say the heavyweight COPY FROM?

Thinking on this more, I think I'm starting to agree with you on this.
I skimmed the CopyFrom()'s main loop again today and indeed it doesn't
seem that the cost of checking the individual conditions for whether
or not to buffer the current tuple for the given target relation is
all that big to save with ri_usesMultiInsert. So my argument that it
is good for performance is perhaps not that strong.

Andrey's original patch had the flag to, as I understand it, make the
partitioning case work correctly. When inserting into a
non-partitioned table, there's only one relation to care about. In
that case, CopyFrom() can use either the new COPY interface or the
INSERT interface for the entire operation when talking to a foreign
target relation's FDW driver. With partitions, that has to be
considered separately for each partition. What complicates the matter
further is that while the original target relation (the root
partitioned table in the partitioning case) is fully initialized in
CopyFrom(), partitions are lazily initialized by ExecFindPartition().
Note that the initialization of a given target relation can also
optionally involve calling the FDW to perform any pre-COPY
initializations. So if a given partition is a foreign table, whether
the copy operation was initialized using the COPY interface or the
INSERT interface is determined away from CopyFrom(). Andrey created
ri_usesMultiInsert to remember which was used so that CopyFrom() can
use the correct interface during the subsequent interactions with the
partition's driver.

Now, it does not seem outright impossible to do this without the flag,
but maybe Andrey thinks it is good for readability? If it is
confusing from a modularity standpoint, maybe we should rethink that.
That said, I still think that there should be a way for CopyFrom() to
tell ExecFindPartition() which FDW interface to initialize a given
foreign table partition's copy operation with -- COPY if the copy
allows multi-insert, INSERT if not. Maybe the multi_insert parameter
I mentioned earlier would serve that purpose.

--
Amit Langote
EDB: http://www.enterprisedb.com

#51tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Amit Langote (#50)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Amit Langote <amitlangote09@gmail.com>

Andrey's original patch had the flag to, as I understand it, make the
partitioning case work correctly. When inserting into a
non-partitioned table, there's only one relation to care about. In
that case, CopyFrom() can use either the new COPY interface or the
INSERT interface for the entire operation when talking to a foreign
target relation's FDW driver. With partitions, that has to be
considered separately for each partition. What complicates the matter
further is that while the original target relation (the root
partitioned table in the partitioning case) is fully initialized in
CopyFrom(), partitions are lazily initialized by ExecFindPartition().

Yeah, I found it a bit confusing to see the calls to Begin/EndForeignInsert() in both CopyFrom() and ExecInitRoutingInfo().

Note that the initialization of a given target relation can also
optionally involve calling the FDW to perform any pre-COPY
initializations. So if a given partition is a foreign table, whether
the copy operation was initialized using the COPY interface or the
INSERT interface is determined away from CopyFrom(). Andrey created
ri_usesMultiInsert to remember which was used so that CopyFrom() can
use the correct interface during the subsequent interactions with the
partition's driver.

Now, it does not seem outright impossible to do this without the flag,
but maybe Andrey thinks it is good for readability? If it is
confusing from a modularity standpoint, maybe we should rethink that.
That said, I still think that there should be a way for CopyFrom() to
tell ExecFindPartition() which FDW interface to initialize a given
foreign table partition's copy operation with -- COPY if the copy
allows multi-insert, INSERT if not. Maybe the multi_insert parameter
I mentioned earlier would serve that purpose.

I agree with your idea of adding a multi_insert argument to ExecFindPartition() to request a multi-insert-capable partition. At first, I thought ExecFindPartition() was used for all operations, insert/delete/update/select, so I found it odd to add a multi_insert argument. But ExecFindPartition() is used only for insert, so the multi_insert argument seems okay.

Regards
Takayuki Tsunakawa

#52Amit Langote
amitlangote09@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#51)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Tue, Dec 1, 2020 at 2:40 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Langote <amitlangote09@gmail.com>

Andrey's original patch had the flag to, as I understand it, make the
partitioning case work correctly. When inserting into a
non-partitioned table, there's only one relation to care about. In
that case, CopyFrom() can use either the new COPY interface or the
INSERT interface for the entire operation when talking to a foreign
target relation's FDW driver. With partitions, that has to be
considered separately for each partition. What complicates the matter
further is that while the original target relation (the root
partitioned table in the partitioning case) is fully initialized in
CopyFrom(), partitions are lazily initialized by ExecFindPartition().

Yeah, I found it a bit confusing to see the calls to Begin/EndForeignInsert() in both CopyFrom() and ExecInitRoutingInfo().

Note that the initialization of a given target relation can also
optionally involve calling the FDW to perform any pre-COPY
initializations. So if a given partition is a foreign table, whether
the copy operation was initialized using the COPY interface or the
INSERT interface is determined away from CopyFrom(). Andrey created
ri_usesMultiInsert to remember which was used so that CopyFrom() can
use the correct interface during the subsequent interactions with the
partition's driver.

Now, it does not seem outright impossible to do this without the flag,
but maybe Andrey thinks it is good for readability? If it is
confusing from a modularity standpoint, maybe we should rethink that.
That said, I still think that there should be a way for CopyFrom() to
tell ExecFindPartition() which FDW interface to initialize a given
foreign table partition's copy operation with -- COPY if the copy
allows multi-insert, INSERT if not. Maybe the multi_insert parameter
I mentioned earlier would serve that purpose.

I agree with your idea of adding a multi_insert argument to ExecFindPartition() to request a multi-insert-capable partition. At first, I thought ExecFindPartition() was used for all operations, insert/delete/update/select, so I found it odd to add a multi_insert argument. But ExecFindPartition() is used only for insert, so the multi_insert argument seems okay.

Good. Andrey, any thoughts on this?

--
Amit Langote
EDB: http://www.enterprisedb.com

#53Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Amit Langote (#52)
2 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 12/1/20 2:02 PM, Amit Langote wrote:

On Tue, Dec 1, 2020 at 2:40 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Langote <amitlangote09@gmail.com>
The code appears to require both BeginForeignCopy and EndForeignCopy,
while the following documentation says they are optional. Which is
correct? (I suppose the latter is correct just like other existing
Begin/End functions are optional.)

Fixed.
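The usual way to make such callbacks optional is the same NULL check CopyFrom() already applies to BeginForeignInsert. Below is a minimal sketch with a hypothetical MockFdwRoutine; the callback signatures here are invented for illustration and do not match the real BeginForeignCopy/EndForeignCopy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified callback table; real FdwRoutine signatures differ. */
typedef struct MockFdwRoutine
{
	void		(*BeginForeignCopy) (int *state);	/* optional */
	void		(*EndForeignCopy) (int *state);		/* optional */
} MockFdwRoutine;

/* Call each optional callback only when the FDW provides it. */
static void
begin_copy(const MockFdwRoutine *routine, int *state)
{
	assert(routine != NULL);
	if (routine->BeginForeignCopy != NULL)
		routine->BeginForeignCopy(state);
}

static void
end_copy(const MockFdwRoutine *routine, int *state)
{
	assert(routine != NULL);
	if (routine->EndForeignCopy != NULL)
		routine->EndForeignCopy(state);
}

/* Example callbacks that just record that they ran. */
static void
demo_begin(int *state)
{
	*state += 1;
}

static void
demo_end(int *state)
{
	*state += 10;
}
```

An FDW that leaves both pointers NULL simply gets no Begin/End calls, rather than a crash.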

Anyway, one thing we could do is rename
ExecRelationAllowsMultiInsert() to ExecSetRelationUsesMultiInsert(

Renamed.

I agree with your idea of adding a multi_insert argument to
ExecFindPartition() to request a multi-insert-capable partition. At
first, I thought ExecFindPartition() was used for all operations,
insert/delete/update/select, so I found it odd to add a multi_insert
argument. But ExecFindPartition() is used only for insert, so the
multi_insert argument seems okay.

Good. Andrey, any thoughts on this?

I have no serious technical arguments against this, other than code
readability and reducing the number of routine parameters. Maybe we can
rethink it later?

The new version rebased on commit 525e60b742 is attached.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v13-0001-Move-multi-insert-decision-logic-into-executor.patchtext/x-patch; charset=UTF-8; name=v13-0001-Move-multi-insert-decision-logic-into-executor.patchDownload
From 98a6f077cd3b694683ec0e4a3250c040cc33cb39 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Mon, 14 Dec 2020 11:29:03 +0500
Subject: [PATCH 1/2] Move multi-insert decision logic into executor

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod shared across all potential target
relations (partitions) that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Move that decision logic into InitResultRelInfo which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized.  That prevents repeated computation
of the same information in some cases, especially for partitions,
and the new arrangement slightly improves readability.
---
 src/backend/commands/copyfrom.c          | 142 ++++++-----------------
 src/backend/executor/execMain.c          |  52 +++++++++
 src/backend/executor/execPartition.c     |   7 ++
 src/include/commands/copyfrom_internal.h |  10 --
 src/include/executor/execPartition.h     |   2 +
 src/include/executor/executor.h          |   2 +
 src/include/nodes/execnodes.h            |   8 +-
 7 files changed, 108 insertions(+), 115 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1b14e9a6eb..6d4f6cb80d 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -535,12 +535,10 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -650,6 +648,30 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(target_resultRelInfo->ri_usesMultiInsert == false);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecSetRelationUsesMultiInsert().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 */
+	if (!cstate->volatile_defexprs &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecSetRelationUsesMultiInsert(target_resultRelInfo, NULL);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -665,6 +687,10 @@ CopyFrom(CopyFromState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = resultRelInfo;
 
+	/*
+	 * Init copying process into foreign table. Initialization of copying into
+	 * foreign partitions will be done later.
+	 */
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
@@ -697,83 +723,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -781,7 +733,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -824,7 +776,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -832,7 +784,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -891,24 +842,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -938,7 +879,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -957,9 +898,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1031,7 +969,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1107,11 +1045,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1143,8 +1078,7 @@ CopyFrom(CopyFromState cstate)
 															  target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7179f589f9..9809c03a8e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1248,6 +1248,58 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+
+	/* Define multi-insert mode possibility later if needed */
+	resultRelInfo->ri_usesMultiInsert = false;
+}
+
+/*
+ * ExecSetRelationUsesMultiInsert
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecSetRelationUsesMultiInsert(const ResultRelInfo *rri,
+							  const ResultRelInfo *partition_root)
+{
+	Assert(rri->ri_usesMultiInsert == false);
+
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (partition_root && !partition_root->ri_usesMultiInsert)
+		return false;
+
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	/* Foreign tables don't support multi-inserts. */
+	if (rri->ri_FdwRoutine != NULL)
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
 }
 
 /*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 86594bd056..5a201dfbfa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -587,6 +587,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  rootrel,
 					  estate->es_instrument);
 
+	/*
+	 * Use multi-insert mode if the condition checking passes for the
+	 * parent and its child.
+	 */
+	leaf_part_rri->ri_usesMultiInsert =
+		ExecSetRelationUsesMultiInsert(leaf_part_rri, rootResultRelInfo);
+
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c15ea803c3..7a948f7e63 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -40,16 +40,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 473c4cd84f..87b6693fe6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -118,6 +118,8 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
 										PartitionTupleRouting *proute,
 										TupleTableSlot *slot,
 										EState *estate);
+extern bool checkMultiInsertMode(const ResultRelInfo *rri,
+								 const ResultRelInfo *parent);
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 									PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 0c48d2a519..ad6eccc028 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,8 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  Relation partition_root,
 							  int instrument_options);
+extern bool ExecSetRelationUsesMultiInsert(const ResultRelInfo *rri,
+										  const ResultRelInfo *partition_root);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61ba4c3666..b6f4f2626b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -498,7 +498,13 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.25.1

v13-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=UTF-8)
From 417ff18863e0737c449c6b6ac7694f89c63cb8c3 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Mon, 14 Dec 2020 13:37:40 +0500
Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert is
possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free the CopyToState
used for a bulk COPY. Each ExecForeignCopy call sends a
'COPY ... FROM STDIN' command to the foreign server, sends the tuples one
by one through the CopyOneRowTo() machinery and then sends EOF on this
connection.
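
A minimal model of that call sequence, for illustration only: FakeConn and
the *_sketch functions are hypothetical and merely log what the real code
would send with PQexec(), PQputCopyData() (through the callback) and
PQputCopyEnd(). It shows that the per-table state is set up once, while
every flushed multi-insert buffer becomes one complete COPY IN cycle on the
remote connection:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical connection that only logs what would be sent to the server. */
typedef struct FakeConn
{
	char		log[512];
} FakeConn;

static void
fake_send(FakeConn *conn, const char *what)
{
	strncat(conn->log, what, sizeof(conn->log) - strlen(conn->log) - 1);
}

/* Per-table state created once, like PgFdwModifyState with its cstate. */
typedef struct CopySketchState
{
	FakeConn   *conn;
	const char *query;			/* deparsed "COPY ... FROM STDIN" */
	int			nbatches;		/* COPY IN cycles run so far */
} CopySketchState;

static void
begin_foreign_copy_sketch(CopySketchState *st, FakeConn *conn,
						  const char *query)
{
	st->conn = conn;
	st->query = query;
	st->nbatches = 0;
}

/* One call per flushed multi-insert buffer: a full COPY IN cycle. */
static void
exec_foreign_copy_sketch(CopySketchState *st, const char *const *rows,
						 int nrows)
{
	fake_send(st->conn, "exec ");	/* PQexec(), expecting PGRES_COPY_IN */
	fake_send(st->conn, st->query);
	fake_send(st->conn, "\n");
	for (int i = 0; i < nrows; i++)
		fake_send(st->conn, rows[i]);	/* PQputCopyData() via the callback */
	fake_send(st->conn, "end\n");	/* PQputCopyEnd(), then PQgetResult() */
	st->nbatches++;
}

static void
end_foreign_copy_sketch(CopySketchState *st)
{
	/* The patch calls CopyToFinish() and finish_foreign_modify() here. */
	st->conn = NULL;
}
```

In the patch itself each cycle also checks for PGRES_COPY_IN after PQexec()
and for PGRES_COMMAND_OK after the end-of-copy message.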

The code that built the column list of a given foreign relation in
deparseAnalyzeSql() is moved into the new deparseRelColumnList() routine,
which is also reused by deparseCopyFromSql().
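
The column-list logic can be sketched in plain C as follows. ColumnSketch
and the *_sketch functions are hypothetical stand-ins for the tuple
descriptor, StringInfo and the column_name option lookup, and identifier
quoting (quote_identifier()) is left out. Dropped columns are skipped, the
remote column name wins over the local one, and no parentheses are emitted
for a relation without live columns:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one tuple-descriptor attribute. */
typedef struct ColumnSketch
{
	const char *local_name;		/* attribute name in the local catalog */
	const char *remote_name;	/* column_name FDW option, or NULL */
	bool		dropped;		/* attisdropped */
} ColumnSketch;

/* Append the column list to buf; return the number of columns written. */
static int
deparse_column_list_sketch(char *buf, size_t bufsize,
						   const ColumnSketch *cols, int ncols,
						   bool enclose_in_parens)
{
	int			nemitted = 0;

	for (int i = 0; i < ncols; i++)
	{
		const char *name;

		if (cols[i].dropped)
			continue;			/* ignore dropped columns */
		if (nemitted > 0)
			strncat(buf, ", ", bufsize - strlen(buf) - 1);
		else if (enclose_in_parens)
			strncat(buf, "(", bufsize - strlen(buf) - 1);
		name = cols[i].remote_name ? cols[i].remote_name : cols[i].local_name;
		strncat(buf, name, bufsize - strlen(buf) - 1);
		nemitted++;
	}
	/* Close the list only if it was opened. */
	if (enclose_in_parens && nemitted > 0)
		strncat(buf, ")", bufsize - strlen(buf) - 1);
	return nemitted;
}

/* Build "COPY <rel>(<cols>) FROM STDIN " like deparseCopyFromSql(). */
static void
deparse_copy_from_sketch(char *buf, size_t bufsize, const char *relname,
						 const ColumnSketch *cols, int ncols)
{
	snprintf(buf, bufsize, "COPY %s", relname);
	(void) deparse_column_list_sketch(buf, bufsize, cols, ncols, true);
	strncat(buf, " FROM STDIN ", bufsize - strlen(buf) - 1);
}
```

Called with enclose_in_parens set to false, the same helper yields the bare
list that deparseAnalyzeSql() puts after SELECT.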

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyToState structure is extended with
a data_dest_cb callback. It is used to send the text representation of
a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field, which
avoids repeated initialization of the CopyToState. For the same reason,
the CopyTo() routine is split into a set of routines: CopyToStart(),
CopyTo() and CopyToFinish().
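
As a rough illustration of what reaches data_dest_cb, the sketch below
formats one row in the default text format (tab delimiter, \N for NULL)
and hands it to a callback with the same (buf, len) shape.
copy_one_row_sketch() and capture_cb() are hypothetical; capture_cb()
records the bytes where pgfdw_copy_dest_cb() would call PQputCopyData().
Only the backslash, tab, newline and carriage-return escapes are shown, and
rows end in '\n' (the patch sends "\r\n" on Windows):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same (buf, len) shape as the copy_data_dest_cb callback in the patch. */
typedef void (*dest_cb_sketch) (void *buf, int len);

/* Fixed-size row buffer; a real implementation would use a StringInfo. */
typedef struct RowBuf
{
	char		data[1024];
	size_t		len;
} RowBuf;

static void
rowbuf_put(RowBuf *rb, char c)
{
	assert(rb->len < sizeof(rb->data));	/* sketch: no buffer growth */
	rb->data[rb->len++] = c;
}

/* Format one row in COPY text format and pass it to the callback. */
static void
copy_one_row_sketch(const char *const *values, int nvalues,
					dest_cb_sketch dest)
{
	RowBuf		rb = {.len = 0};

	for (int i = 0; i < nvalues; i++)
	{
		if (i > 0)
			rowbuf_put(&rb, '\t');	/* column delimiter */
		if (values[i] == NULL)
		{
			rowbuf_put(&rb, '\\');	/* NULL is written as \N */
			rowbuf_put(&rb, 'N');
			continue;
		}
		for (const char *p = values[i]; *p != '\0'; p++)
		{
			char		esc = 0;

			switch (*p)
			{
				case '\\': esc = '\\'; break;
				case '\t': esc = 't'; break;
				case '\n': esc = 'n'; break;
				case '\r': esc = 'r'; break;
				default: break;
			}
			if (esc)
			{
				rowbuf_put(&rb, '\\');
				rowbuf_put(&rb, esc);
			}
			else
				rowbuf_put(&rb, *p);
		}
	}
	rowbuf_put(&rb, '\n');		/* end of row */
	dest(rb.data, (int) rb.len);
}

/* Stand-in for pgfdw_copy_dest_cb(): collect bytes, no PQputCopyData(). */
static char captured[4096];
static size_t captured_len = 0;

static void
capture_cb(void *buf, int len)
{
	assert(captured_len + (size_t) len <= sizeof(captured));
	memcpy(captured + captured_len, buf, (size_t) len);
	captured_len += (size_t) len;
}
```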

The CopyInsertMethod enum is removed. Its logic is now implemented by the
ri_usesMultiInsert field of the ResultRelInfo structure.
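
The conditions that decide ri_usesMultiInsert, as far as they are visible
in this thread, can be modeled with plain booleans. RelInfoSketch and the
*_sketch functions are hypothetical. One assumption is labeled explicitly:
the first patch still rejects every foreign table, and here a foreign table
is assumed to qualify only when its FDW provides ExecForeignCopy (the
fdwhandler.sgml text says ExecForeignInsert is used otherwise). A partition
qualifies only if its parent does, and COPY FROM additionally requires no
volatile defaults, no volatile WHERE clause and a non-empty attribute list:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the ResultRelInfo fields being checked. */
typedef struct RelInfoSketch
{
	bool		before_row_trigger;		/* trig_insert_before_row */
	bool		instead_row_trigger;	/* trig_insert_instead_row */
	bool		partitioned;			/* RELKIND_PARTITIONED_TABLE */
	bool		insert_transition_table;	/* trig_insert_new_table */
	bool		foreign;				/* ri_FdwRoutine != NULL */
	bool		has_exec_foreign_copy;	/* ASSUMPTION: FDW sets ExecForeignCopy */
} RelInfoSketch;

/* Per-relation checks, modeled on ExecSetRelationUsesMultiInsert(). */
static bool
rel_uses_multi_insert_sketch(const RelInfoSketch *rri)
{
	if (rri->before_row_trigger || rri->instead_row_trigger)
		return false;
	if (rri->partitioned && rri->insert_transition_table)
		return false;
	/* ASSUMPTION: patch 2 admits foreign tables whose FDW can bulk-copy. */
	if (rri->foreign && !rri->has_exec_foreign_copy)
		return false;
	return true;
}

/* Target of COPY FROM: the statement-level checks come first. */
static bool
copy_target_uses_multi_insert_sketch(bool volatile_defexprs, int nattrs,
									 bool volatile_where,
									 const RelInfoSketch *target)
{
	if (volatile_defexprs || nattrs == 0 || volatile_where)
		return false;
	return rel_uses_multi_insert_sketch(target);
}

/* A partition qualifies only if its parent does and it passes itself. */
static bool
partition_uses_multi_insert_sketch(bool parent_uses_multi_insert,
								   const RelInfoSketch *leaf)
{
	return parent_uses_multi_insert && rel_uses_multi_insert_sketch(leaf);
}
```

Under these rules, the ffoo1 partition in the new regression test, which
has a BEFORE ROW trigger, would fall back to per-row inserts, while ffoo2
could be flushed in batches.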

Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++++++--
 .../postgres_fdw/expected/postgres_fdw.out    |  46 +++++-
 contrib/postgres_fdw/postgres_fdw.c           | 137 ++++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++++++
 doc/src/sgml/fdwhandler.sgml                  |  73 ++++++++++
 src/backend/commands/copy.c                   |   4 +-
 src/backend/commands/copyfrom.c               | 133 +++++++++--------
 src/backend/commands/copyto.c                 |  84 ++++++++---
 src/backend/executor/execMain.c               |   8 +-
 src/backend/executor/execPartition.c          |  27 +++-
 src/include/commands/copy.h                   |   8 +-
 src/include/foreign/fdwapi.h                  |  15 ++
 13 files changed, 540 insertions(+), 101 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ca2f9f3215..b2a71faabc 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1763,6 +1765,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse a COPY ... FROM STDIN statement for the given relation into buf.
+ * The column list is always spelled out explicitly in the query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2066,6 +2082,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2074,10 +2114,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2086,6 +2124,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2105,18 +2146,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..0e2c15c648 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8076,8 +8076,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2: ""
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8088,6 +8089,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8196,6 +8210,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3: ""
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b6c72e1d1e..dd185bdc3b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -191,6 +192,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -357,6 +359,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -535,6 +544,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2052,6 +2064,131 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ *
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/*
+		 * Finish the COPY IN protocol; this is needed on success and on error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After sending the end-of-copy signal, check the command status. */
+		res = PQgetResult(conn);
+		if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(OK && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!OK)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c5417b..22dcd12f02 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2212,6 +2212,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2312,6 +2329,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..a9a7402440 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -796,6 +796,79 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of the <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in <literal>slots</literal>.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b6143b8bf2..32cff00762 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -303,8 +303,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	{
 		CopyToState cstate;
 
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 6d4f6cb80d..73fc838625 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -307,61 +307,63 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/*
-	 * Print error context information correctly, if one of the operations
-	 * below fail.
-	 */
-	cstate->line_buf_valid = false;
-	save_cur_lineno = cstate->cur_lineno;
-
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo, buffer->slots[i],
+										  estate, false, NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There are no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -666,8 +668,11 @@ CopyFrom(CopyFromState cstate)
 	 * checked by calling ExecSetRelationUsesMultiInsert().  It does not matter
 	 * whether partitions have any volatile default expressions as we use the
 	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
 	 */
 	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
 		!contain_volatile_functions(cstate->whereClause))
 		target_resultRelInfo->ri_usesMultiInsert =
 					ExecSetRelationUsesMultiInsert(target_resultRelInfo, NULL);
@@ -691,10 +696,18 @@ CopyFrom(CopyFromState cstate)
 	 * Init copying process into foreign table. Initialization of copying into
 	 * foreign partitions will be done later.
 	 */
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			Assert(target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -1072,10 +1085,16 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c7e5f04446..b1d50b01cc 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -50,6 +50,7 @@ typedef enum CopyDest
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_OLD_FE,				/* to frontend (2.0 protocol) */
 	COPY_NEW_FE,				/* to frontend (3.0 protocol) */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -80,11 +81,14 @@ typedef struct CopyToStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to */
+	TupleDesc	tupDesc;		/* source tuple descriptor when tuples are sent
+								 * one at a time via CopyOneRowTo() */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -114,7 +118,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static uint64 CopyTo(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -286,6 +289,14 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 	}
 
 	resetStringInfo(fe_msgbuf);
@@ -373,19 +384,24 @@ EndCopy(CopyToState cstate)
 CopyToState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc srcTupDesc,
 			RawStmt *raw_query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || srcTupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -450,6 +466,11 @@ BeginCopyTo(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -695,6 +716,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -772,7 +798,7 @@ BeginCopyTo(ParseState *pstate,
 uint64
 DoCopyTo(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -781,7 +807,9 @@ DoCopyTo(CopyToState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -821,18 +849,22 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Split out into a separate routine to avoid repeating this setup in manual
+ * mode, where the caller copies tuples to the destination one at a time by
+ * calling CopyOneRowTo().
  */
-static uint64
-CopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -919,6 +951,32 @@ CopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyToState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -951,23 +1009,13 @@ CopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 9809c03a8e..a21d4d2fc1 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1294,8 +1294,12 @@ ExecSetRelationUsesMultiInsert(const ResultRelInfo *rri,
 		rri->ri_TrigDesc->trig_insert_new_table)
 		return false;
 
-	/* Foreign tables don't support multi-inserts. */
-	if (rri->ri_FdwRoutine != NULL)
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
 		return false;
 
 	/* OK, caller can use multi-insert on this relation. */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5a201dfbfa..56ec9bbf41 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -997,9 +997,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
@@ -1200,10 +1207,16 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert &&
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 127a3c61e2..01bb3e8ad4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -78,12 +79,17 @@ extern DestReceiver *CreateCopyDestReceiver(void);
 /*
  * internal prototypes
  */
-extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
+extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel,
+							   TupleDesc tupDesc, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..52b213f5aa 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
-- 
2.25.1

#54Tang, Haiying
tanghy.fnst@cn.fujitsu.com
In reply to: Andrey V. Lepikhov (#53)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

The cfbot build of your patch reports the following error. Please take a look.

https://travis-ci.org/github/postgresql-cfbot/postgresql/jobs/750682857#L1519

copyfrom.c:374:21: error: ‘save_cur_lineno’ is used uninitialized in this function [-Werror=uninitialized]

Regards,
Tang

#55Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Tang, Haiying (#54)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 12/22/20 12:04 PM, Tang, Haiying wrote:

Thank you, see the new version attached.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v13_1-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchtext/x-patch; charset=UTF-8; name=v13_1-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchDownload
From e2bc0980f05061afe199de63b76b00020208510a Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Mon, 14 Dec 2020 13:37:40 +0500
Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-inserts
are possible and the foreign table has a non-zero number of columns.

The FDW API was extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of the bulk COPY. The ExecForeignCopy routine sends
a 'COPY ... FROM STDIN' command to the foreign server, sends the
tuples one by one through the CopyTo() machinery, and then sends
EOF on this connection.

The code that constructed the list of columns for a given foreign
relation in deparseAnalyzeSql() is moved into a separate routine,
deparseRelColumnList(), which is also reused by deparseCopyFromSql().

Added regression tests for specific corner cases of the COPY FROM STDIN
operation.

By analogy with CopyFrom(), the CopyState structure was extended
with a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field,
which avoids repeated initialization of the CopyState. For the same
reason, the CopyTo() routine was split into the set of routines
CopyToStart()/CopyTo()/CopyToFinish().

The CopyInsertMethod enum was removed. This logic is now implemented by
the ri_usesMultiInsert field of the ResultRelInfo structure.
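
A minimal standalone C sketch of the callback destination and the
start/row/finish split described above. All demo_* names are hypothetical
stand-ins, not the PostgreSQL API. The sketch assumes text-format rows
without the escaping that the real CopyAttributeOutText() performs. Setup
happens once, each row is handed to the callback as soon as it is complete,
and teardown happens once. postgres_fdw's callback forwards each row with
PQputCopyData() instead of buffering it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for copy_data_dest_cb: receives one encoded row. */
typedef void (*demo_dest_cb) (void *buf, int len);

/* Hypothetical, heavily simplified stand-in for CopyToStateData. */
typedef struct DemoCopyTo
{
	demo_dest_cb dest;			/* where finished rows are sent */
	char		row[256];		/* per-row output buffer */
	int			len;			/* bytes currently in row[] */
	int			started;		/* set between start and finish */
} DemoCopyTo;

/* One-time setup, done once per COPY rather than once per row or batch. */
static void
demo_copy_start(DemoCopyTo *cs, demo_dest_cb dest)
{
	cs->dest = dest;
	cs->len = 0;
	cs->started = 1;
}

/* Format one row as tab-separated fields plus '\n' and hand it to dest. */
static void
demo_copy_one_row(DemoCopyTo *cs, const char **fields, int nfields)
{
	assert(cs->started);		/* demo_copy_start() must be called first */
	cs->len = 0;
	for (int i = 0; i < nfields; i++)
	{
		size_t		flen = strlen(fields[i]);

		/* room for a tab, the field and the trailing newline */
		if ((size_t) cs->len + flen + 2 > sizeof(cs->row))
			return;				/* row too long for this sketch: skip it */
		if (i > 0)
			cs->row[cs->len++] = '\t';
		memcpy(cs->row + cs->len, fields[i], flen);
		cs->len += (int) flen;
	}
	cs->row[cs->len++] = '\n';
	cs->dest(cs->row, cs->len);
}

/* One-time teardown. */
static void
demo_copy_finish(DemoCopyTo *cs)
{
	cs->started = 0;
}

/* Example destination that appends every row it receives to a buffer. */
static char demo_sink_buf[1024];
static int	demo_sink_len = 0;
static int	demo_sink_calls = 0;

static void
demo_sink(void *buf, int len)
{
	if (demo_sink_len + len > (int) sizeof(demo_sink_buf))
		return;					/* drop rather than overflow in this sketch */
	memcpy(demo_sink_buf + demo_sink_len, buf, (size_t) len);
	demo_sink_len += len;
	demo_sink_calls++;
}
```

Usage: call demo_copy_start() once, demo_copy_one_row() for each tuple of a
flushed batch, then demo_copy_finish(). With rows {"1", "foo"} and
{"2", "bar"}, the sink receives "1\tfoo\n2\tbar\n" in two callback calls.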

Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++++++--
 .../postgres_fdw/expected/postgres_fdw.out    |  46 +++++-
 contrib/postgres_fdw/postgres_fdw.c           | 137 ++++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++++++
 doc/src/sgml/fdwhandler.sgml                  |  73 ++++++++++
 src/backend/commands/copy.c                   |   4 +-
 src/backend/commands/copyfrom.c               | 126 +++++++++-------
 src/backend/commands/copyto.c                 |  84 ++++++++---
 src/backend/executor/execMain.c               |   8 +-
 src/backend/executor/execPartition.c          |  27 +++-
 src/include/commands/copy.h                   |   8 +-
 src/include/foreign/fdwapi.h                  |  15 ++
 13 files changed, 540 insertions(+), 94 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ca2f9f3215..b2a71faabc 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1763,6 +1765,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse a COPY ... FROM STDIN command into the given buf.
+ * The target column list is spelled out explicitly in every query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2066,6 +2082,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2074,10 +2114,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2086,6 +2124,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2105,18 +2146,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..be8db5ac63 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8076,8 +8076,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8088,6 +8089,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8196,6 +8210,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b6c72e1d1e..dd185bdc3b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -191,6 +192,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -357,6 +359,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -535,6 +544,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2052,6 +2064,131 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ *
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ *
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	Assert(copy_fmstate == NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
+
+		/*
+		 * Finish the COPY IN protocol, after either a successful copy or an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending the EOF signal, check the command status. */
+		res = PQgetResult(conn);
+		if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(OK && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+
+		if (!OK)
+			PG_RE_THROW();
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c5417b..22dcd12f02 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2212,6 +2212,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2312,6 +2329,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..a9a7402440 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -796,6 +796,79 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of the <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual <command>COPY FROM</command> operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; each matches the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in <literal>slots</literal>.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b6143b8bf2..32cff00762 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -303,8 +303,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	{
 		CopyToState cstate;
 
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 6d4f6cb80d..17aac24bdd 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -314,54 +314,63 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo, buffer->slots[i],
+										  estate, false, NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -666,8 +675,11 @@ CopyFrom(CopyFromState cstate)
 	 * checked by calling ExecSetRelationUsesMultiInsert().  It does not matter
 	 * whether partitions have any volatile default expressions as we use the
 	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
 	 */
 	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
 		!contain_volatile_functions(cstate->whereClause))
 		target_resultRelInfo->ri_usesMultiInsert =
 					ExecSetRelationUsesMultiInsert(target_resultRelInfo, NULL);
@@ -691,10 +703,18 @@ CopyFrom(CopyFromState cstate)
 	 * Init copying process into foreign table. Initialization of copying into
 	 * foreign partitions will be done later.
 	 */
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			Assert(target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -1072,10 +1092,16 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c7e5f04446..608bb3771d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -50,6 +50,7 @@ typedef enum CopyDest
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_OLD_FE,				/* to frontend (2.0 protocol) */
 	COPY_NEW_FE,				/* to frontend (3.0 protocol) */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -80,11 +81,14 @@ typedef struct CopyToStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to */
+	TupleDesc	tupDesc;		/* COPY TO will be used for manual tuple copying
+								  * into the destination */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -114,7 +118,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static uint64 CopyTo(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -286,6 +289,14 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 	}
 
 	resetStringInfo(fe_msgbuf);
@@ -373,19 +384,24 @@ EndCopy(CopyToState cstate)
 CopyToState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc srcTupDesc,
 			RawStmt *raw_query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || srcTupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -450,6 +466,11 @@ BeginCopyTo(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -695,6 +716,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -772,7 +798,7 @@ BeginCopyTo(ParseState *pstate,
 uint64
 DoCopyTo(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -781,7 +807,9 @@ DoCopyTo(CopyToState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -821,18 +849,22 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Split out into its own routine so that these steps are not repeated in
+ * manual mode, where tuples are copied to the destination one at a time by
+ * calling CopyOneRowTo().
  */
-static uint64
-CopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -919,6 +951,32 @@ CopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyToState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -951,23 +1009,13 @@ CopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 9809c03a8e..a21d4d2fc1 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1294,8 +1294,12 @@ ExecSetRelationUsesMultiInsert(const ResultRelInfo *rri,
 		rri->ri_TrigDesc->trig_insert_new_table)
 		return false;
 
-	/* Foreign tables don't support multi-inserts. */
-	if (rri->ri_FdwRoutine != NULL)
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
 		return false;
 
 	/* OK, caller can use multi-insert on this relation. */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 637e900b09..f3b9197db1 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -996,9 +996,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
@@ -1199,10 +1206,16 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert &&
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 127a3c61e2..01bb3e8ad4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -78,12 +79,17 @@ extern DestReceiver *CreateCopyDestReceiver(void);
 /*
  * internal prototypes
  */
-extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
+extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel,
+							   TupleDesc tupDesc, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..52b213f5aa 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
-- 
2.25.1
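
The copyto.c changes above let COPY TO hand each finished text row to a copy_data_dest_cb function instead of writing it to a file or to the frontend. The standalone program below is a toy model of that flow, not the patch's code. ToyCopyToState, toy_copy_one_row() and collect_cb() are hypothetical names, and the escaping that real COPY text output performs is omitted. Only the callback signature is taken from the patch's copy.h.

```c
#include <assert.h>
#include <string.h>

/* Same signature as the copy_data_dest_cb typedef the patch adds to copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical, heavily simplified stand-in for CopyToStateData. */
typedef struct ToyCopyToState
{
	char		rowbuf[256];	/* text of the row being built */
	int			rowlen;			/* bytes used in rowbuf */
	copy_data_dest_cb dest_cb;	/* receives each finished row */
} ToyCopyToState;

/*
 * Build one text-format row ("f1<TAB>f2...\n") and pass it to the callback,
 * roughly what CopyOneRowTo() plus the COPY_CALLBACK branch of
 * CopySendEndOfRow() do.  Escaping of tabs, newlines and backslashes is
 * omitted in this toy.
 */
static void
toy_copy_one_row(ToyCopyToState *cstate, const char *const *fields, int nfields)
{
	assert(cstate->dest_cb != NULL);	/* callback mode must be configured */

	cstate->rowlen = 0;
	for (int i = 0; i < nfields; i++)
	{
		int			flen = (int) strlen(fields[i]);

		/* Room for a delimiter, the field and the final newline? */
		if (cstate->rowlen + flen + 2 > (int) sizeof(cstate->rowbuf))
			return;				/* toy behaviour: silently drop oversized rows */
		if (i > 0)
			cstate->rowbuf[cstate->rowlen++] = '\t';
		memcpy(cstate->rowbuf + cstate->rowlen, fields[i], (size_t) flen);
		cstate->rowlen += flen;
	}
	cstate->rowbuf[cstate->rowlen++] = '\n';

	/* One callback invocation per row, as in the COPY_CALLBACK case. */
	cstate->dest_cb(cstate->rowbuf, cstate->rowlen);
	cstate->rowlen = 0;			/* like resetStringInfo() after sending */
}

/*
 * Example destination.  copy_data_dest_cb carries no context pointer.
 * Like pgfdw_copy_dest_cb() with its copy_fmstate global, it must reach
 * its target through file-level state.
 */
static char collected[1024];
static int	collected_len = 0;

static void
collect_cb(void *buf, int len)
{
	/* Keep room for the terminating NUL; drop data that does not fit. */
	if (collected_len + len < (int) sizeof(collected))
	{
		memcpy(collected + collected_len, buf, (size_t) len);
		collected_len += len;
		collected[collected_len] = '\0';
	}
}
```

Suppose toy_copy_one_row() is called twice, once with the fields "1", "foo" and once with "2", "bar". The collector then holds "1\tfoo\n2\tbar\n". A callback without a context argument keeps BeginCopyTo()'s signature small. The cost is global state, which is why postgresExecForeignCopy() sets copy_fmstate before emitting rows.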

#56Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Andrey V. Lepikhov (#55)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

Hi

see new version in attachment.

I took a look into the patch, and have some comments.

1.
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
I don't quite understand this comment.
Does it mean we want to detect something like a NULL pointer dereference?
2.
+	PG_FINALLY();
+	{
	...
+		if (!OK)
+			PG_RE_THROW();
+	}
Is this PG_RE_THROW() necessary?
IMO, PG_FINALLY will already re-throw the error if we get to this code block because an error was thrown.
3.
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));

I found a similar existing message:

pg_log_warning("unexpected extra results during COPY of table \"%s\"",
tocEntryTag);
How about using the existing message style?

4.
I noticed some non-standard code comments[1].
I think it's better to comment like:
/*
* line 1
* line 2
*/

[1]-----------
+		/* Finish COPY IN protocol. It is needed to do after successful copy or
+		 * after an error.
+		 */
+/*
+ *
+ * postgresExecForeignCopy
+/*
+ *
+ * postgresBeginForeignCopy

-----------
Best regards,
Houzj

#57Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Hou, Zhijie (#56)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 29.12.2020 16:20, Hou, Zhijie wrote:

see new version in attachment.

I took a look into the patch, and have some comments.

1.
+	PG_FINALLY();
+	{
+		copy_fmstate = NULL; /* Detect problems */
I don't quite understand this comment.
Does it mean we want to detect something like a NULL pointer dereference?
2.
+	PG_FINALLY();
+	{
...
+		if (!OK)
+			PG_RE_THROW();
+	}
Is this PG_RE_THROW() necessary?
IMO, PG_FINALLY will already re-throw the error if we get to this code block because an error was thrown.

These are atavisms from the debugging stage. Fixed.
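
Point 2 turns on what PG_FINALLY() does with a pending error. As a rough illustration only, the standalone program below models that control flow with plain setjmp/longjmp. The TOY_* macros, toy_throw(), copy_batch() and run_copy_batch() are hypothetical names. The model is a simplification of PostgreSQL's PG_TRY()/PG_FINALLY()/PG_END_TRY() and leaves out sigsetjmp and error-context stack handling. The finally block runs on both the success path and the error path. The closing macro then re-throws a pending error by itself, so an explicit "if (!OK) PG_RE_THROW();" at the end of the finally block is redundant.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Innermost active handler; NULL means no handler is installed. */
static jmp_buf *toy_exception_stack = NULL;

/* Toy counterpart of ereport(ERROR) / PG_RE_THROW(). */
static void
toy_throw(void)
{
	assert(toy_exception_stack != NULL);	/* no handler: a bug in this toy */
	longjmp(*toy_exception_stack, 1);
}

#define TOY_TRY() \
	do { \
		jmp_buf    *save_stack_ = toy_exception_stack; \
		jmp_buf		local_buf_; \
		bool		do_rethrow_ = false; \
		if (setjmp(local_buf_) == 0) \
		{ \
			toy_exception_stack = &local_buf_

#define TOY_CATCH() \
		} \
		else \
		{ \
			toy_exception_stack = save_stack_

/* Runs on both paths; only remembers whether an error must be re-thrown. */
#define TOY_FINALLY() \
		} \
		else \
			do_rethrow_ = true; \
		{ \
			toy_exception_stack = save_stack_

#define TOY_END_TRY() \
		} \
		if (do_rethrow_) \
			toy_throw(); \
		toy_exception_stack = save_stack_; \
	} while (0)

static int	cleanup_runs = 0;

/* Shaped like postgresExecForeignCopy(): send rows, always finish up. */
static void
copy_batch(bool fail)
{
	TOY_TRY();
	{
		if (fail)
			toy_throw();		/* e.g. an error while sending a row */
	}
	TOY_FINALLY();
	{
		cleanup_runs++;			/* e.g. PQputCopyEnd() and result checks */
		/* No "if (!OK) re-throw" needed: TOY_END_TRY() does it. */
	}
	TOY_END_TRY();
}

/* Returns true if an error escaped from copy_batch(). */
static bool
run_copy_batch(bool fail)
{
	volatile bool caught = false;

	TOY_TRY();
	{
		copy_batch(fail);
	}
	TOY_CATCH();
	{
		caught = true;
	}
	TOY_END_TRY();
	return caught;
}
```

In this model, run_copy_batch(false) returns false and run_copy_batch(true) returns true. The cleanup counter goes up by one in each case. An explicit re-throw placed earlier in the finally block would also skip the rest of the cleanup, because the handler stack has already been restored to the outer level at that point.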

3.
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));

I found a similar existing message:

pg_log_warning("unexpected extra results during COPY of table \"%s\"",
tocEntryTag);
How about using the existing message style?

I think this style is intended for frontend utilities, not for contrib
extensions.

4.
I noticed some non-standard code comments[1].
I think it's better to comment like:
/*
* line 1
* line 2
*/

[1]-----------
+		/* Finish COPY IN protocol. It is needed to do after successful copy or
+		 * after an error.
+		 */
+/*
+ *
+ * postgresExecForeignCopy
+/*
+ *
+ * postgresBeginForeignCopy

Thanks, fixed.
The attached patch is rebased on 107a2d4204.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v13_2-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/plain; charset=UTF-8)
From 1c5439d802b7654ee50dc4326b9bc24fc7f44677 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Mon, 14 Dec 2020 13:37:40 +0500
Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of a bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, sends the tuples
one by one through the CopyTo() machinery, and then sends EOF on this
connection.

The code that constructs the list of columns for a given foreign relation
in deparseAnalyzeSql() is moved into a separate deparseRelColumnList()
routine, which is reused by deparseCopyFromSql().

Added regression tests for specific corner cases of the COPY FROM STDIN
operation.

By analogy with the copy_data_source_cb used by COPY FROM, the
CopyToStateData structure is extended with a data_dest_cb callback. It is
used to send the text representation of a tuple to a custom destination.
The PgFdwModifyState structure is extended with a cstate field. It is
needed to avoid repeated initialization of the COPY TO state. For the same
reason, the CopyTo() routine was split into the routines CopyToStart(),
CopyTo() and CopyToFinish().

Enum CopyInsertMethod is removed. Its logic is now implemented by the
ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++++++--
 .../postgres_fdw/expected/postgres_fdw.out    |  46 ++++++-
 contrib/postgres_fdw/postgres_fdw.c           | 130 ++++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++++++
 doc/src/sgml/fdwhandler.sgml                  |  73 ++++++++++
 src/backend/commands/copy.c                   |   4 +-
 src/backend/commands/copyfrom.c               | 126 ++++++++++-------
 src/backend/commands/copyto.c                 |  84 ++++++++---
 src/backend/executor/execMain.c               |   8 +-
 src/backend/executor/execPartition.c          |  27 +++-
 src/include/commands/copy.h                   |   8 +-
 src/include/foreign/fdwapi.h                  |  15 ++
 13 files changed, 533 insertions(+), 94 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ca2f9f3215..b2a71faabc 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1763,6 +1765,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse a remote COPY ... FROM STDIN statement into the given buf.
+ * The column list is built by deparseRelColumnList().
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2066,6 +2082,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2074,10 +2114,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2086,6 +2124,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2105,18 +2146,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..db7b09c1fe 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8076,8 +8076,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8088,6 +8089,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8196,6 +8210,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b6c72e1d1e..a4a078a76a 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -191,6 +192,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -357,6 +359,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -535,6 +544,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2052,6 +2064,124 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol. This must be done both after a successful
+		 * copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending EOF, check that the command succeeded. */
+		res = PQgetResult(conn);
+		if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(OK && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..8fc5ff018f 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 25dbc08b98..53b9d865da 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2212,6 +2212,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2312,6 +2329,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..a9a7402440 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -796,6 +796,79 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     a batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; it will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b6143b8bf2..32cff00762 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -303,8 +303,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	{
 		CopyToState cstate;
 
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 6d4f6cb80d..17aac24bdd 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -314,54 +314,63 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo, buffer->slots[i],
+										  estate, false, NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There are no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -666,8 +675,11 @@ CopyFrom(CopyFromState cstate)
 	 * checked by calling ExecSetRelationUsesMultiInsert().  It does not matter
 	 * whether partitions have any volatile default expressions as we use the
 	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
 	 */
 	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
 		!contain_volatile_functions(cstate->whereClause))
 		target_resultRelInfo->ri_usesMultiInsert =
 					ExecSetRelationUsesMultiInsert(target_resultRelInfo, NULL);
@@ -691,10 +703,18 @@ CopyFrom(CopyFromState cstate)
 	 * Init copying process into foreign table. Initialization of copying into
 	 * foreign partitions will be done later.
 	 */
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			Assert(target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -1072,10 +1092,16 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c7e5f04446..608bb3771d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -50,6 +50,7 @@ typedef enum CopyDest
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_OLD_FE,				/* to frontend (2.0 protocol) */
 	COPY_NEW_FE,				/* to frontend (3.0 protocol) */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -80,11 +81,14 @@ typedef struct CopyToStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to */
+	TupleDesc	tupDesc;		/* COPY TO will be used for manual tuple copying
+								  * into the destination */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -114,7 +118,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static uint64 CopyTo(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -286,6 +289,14 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 	}
 
 	resetStringInfo(fe_msgbuf);
@@ -373,19 +384,24 @@ EndCopy(CopyToState cstate)
 CopyToState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc srcTupDesc,
 			RawStmt *raw_query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || srcTupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -450,6 +466,11 @@ BeginCopyTo(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -695,6 +716,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -772,7 +798,7 @@ BeginCopyTo(ParseState *pstate,
 uint64
 DoCopyTo(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -781,7 +807,9 @@ DoCopyTo(CopyToState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -821,18 +849,22 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Split out into a separate routine to avoid duplicate work in manual mode,
+ * where tuples are copied to the destination one at a time by calling
+ * CopyOneRowTo().
  */
-static uint64
-CopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -919,6 +951,32 @@ CopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyToState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -951,23 +1009,13 @@ CopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 9809c03a8e..a21d4d2fc1 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1294,8 +1294,12 @@ ExecSetRelationUsesMultiInsert(const ResultRelInfo *rri,
 		rri->ri_TrigDesc->trig_insert_new_table)
 		return false;
 
-	/* Foreign tables don't support multi-inserts. */
-	if (rri->ri_FdwRoutine != NULL)
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
 		return false;
 
 	/* OK, caller can use multi-insert on this relation. */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 637e900b09..f3b9197db1 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -996,9 +996,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
@@ -1199,10 +1206,16 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert &&
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 127a3c61e2..01bb3e8ad4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -78,12 +79,17 @@ extern DestReceiver *CreateCopyDestReceiver(void);
 /*
  * internal prototypes
  */
-extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
+extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel,
+							   TupleDesc tupDesc, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..52b213f5aa 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
-- 
2.29.2
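The patch chooses between the existing Insert callbacks and the new Copy callbacks in three places (CopyFrom(), ExecInitRoutingInfo() and ExecCleanupTupleRouting()), keyed on ri_usesMultiInsert, which is false for a foreign table whose FDW lacks ExecForeignCopy. The standalone sketch below models only that selection logic. MockFdwRoutine, MockResultRel and the helper functions are hypothetical simplifications, not PostgreSQL types; the real callbacks also take ModifyTableState, EState and slot arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of which callback ran last (used for testing only). */
typedef enum MockCall
{
	CALL_NONE,
	CALL_BEGIN_INSERT,
	CALL_END_INSERT,
	CALL_BEGIN_COPY,
	CALL_EXEC_COPY,
	CALL_END_COPY
} MockCall;

typedef struct MockResultRel MockResultRel;

/* Hypothetical, simplified stand-in for FdwRoutine. */
typedef struct MockFdwRoutine
{
	void		(*BeginForeignInsert) (MockResultRel *rel);
	void		(*EndForeignInsert) (MockResultRel *rel);
	void		(*BeginForeignCopy) (MockResultRel *rel);
	void		(*ExecForeignCopy) (MockResultRel *rel, int nslots);
	void		(*EndForeignCopy) (MockResultRel *rel);
} MockFdwRoutine;

/* Hypothetical, simplified stand-in for ResultRelInfo. */
struct MockResultRel
{
	const MockFdwRoutine *fdwroutine;	/* NULL for a local table */
	bool		uses_multi_insert;
	MockCall	last_call;
};

/* FDW part of ExecSetRelationUsesMultiInsert(): no ExecForeignCopy, no multi-insert. */
static bool
can_use_multi_insert(const MockResultRel *rel)
{
	return rel->fdwroutine == NULL || rel->fdwroutine->ExecForeignCopy != NULL;
}

/* Begin* selection, as in CopyFrom() and ExecInitRoutingInfo(). */
static void
begin_foreign_target(MockResultRel *rel)
{
	if (rel->fdwroutine == NULL)
		return;
	if (rel->uses_multi_insert)
	{
		assert(rel->fdwroutine->BeginForeignCopy != NULL);
		rel->fdwroutine->BeginForeignCopy(rel);
	}
	else if (rel->fdwroutine->BeginForeignInsert != NULL)
		rel->fdwroutine->BeginForeignInsert(rel);
}

/* Foreign branch of CopyMultiInsertBufferFlush(): hand the whole batch over. */
static void
flush_batch(MockResultRel *rel, int nused)
{
	assert(rel->uses_multi_insert);
	if (rel->fdwroutine != NULL)
		rel->fdwroutine->ExecForeignCopy(rel, nused);
	/* Local tables: table_multi_insert() plus index/trigger work (not modeled). */
}

/* End* selection, as in CopyFrom() and ExecCleanupTupleRouting(). */
static void
end_foreign_target(MockResultRel *rel)
{
	if (rel->fdwroutine == NULL)
		return;
	if (rel->uses_multi_insert && rel->fdwroutine->EndForeignCopy != NULL)
		rel->fdwroutine->EndForeignCopy(rel);
	else if (rel->fdwroutine->EndForeignInsert != NULL)
		rel->fdwroutine->EndForeignInsert(rel);
}
```

The sketch follows the patch as posted, including two spots where the code and the fdwhandler.sgml text disagree: the Begin path asserts that BeginForeignCopy is non-NULL, while the docs say a NULL pointer means no action; and when EndForeignCopy is NULL the End path falls through to EndForeignInsert instead of doing nothing.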

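In the COPY_CALLBACK mode added to copyto.c, each row is formatted into the usual per-row buffer as for any text-format COPY TO, and CopySendEndOfRow() then appends the line terminator and passes the whole row to data_dest_cb. The standalone model below shows that hand-off. RowBuf, send_row() and collect_cb() are hypothetical names; fields are assumed to need no COPY text escaping, and only the non-Windows newline terminator is modeled.

```c
#include <assert.h>
#include <string.h>

/* Same shape as the copy_data_dest_cb typedef the patch adds to copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical fixed-size stand-in for the StringInfo row buffer (fe_msgbuf). */
typedef struct RowBuf
{
	char		data[256];
	int			len;
	copy_data_dest_cb dest;
} RowBuf;

static void
rowbuf_append(RowBuf *buf, const char *s)
{
	size_t		n = strlen(s);

	assert((size_t) buf->len + n < sizeof(buf->data));
	memcpy(buf->data + buf->len, s, n);
	buf->len += (int) n;
	buf->data[buf->len] = '\0';
}

/* Emit one text-format row: tab-separated fields, no escaping (assumed unnecessary). */
static void
send_row(RowBuf *buf, const char **fields, int nfields)
{
	for (int i = 0; i < nfields; i++)
	{
		if (i > 0)
			rowbuf_append(buf, "\t");
		rowbuf_append(buf, fields[i]);
	}
	/* Like the COPY_CALLBACK branch of CopySendEndOfRow(): add EOL, hand off, reset. */
	rowbuf_append(buf, "\n");
	buf->dest(buf->data, buf->len);
	buf->len = 0;
	buf->data[0] = '\0';
}

/* Example sink: the callback has no context argument, so it uses file-level state. */
static char collected[1024];
static int	collected_len = 0;

static void
collect_cb(void *outbuf, int len)
{
	assert(collected_len + len < (int) sizeof(collected));
	memcpy(collected + collected_len, outbuf, (size_t) len);
	collected_len += len;
	collected[collected_len] = '\0';
}
```

Because copy_data_dest_cb, as declared in the patch, carries no context pointer, a sink that needs per-connection state has to reach it through file-level variables, as collect_cb() does here.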
#58Tang, Haiying
tanghy.fnst@cn.fujitsu.com
In reply to: Andrey Lepikhov (#57)
1 attachment(s)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

I had a general look at this extension feature, and I think it's beneficial for some application scenarios of PostgreSQL. So I ran 7 performance test cases on your patch (v13). The results are really good: as you can see below, we get roughly an 8.7-11.5x improvement with this patch.

Please see the attached test_copy_from.sql for the details of my test cases (I didn't attach the data file since it's too big).

Below are the test results:
'Test No' corresponds to the case number (0, 1, ..., 6) in the attached test_copy_from.sql.
%reg = (Patched - Unpatched) / Unpatched; all times are in milliseconds.

| Test No | Test case | Patched (ms) | Unpatched (ms) | %reg |
|---------|-----------|-------------:|---------------:|-----:|
| 0 | COPY FROM insertion into the partitioned table (partition is a foreign table) | 102483.223 | 1083300.907 | -91% |
| 1 | COPY FROM insertion into the partitioned table (partition is a foreign partition) | 104779.893 | 1207320.287 | -91% |
| 2 | COPY FROM insertion into the foreign table (without partitions) | 100268.730 | 1077309.158 | -91% |
| 3 | COPY FROM insertion into the partitioned table (part of foreign partitions) | 104110.620 | 1134781.855 | -91% |
| 4 | COPY FROM insertion into the partitioned table with constraint (part of foreign partitions) | 136356.201 | 1238539.603 | -89% |
| 5 | COPY FROM insertion into the foreign table with constraint (without partitions) | 136818.262 | 1189921.742 | -89% |
| 6 | \copy insertion into the partitioned table with constraint | 140368.072 | 1242689.924 | -89% |
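The %reg column and the overall speedup can both be recomputed from the two timing columns. The sketch below does so for the seven rows of the table; BenchRow, pct_reg() and speedup() are names introduced here for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the benchmark table; times in milliseconds. */
typedef struct BenchRow
{
	int			test_no;
	double		patched_ms;		/* elapsed time with the patch */
	double		unpatched_ms;	/* elapsed time without the patch */
} BenchRow;

static const BenchRow bench_rows[] = {
	{0, 102483.223, 1083300.907},
	{1, 104779.893, 1207320.287},
	{2, 100268.730, 1077309.158},
	{3, 104110.620, 1134781.855},
	{4, 136356.201, 1238539.603},
	{5, 136818.262, 1189921.742},
	{6, 140368.072, 1242689.924},
};

static const size_t n_bench_rows = sizeof(bench_rows) / sizeof(bench_rows[0]);

/* %reg = (Patched - Unpatched) / Unpatched, expressed in percent. */
static double
pct_reg(const BenchRow *r)
{
	assert(r->unpatched_ms > 0.0);
	return 100.0 * (r->patched_ms - r->unpatched_ms) / r->unpatched_ms;
}

/* How many times faster the patched run was. */
static double
speedup(const BenchRow *r)
{
	assert(r->patched_ms > 0.0);
	return r->unpatched_ms / r->patched_ms;
}
```

For these rows the speedup (Unpatched / Patched) ranges from about 8.7x (test 5) to about 11.5x (test 1).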

If you have any questions about my tests, please feel free to ask.

Best regards,
Tang

Attachments:

test_copy_from.sql (application/octet-stream)
#59Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Andrey Lepikhov (#57)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

Unfortunately, this no longer applies :-( I tried to apply this on top
of c532d15ddd (from 2020/12/30) but even that has non-trivial conflicts.

Can you send a rebased version?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#60Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Tang, Haiying (#58)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 1/11/21 4:59 PM, Tang, Haiying wrote:


Thank you for this work.
Some time ago I suggested an additional optimization [1] that can speed up
COPY by a further 2-4 times. Could you run the benchmark for that
solution too?

[1]: /messages/by-id/da7ed3f5-b596-2549-3710-4cc2a602ec17@postgrespro.ru

--
regards,
Andrey Lepikhov
Postgres Professional

#61Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Tomas Vondra (#59)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 1/11/21 11:16 PM, Tomas Vondra wrote:

Hi Andrey,

Unfortunately, this no longer applies :-( I tried to apply this on top
of c532d15ddd (from 2020/12/30) but even that has non-trivial conflicts.

Can you send a rebased version?

regards

Rebased; the attached version applies on top of 044aa9e70e.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v13_3-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-patch; charset=UTF-8)
From f8e0cd305c691108313c2365cc4576e4d5e0bd38 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Tue, 12 Jan 2021 08:54:45 +0500
Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API was extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of the bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, iteratively sends
the tuples using the CopyTo() machinery, and then sends EOF on this connection.

The code that constructed the list of columns for a given foreign relation
in the deparseAnalyzeSql() routine was moved into deparseRelColumnList(),
which is also reused by deparseCopyFromSql().

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure was extended
with a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field.
It is needed to avoid repeated initialization of the CopyState. Also for
this reason, the CopyTo() routine was split into the set of routines
CopyToStart()/CopyTo()/CopyToFinish().

The CopyInsertMethod enum was removed. This logic is now implemented by the
ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c                |  60 ++++++--
 .../postgres_fdw/expected/postgres_fdw.out    |  46 ++++++-
 contrib/postgres_fdw/postgres_fdw.c           | 130 ++++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++++++
 doc/src/sgml/fdwhandler.sgml                  |  73 ++++++++++
 src/backend/commands/copy.c                   |   4 +-
 src/backend/commands/copyfrom.c               | 126 ++++++++++-------
 src/backend/commands/copyto.c                 |  84 ++++++++---
 src/backend/executor/execMain.c               |   8 +-
 src/backend/executor/execPartition.c          |  27 +++-
 src/include/commands/copy.h                   |   8 +-
 src/include/foreign/fdwapi.h                  |  15 ++
 13 files changed, 533 insertions(+), 94 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..b1ca479a65 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1763,6 +1765,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse a remote COPY ... FROM STDIN statement into the given buf.
+ * The explicit column list makes each sent row match the remote columns.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2066,6 +2082,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2074,10 +2114,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2086,6 +2124,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2105,18 +2146,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..db7b09c1fe 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8076,8 +8076,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8088,6 +8089,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8196,6 +8210,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d171c..fa0eccb485 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -191,6 +192,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -357,6 +359,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -535,6 +544,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2052,6 +2064,124 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol. This must be done both after a successful
+		 * copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending end-of-data, check the command result. */
+		res = PQgetResult(conn);
+		if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(OK && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 19ea27a1bc..c38c219adf 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 25dbc08b98..53b9d865da 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2212,6 +2212,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2312,6 +2329,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..a9a7402440 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -796,6 +796,79 @@ EndForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
 int
 IsForeignRelUpdatable(Relation rel);
 </programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c712c8737..cd8aa57026 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -303,8 +303,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	{
 		CopyToState cstate;
 
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 4e2320e2fa..57e4addabf 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -316,54 +316,63 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo, buffer->slots[i],
+										  estate, false, NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There are no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -668,8 +677,11 @@ CopyFrom(CopyFromState cstate)
 	 * checked by calling ExecSetRelationUsesMultiInsert().  It does not matter
 	 * whether partitions have any volatile default expressions as we use the
 	 * defaults from the target of the COPY command.
+	 * Also, COPY via multi-insert requires a non-empty list of attributes,
+	 * so the length of the attribute list is checked here as well.
 	 */
 	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
 		!contain_volatile_functions(cstate->whereClause))
 		target_resultRelInfo->ri_usesMultiInsert =
 					ExecSetRelationUsesMultiInsert(target_resultRelInfo, NULL);
@@ -693,10 +705,18 @@ CopyFrom(CopyFromState cstate)
 	 * Init copying process into foreign table. Initialization of copying into
 	 * foreign partitions will be done later.
 	 */
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			Assert(target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -1075,10 +1095,16 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
 	CopyMultiInsertInfoCleanup(&multiInsertInfo);
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e04ec1e331..7a10c9dc9e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -52,6 +52,7 @@ typedef enum CopyDest
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_OLD_FE,				/* to frontend (2.0 protocol) */
 	COPY_NEW_FE,				/* to frontend (3.0 protocol) */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -82,11 +83,14 @@ typedef struct CopyToStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to */
+	TupleDesc	tupDesc;		/* source tuple descriptor, used when tuples are
+								 * copied to the destination manually */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -117,7 +121,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static uint64 CopyTo(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -289,6 +292,14 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 	}
 
 	/* Update the progress */
@@ -382,19 +393,24 @@ EndCopy(CopyToState cstate)
 CopyToState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc srcTupDesc,
 			RawStmt *raw_query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || srcTupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -459,6 +475,11 @@ BeginCopyTo(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -704,6 +725,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -786,7 +812,7 @@ BeginCopyTo(ParseState *pstate,
 uint64
 DoCopyTo(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -795,7 +821,9 @@ DoCopyTo(CopyToState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -835,18 +863,22 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start a COPY TO operation.
+ * This is split out from CopyTo() so that the manual mode, in which tuples are
+ * sent to the destination one at a time by calling CopyOneRowTo(), can reuse
+ * the setup without duplicating it.
  */
-static uint64
-CopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -933,6 +965,32 @@ CopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyToState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -967,23 +1025,13 @@ CopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f217486b85..fcfd6027cc 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1294,8 +1294,12 @@ ExecSetRelationUsesMultiInsert(const ResultRelInfo *rri,
 		rri->ri_TrigDesc->trig_insert_new_table)
 		return false;
 
-	/* Foreign tables don't support multi-inserts. */
-	if (rri->ri_FdwRoutine != NULL)
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
 		return false;
 
 	/* OK, caller can use multi-insert on this relation. */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 1f5f392bf9..386a2a9013 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -996,9 +996,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
@@ -1199,10 +1206,16 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert &&
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e33d..a7e7224ac8 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -78,12 +79,17 @@ extern DestReceiver *CreateCopyDestReceiver(void);
 /*
  * internal prototypes
  */
-extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
+extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel,
+							   TupleDesc tupDesc, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 2953499fb1..38e5dbb8e2 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -104,6 +104,16 @@ typedef void (*BeginForeignInsert_function) (ModifyTableState *mtstate,
 typedef void (*EndForeignInsert_function) (EState *estate,
 										   ResultRelInfo *rinfo);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
 typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
@@ -220,6 +230,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
-- 
2.25.1
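In the patch above, CopyMultiInsertBufferFlush() hands the whole buffer of a foreign partition to ExecForeignCopy() instead of issuing one remote INSERT per routed tuple. The following minimal sketch, in plain C with no PostgreSQL dependencies, illustrates that effect: rows are accumulated in a fixed-size buffer and each flush stands for one COPY ... FROM STDIN round trip. The names `batch_buffer`, `buffer_add`, `buffer_flush`, `flush_fn` and the capacity `BATCH_SIZE` are hypothetical simplifications, not the patch's actual structures.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 4			/* hypothetical buffer capacity */

/* Called once per flushed batch; stands in for ExecForeignCopy(). */
typedef void (*flush_fn) (const int *rows, int nrows, void *arg);

typedef struct batch_buffer
{
	int			rows[BATCH_SIZE];	/* buffered "tuples" */
	int			nused;			/* number of occupied slots */
	flush_fn	flush;			/* where full batches are sent */
	void	   *arg;			/* opaque state passed to flush */
} batch_buffer;

/* Send all buffered rows in one call and mark the slots as free. */
static void
buffer_flush(batch_buffer *buf)
{
	if (buf->nused > 0)
		buf->flush(buf->rows, buf->nused, buf->arg);
	buf->nused = 0;
}

/* Add one row, flushing first if the buffer is already full. */
static void
buffer_add(batch_buffer *buf, int row)
{
	if (buf->nused == BATCH_SIZE)
		buffer_flush(buf);
	buf->rows[buf->nused++] = row;
}

/* Example flush target: counts batches (remote round trips) and rows. */
typedef struct flush_stats
{
	int			nbatches;
	int			nrows;
} flush_stats;

static void
count_flush(const int *rows, int nrows, void *arg)
{
	flush_stats *stats = (flush_stats *) arg;

	(void) rows;
	stats->nbatches++;
	stats->nrows += nrows;
}

/* Route n rows through the buffer; returns the number of flushes. */
static int
copy_rows(int n, flush_stats *stats)
{
	batch_buffer buf = {.nused = 0, .flush = count_flush, .arg = stats};

	for (int i = 0; i < n; i++)
		buffer_add(&buf, i);
	buffer_flush(&buf);			/* final partial batch, as at end of COPY */
	return stats->nbatches;
}
```

With a capacity of 4, copying 10 rows costs 3 flushes (4 + 4 + 2 rows) instead of 10 per-row round trips; a larger buffer shrinks the count further, which is the effect the benchmark in this thread measures.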

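The new COPY_CALLBACK destination in copyto.c lets a caller drive COPY TO one row at a time (CopyToStart(), CopyOneRowTo(), CopyToFinish()) and receive each encoded row through a `copy_data_dest_cb`; in postgres_fdw that callback forwards the bytes with PQputCopyData(). The sketch below mimics only the per-row encode-and-hand-off step for integer rows in text format. `row_writer`, `writer_send_row`, `append_to_sink` and the fixed-size buffers are hypothetical, and the sketch ignores escaping, NULLs and the WIN32 `\r\n` line ending that the real code handles.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as copy_data_dest_cb in the patch: receives one encoded row. */
typedef void (*dest_cb) (void *buf, int len);

typedef struct row_writer
{
	char		line[256];		/* per-row buffer, akin to cstate->fe_msgbuf */
	int			len;
	dest_cb		dest;
} row_writer;

/* Encode one row of ints as tab-separated text plus '\n', then hand it off. */
static void
writer_send_row(row_writer *w, const int *values, int nvalues)
{
	w->len = 0;
	for (int i = 0; i < nvalues; i++)
	{
		assert(w->len < (int) sizeof(w->line));
		w->len += snprintf(w->line + w->len, sizeof(w->line) - w->len,
						   i > 0 ? "\t%d" : "%d", values[i]);
	}
	assert(w->len < (int) sizeof(w->line));	/* room for the newline */
	w->line[w->len++] = '\n';
	w->dest(w->line, w->len);	/* one callback per row, like COPY_CALLBACK */
}

/* Example destination: append to a static string (stands in for libpq). */
static char sink[1024];
static int	sink_len = 0;

static void
append_to_sink(void *buf, int len)
{
	assert(sink_len + len < (int) sizeof(sink));
	memcpy(sink + sink_len, buf, len);
	sink_len += len;
	sink[sink_len] = '\0';
}
```

Because the writer only knows about a callback, the same encoding path can serve a file, the frontend protocol, or a remote connection; that separation is what allows postgres_fdw to reuse the COPY TO formatting code without a Relation.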
#62Tang, Haiying
tanghy.fnst@cn.fujitsu.com
In reply to: Andrey V. Lepikhov (#61)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

Hi Andrey,

Some time ago I suggested an additional optimization [1] that can
speed up COPY by a further 2-4 times. Maybe you can run the
benchmark for this solution too?

Sorry for the late reply; I only now have time to run this test.
However, the patch no longer applies: I tried to apply it on e42b3c3bd6 (2021/1/26), but it failed.

Can you send a rebased version?

Regards,
Tang

#63tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tang, Haiying (#62)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

Hello, Andrey-san,

From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>

Some time ago I suggested an additional optimization [1] which can
speed up COPY by a further 2-4 times. Maybe you could benchmark
this solution too?

...

However, the patch no longer applies: I tried to apply it on e42b3c3bd6 (2021/1/26), but
it failed.

Can you send a rebased version?

I think the basic part of this patch set consists of the following two files. Unfortunately, the latter no longer applies to HEAD.

v13-0001-Move-multi-insert-decision-logic-into-executor.patch
v13_3-0002-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch

Also, as Tang-san said, I'm afraid the following files are older and don't apply.

v9-0003-Add-separated-connections-into-the-postgres_fdw.patch
v9-0004-Optimized-version-of-the-Fast-COPY-FROM-feature

When do you think you can submit the rebased versions of them? (IIUC, at the off-list HighGo meeting you said you were planning to post them late this week, after the global snapshot patch.) In case you can't work on them for the moment, may we rebase and/or further modify them so that they can be committed in PG 14?

Regards
Takayuki Tsunakawa

#64Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: tsunakawa.takay@fujitsu.com (#63)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 2/2/21 11:57, tsunakawa.takay@fujitsu.com wrote:

Hello, Andrey-san,

From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>

Some time ago I suggested an additional optimization [1] which can
speed up COPY by a further 2-4 times. Maybe you could benchmark
this solution too?

...

However, the patch no longer applies: I tried to apply it on e42b3c3bd6 (2021/1/26), but
it failed.

Can you send a rebased version?

When do you think you can submit the rebased versions of them? (IIUC, at the off-list HighGo meeting you said you were planning to post them late this week, after the global snapshot patch.) In case you can't work on them for the moment, may we rebase and/or further modify them so that they can be committed in PG 14?

Of course, you can rebase it.

--
regards,
Andrey Lepikhov
Postgres Professional

#65tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Andrey Lepikhov (#64)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>

Of course, you can rebase it.

Thank you. I might modify the basic part to incorporate my past proposal about improving the layering or modularity related to ri_usesMultiInsert. (But I may end up giving up due to lack of energy.)

Also, I might defer working on the extended part (v9 0003 and 0004) and split it off into a separate thread, if it seems to take longer.

Regards
Takayuki Tsunakawa

#66tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: tsunakawa.takay@fujitsu.com (#65)
1 attachment(s)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com>

From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>

Of course, you can rebase it.

Thank you. I might modify the basic part to incorporate my past proposal
about improving the layering or modularity related to ri_usesMultiInsert. (But I
may end up giving up due to lack of energy.)

Rebased to HEAD with the following modifications. It passes make check in the top directory and contrib/postgres_fdw.

(1)
Placed and ordered the three new FDW functions consistently across their documentation, declarations, and definitions.

(2)
Check that BeginForeignCopy is not NULL before calling it, because the documentation says it is optional.

(3)
Renamed ExecSetRelationUsesMultiInsert() to ExecMultiInsertAllowed(), because it does *not* set anything; it only returns a boolean indicating whether the relation allows multi-insert. I was bothered by this function's interface and by the use of ri_usesMultiInsert in ResultRelInfo. I still feel a bit uneasy about whether the function should really take the partition root (parent) as an argument, and whether it is good design for the executor functions to use ri_usesMultiInsert to decide whether Begin/EndForeignCopy() or Begin/EndForeignInsert() should be called. I'm fine with COPY using the executor, but it feels a bit uncomfortable for the executor functions to be aware of COPY.
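A minimal sketch of the callback selection described in (2) and (3), using hypothetical, heavily simplified types (ToyFdwRoutine, ToyResultRelInfo, begin_foreign_target) instead of the real executor structs. It only mirrors the decision shape of the CopyFrom() hunk: COPY callback if multi-insert is allowed and provided, otherwise the INSERT callback, with every optional pointer NULL-checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for FdwRoutine and ResultRelInfo. */
typedef void (*ToyBeginFn) (const char *relname);

typedef struct ToyFdwRoutine
{
	ToyBeginFn	BeginForeignInsert;	/* optional: may be NULL */
	ToyBeginFn	BeginForeignCopy;	/* optional: may be NULL */
} ToyFdwRoutine;

typedef struct ToyResultRelInfo
{
	const char *relname;
	bool		usesMultiInsert;	/* plays the role of ri_usesMultiInsert */
	const ToyFdwRoutine *fdw;	/* NULL for a local (non-foreign) table */
} ToyResultRelInfo;

typedef enum ToyBeginKind
{
	TOY_BEGIN_NONE,
	TOY_BEGIN_INSERT,
	TOY_BEGIN_COPY
} ToyBeginKind;

static void
toy_begin_insert(const char *relname)
{
	printf("BeginForeignInsert(%s)\n", relname);
}

static void
toy_begin_copy(const char *relname)
{
	printf("BeginForeignCopy(%s)\n", relname);
}

/*
 * Same decision shape as the CopyFrom() hunk in v14: prefer the COPY
 * callback when multi-insert is allowed and the FDW provides one, otherwise
 * fall back to the per-row INSERT callback.  Every optional pointer is
 * NULL-checked before it is called.  Returns which callback ran, if any.
 */
static ToyBeginKind
begin_foreign_target(const ToyResultRelInfo *rri)
{
	assert(rri != NULL);

	if (rri->fdw == NULL)
		return TOY_BEGIN_NONE;	/* local table: no FDW callbacks */

	if (rri->usesMultiInsert && rri->fdw->BeginForeignCopy != NULL)
	{
		rri->fdw->BeginForeignCopy(rri->relname);
		return TOY_BEGIN_COPY;
	}
	if (rri->fdw->BeginForeignInsert != NULL)
	{
		rri->fdw->BeginForeignInsert(rri->relname);
		return TOY_BEGIN_INSERT;
	}
	return TOY_BEGIN_NONE;
}
```

With an FDW that leaves BeginForeignCopy NULL, the sketch falls back to BeginForeignInsert even when multi-insert is allowed. It models only the Begin step; in the patch, CopyMultiInsertBufferFlush() calls ExecForeignCopy directly.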

That said, given the reviews from several people and the good performance results, I think this can be marked Ready for Committer.

Also, I might defer working on the extended part (v9 0003 and 0004) and split
it off into a separate thread, if it seems to take longer.

I reviewed them but haven't rebased them (that seems to take more work).
Andrey-san, could you tell us:

* Why is a separate FDW connection established for each COPY? To avoid using the same FDW connection for multiple foreign table partitions in a single COPY run?

* In what kind of test did you get a 2-4x performance gain? COPY into many foreign table partitions, where the input rows are ordered randomly enough that few rows accumulate in the COPY buffer?
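The second question can be made concrete with a rough model of how many remote COPY round trips a run needs. This is a sketch under simplifying assumptions, not a measurement: rows are routed round-robin across partitions, all partition buffers are flushed together once a fixed total row count is buffered (copyfrom.c limits buffering by both tuple count and bytes; only the count is modelled), and each flush of a non-empty foreign partition buffer costs one "COPY ... FROM STDIN", as in postgresExecForeignCopy(). count_remote_copies() is a hypothetical helper.

```c
#include <assert.h>

#define TOY_MAX_PARTS 64

/*
 * Simplified model (assumptions, not taken from the patch): rows arrive
 * round-robin across 'nparts' partitions; when the total number of
 * buffered rows reaches 'limit', every non-empty partition buffer is
 * flushed, and each such flush issues one remote COPY command.
 * Returns the number of remote COPY commands for 'nrows' input rows.
 */
static long
count_remote_copies(long nrows, int nparts, int limit)
{
	long		buffered[TOY_MAX_PARTS] = {0};
	long		total = 0;
	long		copies = 0;

	assert(nparts > 0 && nparts <= TOY_MAX_PARTS && limit > 0);

	for (long row = 0; row < nrows; row++)
	{
		buffered[row % nparts]++;
		if (++total >= limit)
		{
			/* buffers are full: flush all of them */
			for (int p = 0; p < nparts; p++)
			{
				if (buffered[p] > 0)
				{
					copies++;
					buffered[p] = 0;
				}
			}
			total = 0;
		}
	}

	/* final flush at end of input */
	for (int p = 0; p < nparts; p++)
	{
		if (buffered[p] > 0)
			copies++;
	}
	return copies;
}
```

With a 1000-row limit, 3000 rows spread over 30 partitions need 90 remote COPY commands of about 33 rows each, against 3 commands for a single partition, so per-command overhead grows with the number of partitions the input is spread across.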

Regards
Takayuki Tsunakawa

Attachments:

v14-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchapplication/octet-stream; name=v14-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchDownload
From 2e6bfc20e01e5c4afddb895a4a7f22b0b7f51f7c Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 9 Feb 2021 12:50:00 +0900
Subject: [PATCH v14] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of a bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, sends the tuples
one by one using the CopyTo() machinery, and then sends EOF on this
connection.

The code that constructed the list of columns for a given foreign
relation in deparseAnalyzeSql() is moved into deparseRelColumnList(),
which is also used by deparseCopyFromSql().

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure is extended
with a data_dest_cb callback, which is used to send the text
representation of a tuple to a custom destination.
The PgFdwModifyState structure is extended with a cstate field
to avoid repeated initialization of the CopyState. For the same
reason, the CopyTo() routine is split into the set of routines
CopyToStart()/CopyTo()/CopyToFinish().

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod, shared across all potential target
relations (partitions), which, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Move that decision logic into InitResultRelInfo, which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized.  That prevents repeated computation
of the same information in some cases, especially for partitions,
and the new arrangement is slightly more readable.
The enum CopyInsertMethod is removed; this logic is now implemented by
the ri_usesMultiInsert field of the ResultRelInfo structure.

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
---
 contrib/postgres_fdw/deparse.c                 |  60 ++++--
 contrib/postgres_fdw/expected/postgres_fdw.out |  46 ++++-
 contrib/postgres_fdw/postgres_fdw.c            | 131 ++++++++++++
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  45 +++++
 doc/src/sgml/fdwhandler.sgml                   |  73 +++++++
 src/backend/commands/copy.c                    |   4 +-
 src/backend/commands/copyfrom.c                | 269 +++++++++++--------------
 src/backend/commands/copyto.c                  |  84 ++++++--
 src/backend/executor/execMain.c                |  52 +++++
 src/backend/executor/execPartition.c           |  34 +++-
 src/include/commands/copy.h                    |   8 +-
 src/include/commands/copyfrom_internal.h       |  10 -
 src/include/executor/execPartition.h           |   2 +
 src/include/executor/executor.h                |   2 +
 src/include/foreign/fdwapi.h                   |  15 ++
 src/include/nodes/execnodes.h                  |   8 +-
 17 files changed, 637 insertions(+), 207 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499..71e0538c 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1859,6 +1861,20 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * Deparse a remote COPY ... FROM STDIN statement for the given relation
+ * into buf, including its column list (if it has any columns).
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * 'buf' is the output buffer to append the statement to
@@ -2119,6 +2135,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 {
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
+{
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			i;
@@ -2126,10 +2166,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2138,6 +2176,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2157,18 +2198,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 60c7e11..dd310bb 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8111,8 +8111,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8123,6 +8124,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8231,6 +8245,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 0e97706..3e786ee 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -201,6 +202,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -373,6 +375,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -558,6 +567,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2159,6 +2171,125 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, mtstate->ps.state);
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									-1,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol. This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending EOF, check the command result. */
+		res = PQgetResult(conn);
+		if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(OK && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d..cb801c9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -165,6 +165,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 extern void rebuildInsertSql(StringInfo buf, char *orig_query,
 							 int values_end_len, int num_cols,
 							 int num_rows);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 151f4f1..e0d54ef 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2235,6 +2235,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2335,6 +2352,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 854913a..ecc6273 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1068,6 +1068,79 @@ EndDirectModify(ForeignScanState *node);
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; it will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in <literal>slots</literal>.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
 RowMarkType
 GetForeignRowMarkType(RangeTblEntry *rte,
                       LockClauseStrength strength);
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c712c8..cd8aa57 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -303,8 +303,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	{
 		CopyToState cstate;
 
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c39cc73..931e0b6 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -316,54 +316,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There are no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -537,12 +547,10 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -652,6 +660,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(target_resultRelInfo->ri_usesMultiInsert == false);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo, NULL);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -667,10 +702,22 @@ CopyFrom(CopyFromState cstate)
 	mtstate->operation = CMD_INSERT;
 	mtstate->resultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Init copying process into foreign table. Initialization of copying into
+	 * foreign partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+		{
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -699,83 +746,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -783,7 +756,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -826,7 +799,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -834,7 +807,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -893,24 +865,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -940,7 +902,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -959,9 +921,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1033,7 +992,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1111,11 +1070,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1141,14 +1097,19 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e04ec1e..7a10c9d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -52,6 +52,7 @@ typedef enum CopyDest
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_OLD_FE,				/* to frontend (2.0 protocol) */
 	COPY_NEW_FE,				/* to frontend (3.0 protocol) */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -82,11 +83,14 @@ typedef struct CopyToStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to */
+	TupleDesc	tupDesc;		/* COPY TO will be used for manual tuple copying
+								  * into the destination */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -117,7 +121,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static uint64 CopyTo(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -289,6 +292,14 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 	}
 
 	/* Update the progress */
@@ -382,19 +393,24 @@ EndCopy(CopyToState cstate)
 CopyToState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc srcTupDesc,
 			RawStmt *raw_query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || srcTupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -459,6 +475,11 @@ BeginCopyTo(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -704,6 +725,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -786,7 +812,7 @@ BeginCopyTo(ParseState *pstate,
 uint64
 DoCopyTo(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -795,7 +821,9 @@ DoCopyTo(CopyToState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -835,18 +863,22 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Split out into its own routine so that manual mode, where tuples are sent
+ * to the destination one at a time by calling CopyOneRowTo(), does not
+ * repeat this setup work for every row.
  */
-static uint64
-CopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -933,6 +965,32 @@ CopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyToState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -967,23 +1025,13 @@ CopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f4dd47a..95c05d3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1247,10 +1247,62 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
 /*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri,
+							  const ResultRelInfo *partition_root)
+{
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (partition_root && !partition_root->ri_usesMultiInsert)
+		return false;
+
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
+/*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
  *
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 746cd1e..3ef3712 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -587,6 +587,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  estate->es_instrument);
 
 	/*
+	 * Use multi-insert mode if the condition checking passes for the
+	 * parent and its child.
+	 */
+	leaf_part_rri->ri_usesMultiInsert =
+		ExecMultiInsertAllowed(leaf_part_rri, rootResultRelInfo);
+
+	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
 	 * required when the operation is CMD_UPDATE.
@@ -989,9 +996,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
@@ -1209,10 +1223,16 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert &&
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e..a7e7224 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -78,12 +79,17 @@ extern DestReceiver *CreateCopyDestReceiver(void);
 /*
  * internal prototypes
  */
-extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
+extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel,
+							   TupleDesc tupDesc, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37942d..c527610 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -41,16 +41,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
  *
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde..b5f73d8 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -118,6 +118,8 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
 										PartitionTupleRouting *proute,
 										TupleTableSlot *slot,
 										EState *estate);
+extern bool checkMultiInsertMode(const ResultRelInfo *rri,
+								 const ResultRelInfo *parent);
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 									PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 758c3ca..c526cae 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,8 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  Relation partition_root,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri,
+										  const ResultRelInfo *partition_root);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78d..676a1b7 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -126,6 +126,16 @@ typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node)
 
 typedef void (*EndDirectModify_function) (ForeignScanState *node);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
@@ -230,6 +240,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d65099c..5b11d62 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -504,7 +504,13 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.10.1
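
The COPY_CALLBACK destination added to copyto.c hands each finished text row to a copy_data_dest_cb function instead of writing it to a file or to the frontend. The following is a minimal, standalone sketch of that contract, for illustration only. Only the typedef mirrors the patch. collect_copy_row(), emit_row(), and the fixed-size buffer are hypothetical names. emit_row() merely imitates what CopySendEndOfRow() does for COPY_CALLBACK: it appends a newline, then calls the callback once per row. In postgres_fdw, a real callback would presumably forward the bytes to the remote server, for example with libpq's PQputCopyData().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same signature as the typedef the patch adds to commands/copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical sink that accumulates rows until they are shipped. */
static char pending[8192];
static int	pending_len = 0;
static int	rows_seen = 0;

/* Hypothetical callback: append one COPY text row to the pending buffer. */
static void
collect_copy_row(void *outbuf, int len)
{
	assert(len >= 0 && (size_t) pending_len + (size_t) len <= sizeof(pending));
	memcpy(pending + pending_len, outbuf, (size_t) len);
	pending_len += len;
	rows_seen++;
}

/*
 * Hypothetical stand-in for the COPY_CALLBACK branch of CopySendEndOfRow():
 * terminate the row with '\n' (the patch uses "\r\n" on WIN32) and pass
 * the whole row to the callback in a single call.
 */
static void
emit_row(copy_data_dest_cb cb, const char *row_text)
{
	char		line[256];
	int			n = snprintf(line, sizeof(line), "%s\n", row_text);

	assert(n > 0 && (size_t) n < sizeof(line));
	cb(line, n);
}
```

For example, emit_row(collect_copy_row, "1\tfoo") appends the six bytes "1\tfoo\n" to pending in one callback invocation, which is the per-row granularity the COPY_CALLBACK branch provides.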

#67Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: tsunakawa.takay@fujitsu.com (#66)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 2/9/21 9:35 AM, tsunakawa.takay@fujitsu.com wrote:

From: tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com>

From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Also, I might defer working on the extended part (v9 0003 and 0004) and further
separate them in a different thread, if it seems to take longer.

I reviewed them but haven't rebased them (it seems to take more labor.)
Andrey-san, could you tell us:

* Why is a separate FDW connection established for each COPY? To avoid using the same FDW connection for multiple foreign table partitions in a single COPY run?

With a separate connection, you can start a 'COPY FROM' session for each
foreign partition just once, when the partition is initialized.

* In what kind of test did you get 2-4x performance gain? COPY into many foreign table partitions where the input rows are ordered randomly enough that many rows don't accumulate in the COPY buffer?

I used 'INSERT INTO .. SELECT * FROM generate_series(1, N)' to generate
test data, and HASH partitioning to avoid skew.

--
regards,
Andrey Lepikhov
Postgres Professional

#68tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Andrey V. Lepikhov (#67)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Andrey V. Lepikhov <a.lepikhov@postgrespro.ru>

On 2/9/21 9:35 AM, tsunakawa.takay@fujitsu.com wrote:

* Why is a separate FDW connection established for each COPY? To avoid using the same FDW connection for multiple foreign table partitions in a single COPY run?

With a separate connection, you can start a 'COPY FROM' session for each
foreign partition just once, when the partition is initialized.

* In what kind of test did you get 2-4x performance gain? COPY into many foreign table partitions where the input rows are ordered randomly enough that many rows don't accumulate in the COPY buffer?

I used 'INSERT INTO .. SELECT * FROM generate_series(1, N)' to generate
test data, and HASH partitioning to avoid skew.

I guess you used many hash partitions. Sadly, the current COPY implementation only accumulates either 1,000 rows or 64 KB of input data (very small!) before flushing all CopyMultiInsertBuffers. One CopyMultiInsertBuffer corresponds to one partition. Flushing a CopyMultiInsertBuffer calls ExecForeignCopy() once, which connects to a remote database, runs COPY FROM STDIN, and disconnects. Here, the flushing trigger (1,000 rows or 64 KB of input data, whichever comes first) is so small that if there are many target partitions, the amount of data for each partition is small.
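
The arithmetic behind this can be made concrete with a small model. The sketch below is a simplified simulation, not PostgreSQL code. It rests on these assumptions:

- Rows are routed round-robin (row % npartitions), standing in for evenly spread hash partitioning.
- Only the tuple-count limit (MAX_BUFFERED_TUPLES, 1,000) is modelled; the 64 KB byte limit is ignored.
- Each non-empty per-partition buffer at flush time counts as one ExecForeignCopy() call.

FlushStats, flush_all(), and simulate_global_flush() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counters for the simplified model. */
typedef struct FlushStats
{
	uint64_t	flushes;		/* flushes of all buffers together */
	uint64_t	remote_copies;	/* non-empty buffers flushed (ExecForeignCopy calls) */
} FlushStats;

/* Flush every non-empty per-partition buffer at once. */
static void
flush_all(uint64_t *buffered, int npartitions, FlushStats *stats)
{
	for (int p = 0; p < npartitions; p++)
	{
		if (buffered[p] > 0)
		{
			stats->remote_copies++;
			buffered[p] = 0;
		}
	}
	stats->flushes++;
}

/* Model a limit that is checked against the sum over all buffers. */
static FlushStats
simulate_global_flush(uint64_t nrows, int npartitions, uint64_t threshold)
{
	FlushStats	stats = {0, 0};
	uint64_t   *buffered = calloc((size_t) npartitions, sizeof(uint64_t));
	uint64_t	total = 0;

	assert(npartitions > 0 && threshold > 0 && buffered != NULL);

	for (uint64_t row = 0; row < nrows; row++)
	{
		buffered[row % (uint64_t) npartitions]++;
		if (++total >= threshold)
		{
			flush_all(buffered, npartitions, &stats);
			total = 0;
		}
	}
	if (total > 0)				/* final flush at the end of COPY */
		flush_all(buffered, npartitions, &stats);

	free(buffered);
	return stats;
}
```

For example, simulate_global_flush(10000, 4, 1000) gives 10 flushes and 40 remote COPY calls of 250 rows each. simulate_global_flush(10000, 1000, 1000) still gives 10 flushes, but 10,000 remote COPY calls, i.e. one row per call.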

Looking at the triggering threshold values, the description (of MAX_BUFFERED_TUPLES at least) seems to indicate that they take effect per CopyMultiInsertBuffer:

/*
 * No more than this many tuples per CopyMultiInsertBuffer
 *
 * Caution: Don't make this too big, as we could end up with this many
 * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
 * multiInsertBuffers list. Increasing this can cause quadratic growth in
 * memory requirements during copies into partitioned tables with a large
 * number of partitions.
 */
#define MAX_BUFFERED_TUPLES 1000

/*
 * Flush buffers if there are >= this many bytes, as counted by the input
 * size, of tuples stored.
 */
#define MAX_BUFFERED_BYTES 65535

But these thresholds take effect across all CopyMultiInsertBuffers:

/*
 * Returns true if the buffers are full
 */
static inline bool
CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
{
	if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
		miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
		return true;
	return false;
}

So, I think the direction to take is to allow more data to accumulate before flushing. I'm not very excited about the way 0003 and 0004 establish a new connection for each partition; it adds flags in many places, and postgresfdw_xact_callback() has to be aware of COPY-specific processing. Plus, we have to take care of the message difference you found in the regression test.
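
One way to let more data accumulate per partition is to check the limit per buffer instead of across all buffers, with the limit passed in as a parameter the way a GUC might supply it. This is sketched here purely as a hypothetical alternative; it is not what the patch or the current COPY code does. The simplifying assumptions are round-robin routing and a tuple-count limit only. simulate_per_buffer_flush() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical policy: a partition's buffer is flushed only when that
 * buffer alone reaches per_buffer_limit; leftovers are flushed when COPY
 * ends.  Returns the number of remote COPY calls, one per non-empty
 * buffer flush.
 */
static uint64_t
simulate_per_buffer_flush(uint64_t nrows, int npartitions,
						  uint64_t per_buffer_limit)
{
	uint64_t   *buffered = calloc((size_t) npartitions, sizeof(uint64_t));
	uint64_t	remote_copies = 0;

	assert(npartitions > 0 && per_buffer_limit > 0 && buffered != NULL);

	for (uint64_t row = 0; row < nrows; row++)
	{
		uint64_t   *count = &buffered[row % (uint64_t) npartitions];

		if (++(*count) >= per_buffer_limit)
		{
			remote_copies++;	/* ship only this partition's rows */
			*count = 0;
		}
	}
	for (int p = 0; p < npartitions; p++)	/* end-of-COPY flush */
		if (buffered[p] > 0)
			remote_copies++;

	free(buffered);
	return remote_copies;
}
```

With 1,000 partitions and 10,000 rows, simulate_per_buffer_flush(10000, 1000, 1000) returns 1,000, i.e. one end-of-COPY flush of 10 rows per partition. By contrast, a limit counted across all buffers would ship those rows one per call. With 4 partitions, the same call returns 12. The price is memory: close to npartitions × per_buffer_limit rows can be buffered at once, which is the growth the MAX_BUFFERED_TUPLES comment warns about.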

Why don't we focus on committing the basic part and addressing the extended part (0003 and 0004) separately later? As Tang-san and you showed, the basic part already demonstrated impressive improvement. If there's no objection, I'd like to make this ready for committer in a few days.

Regards
Takayuki Tsunakawa

#69Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: tsunakawa.takay@fujitsu.com (#68)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 2/9/21 12:47 PM, tsunakawa.takay@fujitsu.com wrote:

From: Andrey V. Lepikhov <a.lepikhov@postgrespro.ru>
I guess you used many hash partitions. Sadly, the current COPY implementation only accumulates either 1,000 rows or 64 KB of input data (very small!) before flushing all CopyMultiInsertBuffers. One CopyMultiInsertBuffer corresponds to one partition. Flushing a CopyMultiInsertBuffer calls ExecForeignCopy() once, which connects to a remote database, runs COPY FROM STDIN, and disconnects. Here, the flushing trigger (1,000 rows or 64 KB of input data, whichever comes first) is so small that if there are many target partitions, the amount of data for each partition is small.

I tried to use 1E4 - 1E8 rows in a tuple buffer. But the results weren't
impressive.
We can use one more GUC instead of a precompiled constant.

Why don't we focus on committing the basic part and addressing the extended part (0003 and 0004) separately later?

I focused only on the 0001 and 0002 patches.

As Tang-san and you showed, the basic part already demonstrated impressive improvement. If there's no objection, I'd like to make this ready for committer in a few days.

Good.

--
regards,
Andrey Lepikhov
Postgres Professional

#70tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Andrey V. Lepikhov (#69)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Andrey V. Lepikhov <a.lepikhov@postgrespro.ru>

I tried to use 1E4 - 1E8 rows in a tuple buffer. But the results weren't
impressive.

I guess that's because the 64 KB threshold came first.

We can use one more GUC instead of a precompiled constant.

Yes, agreed.

Why don't we focus on committing the basic part and addressing the extended part (0003 and 0004) separately later?

I focused only on the 0001 and 0002 patches.

As Tang-san and you showed, the basic part already demonstrated impressive improvement. If there's no objection, I'd like to make this ready for committer in a few days.

Good.

Glad to hear that.

Regards
Takayuki Tsunakawa

#71tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Andrey V. Lepikhov (#69)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Andrey V. Lepikhov <a.lepikhov@postgrespro.ru>

On 2/9/21 12:47 PM, tsunakawa.takay@fujitsu.com wrote:

As Tang-san and you showed, the basic part already demonstrated impressive improvement. If there's no objection, I'd like to make this ready for committer in a few days.

Good.

I've marked this as ready for committer. Good luck.

Regards
Takayuki Tsunakawa

#72Justin Pryzby
pryzby@telsasoft.com
In reply to: tsunakawa.takay@fujitsu.com (#66)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Tue, Feb 09, 2021 at 04:35:03AM +0000, tsunakawa.takay@fujitsu.com wrote:

Rebased to HEAD with the following modifications. It passes make check in the top directory and contrib/postgres_fdw.
That said, with the reviews from some people and good performance results, I think this can be ready for committer.

This is crashing during fdw check.
http://cfbot.cputube.org/andrey-lepikhov.html

Maybe it's related to this patch:
|commit 6214e2b2280462cbc3aa1986e350e167651b3905
| Fix permission checks on constraint violation errors on partitions.
| Security: CVE-2021-3393

TRAP: FailedAssertion("n >= 0 && n < list->length", File: "../../src/include/nodes/pg_list.h", Line: 259, PID: 19780)

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fd33a557801 in __GI_abort () at abort.c:79
#2 0x000055f7f53bbc88 in ExceptionalCondition (conditionName=conditionName@entry=0x7fd33b81bc40 "n >= 0 && n < list->length", errorType=errorType@entry=0x7fd33b81b698 "FailedAssertion",
fileName=fileName@entry=0x7fd33b81be70 "../../src/include/nodes/pg_list.h", lineNumber=lineNumber@entry=259) at assert.c:69
#3 0x00007fd33b816b54 in list_nth_cell (n=<optimized out>, list=<optimized out>) at ../../src/include/nodes/pg_list.h:259
#4 list_nth (n=<optimized out>, list=<optimized out>) at ../../src/include/nodes/pg_list.h:281
#5 exec_rt_fetch (estate=<optimized out>, rti=<optimized out>) at ../../src/include/executor/executor.h:558
#6 postgresBeginForeignCopy (mtstate=<optimized out>, resultRelInfo=<optimized out>) at postgres_fdw.c:2208
#7 0x000055f7f5114bb4 in ExecInitRoutingInfo (mtstate=mtstate@entry=0x55f7f710a508, estate=estate@entry=0x55f7f71a7d50, proute=proute@entry=0x55f7f710a720, dispatch=dispatch@entry=0x55f7f710a778,
partRelInfo=partRelInfo@entry=0x55f7f710eb20, partidx=partidx@entry=0) at execPartition.c:1004
#8 0x000055f7f511618d in ExecInitPartitionInfo (partidx=0, rootResultRelInfo=0x55f7f710a278, dispatch=0x55f7f710a778, proute=0x55f7f710a720, estate=0x55f7f71a7d50, mtstate=0x55f7f710a508) at execPartition.c:742
#9 ExecFindPartition () at execPartition.c:400
#10 0x000055f7f50a2718 in CopyFrom () at copyfrom.c:857
#11 0x000055f7f50a1b06 in DoCopy () at copy.c:299

(gdb) up
#7 0x000055f7f5114bb4 in ExecInitRoutingInfo (mtstate=mtstate@entry=0x55f7f710a508, estate=estate@entry=0x55f7f71a7d50, proute=proute@entry=0x55f7f710a720, dispatch=dispatch@entry=0x55f7f710a778,
partRelInfo=partRelInfo@entry=0x55f7f710eb20, partidx=partidx@entry=0) at execPartition.c:1004
1004 partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
(gdb) p partRelInfo->ri_RangeTableIndex
$7 = 0
(gdb) p *estate->es_range_table
$9 = {type = T_List, length = 1, max_length = 5, elements = 0x55f7f717a2c0, initial_elements = 0x55f7f717a2c0}
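The trace is consistent with a 1-based range-table lookup being asked for index 0: partRelInfo->ri_RangeTableIndex is 0, and if exec_rt_fetch() resolves index rti as list_nth(es_range_table, rti - 1), as frames #3 to #5 suggest, list_nth_cell() receives n = -1 and the assertion fails. The stand-alone C sketch below models that lookup with a hypothetical RangeTable array type (not the real List API), plus the fallback used by postgresBeginForeignCopy() in the rebased v15 patch below: when a routed partition has no range-table index of its own, borrow the root table's entry. The real patch additionally copies that entry and overrides its relid and relkind, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for estate->es_range_table: an array plus a length. */
typedef struct RangeTable
{
    const char **entries;
    int          length;
} RangeTable;

/*
 * 1-based lookup modelled on exec_rt_fetch(): index rti maps to element
 * rti - 1.  Where list_nth_cell() asserts "n >= 0 && n < list->length",
 * this sketch returns NULL so the failing case can be observed.
 */
static const char *
rt_fetch(const RangeTable *rt, int rti)
{
    int n = rti - 1;

    if (n < 0 || n >= rt->length)
        return NULL;            /* rti == 0 gives n == -1: the crash case */
    return rt->entries[n];
}

/*
 * Fallback modelled on the v15 patch: a partition initialized during tuple
 * routing may have ri_RangeTableIndex == 0, so use the root's entry instead.
 */
static const char *
fetch_for_partition(const RangeTable *rt, int part_rti, int root_rti)
{
    assert(root_rti > 0);
    return rt_fetch(rt, part_rti != 0 ? part_rti : root_rti);
}
```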

--
Justin

#73tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Justin Pryzby (#72)
1 attachment(s)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Justin Pryzby <pryzby@telsasoft.com>

This is crashing during fdw check.
http://cfbot.cputube.org/andrey-lepikhov.html

Maybe it's related to this patch:
|commit 6214e2b2280462cbc3aa1986e350e167651b3905
| Fix permission checks on constraint violation errors on partitions.
| Security: CVE-2021-3393

Thank you for your kind detailed investigation. The rebased version is attached.

Regards
Takayuki Tsunakawa

Attachments:

v15-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (application/octet-stream)
From 889235087215aeaac27a7d1dde13f7b299f55cd7 Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 9 Feb 2021 12:50:00 +0900
Subject: [PATCH v15] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of a bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, sends the tuples
iteratively using the CopyTo() machinery, and then sends EOF on this
connection.

The code that constructed the list of columns for a given foreign
relation in deparseAnalyzeSql() is moved into a separate routine,
deparseRelColumnList(), which is reused by deparseCopyFromSql().

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure was extended
with a data_dest_cb callback. It is used to send the text
representation of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field.
It is needed to avoid repeated initialization of the CopyState. Also
for this reason, the CopyTo() routine was split into the set of
routines CopyToStart()/CopyTo()/CopyToFinish().

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod shared across all potential target
relations (partitions) that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Move that decision logic into InitResultRelInfo which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized.  That prevents repeated computation
of the same information in some cases, especially for partitions,
and the new arrangement is slightly more readable.
Enum CopyInsertMethod is removed; this logic is now implemented by the
ri_usesMultiInsert field of the ResultRelInfo structure.

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
---
 contrib/postgres_fdw/deparse.c                 |  60 ++++--
 contrib/postgres_fdw/expected/postgres_fdw.out |  46 ++++-
 contrib/postgres_fdw/postgres_fdw.c            | 143 +++++++++++++
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  45 +++++
 doc/src/sgml/fdwhandler.sgml                   |  73 +++++++
 src/backend/commands/copy.c                    |   4 +-
 src/backend/commands/copyfrom.c                | 269 +++++++++++--------------
 src/backend/commands/copyto.c                  |  84 ++++++--
 src/backend/executor/execMain.c                |  52 +++++
 src/backend/executor/execPartition.c           |  34 +++-
 src/include/commands/copy.h                    |   8 +-
 src/include/commands/copyfrom_internal.h       |  10 -
 src/include/executor/execPartition.h           |   2 +
 src/include/executor/executor.h                |   2 +
 src/include/foreign/fdwapi.h                   |  15 ++
 src/include/nodes/execnodes.h                  |   8 +-
 17 files changed, 649 insertions(+), 207 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499..71e0538c 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1859,6 +1861,20 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * 'buf' is the output buffer to append the statement to
@@ -2119,6 +2135,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 {
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
+{
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			i;
@@ -2126,10 +2166,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2138,6 +2176,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2157,18 +2198,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 60c7e11..dd310bb 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8111,8 +8111,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8123,6 +8124,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8231,6 +8245,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 368997d..ed872e9 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -201,6 +202,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -373,6 +375,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ModifyTableState *mtstate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -558,6 +567,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2169,6 +2181,137 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	EState	   *estate = mtstate->ps.state;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	if (resultRelInfo->ri_RangeTableIndex == 0)
+	{
+		ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+		rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);
+		rte = copyObject(rte);
+		rte->relid = RelationGetRelid(rel);
+		rte->relkind = RELKIND_FOREIGN_TABLE;
+	}
+	else
+		rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, estate);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(mtstate->ps.state,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									-1,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, NULL, RelationGetDescr(rel), NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		 * Finish the COPY IN protocol. This must be done both after a
+		 * successful copy and after an error.
+		 * after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending an EOF signal, check command OK. */
+		res = PQgetResult(conn);
+		if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(OK && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d..cb801c9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -165,6 +165,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 extern void rebuildInsertSql(StringInfo buf, char *orig_query,
 							 int values_end_len, int num_cols,
 							 int num_rows);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 151f4f1..e0d54ef 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2235,6 +2235,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2335,6 +2352,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 854913a..ecc6273 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1068,6 +1068,79 @@ EndDirectModify(ForeignScanState *node);
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(ModifyTableState *mtstate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     When this is called by a <command>COPY FROM</command> command, the
+     plan-related global data in <literal>mtstate</literal> is not provided.
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; it will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
 RowMarkType
 GetForeignRowMarkType(RangeTblEntry *rte,
                       LockClauseStrength strength);
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c712c8..cd8aa57 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -303,8 +303,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	{
 		CopyToState cstate;
 
-		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+		cstate = BeginCopyTo(pstate, rel, NULL, query, relid,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 796ca7b..4ee5c70 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -316,54 +316,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
 
-		ExecClearTuple(slots[i]);
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -537,12 +547,10 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -652,6 +660,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(target_resultRelInfo->ri_usesMultiInsert == false);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo, NULL);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -668,10 +703,22 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Init copying process into foreign table. Initialization of copying into
+	 * foreign partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+		{
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+																  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -700,83 +747,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -784,7 +757,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -827,7 +800,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -835,7 +808,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -894,24 +866,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -941,7 +903,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -960,9 +922,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1034,7 +993,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1112,11 +1071,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1142,14 +1098,19 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert &&
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+														target_resultRelInfo);
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e04ec1e..7a10c9d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -52,6 +52,7 @@ typedef enum CopyDest
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_OLD_FE,				/* to frontend (2.0 protocol) */
 	COPY_NEW_FE,				/* to frontend (3.0 protocol) */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -82,11 +83,14 @@ typedef struct CopyToStateData
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to */
+	TupleDesc	tupDesc;		/* COPY TO will be used for manual tuple copying
+								  * into the destination */
 	QueryDesc  *queryDesc;		/* executable query to copy from */
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -117,7 +121,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static uint64 CopyTo(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -289,6 +292,14 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 	}
 
 	/* Update the progress */
@@ -382,19 +393,24 @@ EndCopy(CopyToState cstate)
 CopyToState
 BeginCopyTo(ParseState *pstate,
 			Relation rel,
+			TupleDesc srcTupDesc,
 			RawStmt *raw_query,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
 
+	/* Impossible to mix CopyTo modes */
+	Assert(rel == NULL || srcTupDesc == NULL);
+
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
@@ -459,6 +475,11 @@ BeginCopyTo(ParseState *pstate,
 
 		tupDesc = RelationGetDescr(cstate->rel);
 	}
+	else if (srcTupDesc)
+	{
+		Assert(!raw_query);
+		tupDesc = cstate->tupDesc = srcTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -704,6 +725,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -786,7 +812,7 @@ BeginCopyTo(ParseState *pstate,
 uint64
 DoCopyTo(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -795,7 +821,9 @@ DoCopyTo(CopyToState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -835,18 +863,22 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Separated to the routine to prevent duplicate operations in the case of
+ * manual mode, where tuples are copied to the destination one by one, by call of
+ * the CopyOneRowTo() routine.
  */
-static uint64
-CopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
+	else if (cstate->tupDesc)
+		tupDesc = cstate->tupDesc;
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
@@ -933,6 +965,32 @@ CopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyToState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -967,23 +1025,13 @@ CopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c74ce36..744ffe7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1233,10 +1233,62 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
 /*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri,
+							  const ResultRelInfo *partition_root)
+{
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (partition_root && !partition_root->ri_usesMultiInsert)
+		return false;
+
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
+/*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
  *
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b9e4f2d..76dc31f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -589,6 +589,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  estate->es_instrument);
 
 	/*
+	 * Use multi-insert mode if the condition checking passes for the
+	 * parent and its child.
+	 */
+	leaf_part_rri->ri_usesMultiInsert =
+		ExecMultiInsertAllowed(leaf_part_rri, rootResultRelInfo);
+
+	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
 	 * required when the operation is CMD_UPDATE.
@@ -989,9 +996,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			Assert(partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL);
+			partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
@@ -1210,10 +1224,16 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert &&
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+															   resultRelInfo);
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e..a7e7224 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -78,12 +79,17 @@ extern DestReceiver *CreateCopyDestReceiver(void);
 /*
  * internal prototypes
  */
-extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
+extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel,
+							   TupleDesc tupDesc, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37942d..c527610 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -41,16 +41,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
  *
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde..b5f73d8 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -118,6 +118,8 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
 										PartitionTupleRouting *proute,
 										TupleTableSlot *slot,
 										EState *estate);
+extern bool checkMultiInsertMode(const ResultRelInfo *rri,
+								 const ResultRelInfo *parent);
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 									PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363..fe3aab5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,8 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri,
+										  const ResultRelInfo *partition_root);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78d..676a1b7 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -126,6 +126,16 @@ typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node)
 
 typedef void (*EndDirectModify_function) (ForeignScanState *node);
 
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
@@ -230,6 +240,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b6a88ff..4ec5b34 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -508,7 +508,13 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.10.1

#74 Amit Langote
amitlangote09@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#73)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Tsunakawa-san, Andrey,

On Mon, Feb 15, 2021 at 1:54 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Justin Pryzby <pryzby@telsasoft.com>

This is crashing during fdw check.
http://cfbot.cputube.org/andrey-lepikhov.html

Maybe it's related to this patch:
|commit 6214e2b2280462cbc3aa1986e350e167651b3905
| Fix permission checks on constraint violation errors on partitions.
| Security: CVE-2021-3393

Thank you for your kind detailed investigation. The rebased version is attached.

Thanks for updating the patch.

The commit message says this:

Move that decision logic into InitResultRelInfo which now sets a new
boolean field ri_usesMultiInsert of ResultRelInfo when a target
relation is first initialized. That prevents repeated computation
of the same information in some cases, especially for partitions,
and the new arrangement results in slightly more readability.
Enum CopyInsertMethod is removed. This logic is now implemented by the
ri_usesMultiInsert field of the ResultRelInfo structure.

However, it is no longer InitResultRelInfo() that sets
ri_usesMultiInsert. That is now left to the concerned functions, which
set it once they have enough information to do so correctly.
Maybe update the message to make that clear to interested readers.
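As an illustration of a caller setting the flag itself, here is a minimal sketch of the CopyFrom() side in the patch: the COPY-specific criteria (no volatile default expressions, a non-empty column list, no volatile functions in WHERE) are checked first, and only then is the relation-level check consulted. Otherwise the flag keeps the false value that InitResultRelInfo() gives it. All names here (SketchCopyFrom, sketch_rel_check and friends) are hypothetical stand-ins, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the COPY-level facts CopyFrom() looks at. */
typedef struct SketchCopyFrom
{
    bool volatile_defexprs;     /* like cstate->volatile_defexprs */
    int  natts;                 /* like list_length(cstate->attnumlist) */
    bool where_is_volatile;     /* like contain_volatile_functions(whereClause) */
} SketchCopyFrom;

/* Hypothetical stand-in for the relation-level ExecMultiInsertAllowed(). */
typedef bool (*sketch_rel_check)(const void *rel);

/* Caller-side decision for the COPY target; returns the flag to cache. */
static bool
sketch_target_uses_multi_insert(const SketchCopyFrom *cs, const void *rel,
                                sketch_rel_check rel_allows)
{
    bool uses_multi_insert = false;   /* default, as set by InitResultRelInfo() */

    assert(cs != NULL && rel_allows != NULL);
    if (!cs->volatile_defexprs && cs->natts > 0 && !cs->where_is_volatile)
        uses_multi_insert = rel_allows(rel);
    return uses_multi_insert;
}

/* Trivial relation-level checks, for demonstration only. */
static bool
sketch_always_allows(const void *rel)
{
    (void) rel;
    return true;
}

static bool
sketch_never_allows(const void *rel)
{
    (void) rel;
    return false;
}
```

The relation-level check is only reached when the COPY-level conditions pass, so a relation whose triggers or FDW would allow batching still ends up with the flag false if, say, the WHERE clause is volatile.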

+   /*
+    * Use multi-insert mode if the condition checking passes for the
+    * parent and its child.
+    */
+   leaf_part_rri->ri_usesMultiInsert =
+       ExecMultiInsertAllowed(leaf_part_rri, rootResultRelInfo);

Think I have mentioned upthread that this looks better as:

if (rootResultRelInfo->ri_usesMultiInsert)
    leaf_part_rri->ri_usesMultiInsert = ExecMultiInsertAllowed(leaf_part_rri);

This keeps the logic confined to ExecInitPartitionInfo(), where it
belongs. There is no point in burdening other callers of
ExecMultiInsertAllowed() with deciding whether or not to pass a
valid value for the second parameter.
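To make the suggested split concrete, here is a minimal sketch in plain C. The struct and function names (SketchRel, sketch_multi_insert_allowed, sketch_init_partition) are hypothetical stand-ins, not PostgreSQL APIs. The relation-local checks mirror the ones in the patch's ExecMultiInsertAllowed(), and the root check moves into the partition-initialization caller, as proposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for ResultRelInfo and its trigger/FDW info. */
typedef struct SketchRel
{
    bool has_before_or_instead_row_trig;  /* BEFORE/INSTEAD OF ROW INSERT */
    bool partitioned_with_new_table_trig; /* partitioned + trig_insert_new_table */
    bool is_foreign;
    bool fdw_has_exec_copy;               /* FdwRoutine->ExecForeignCopy != NULL */
    bool uses_multi_insert;               /* cached flag, like ri_usesMultiInsert */
} SketchRel;

/* Relation-local criteria only; knows nothing about a partition root. */
static bool
sketch_multi_insert_allowed(const SketchRel *rel)
{
    if (rel->has_before_or_instead_row_trig)
        return false;
    if (rel->partitioned_with_new_table_trig)
        return false;
    if (rel->is_foreign && !rel->fdw_has_exec_copy)
        return false;
    return true;
}

/* Caller side, as proposed for ExecInitPartitionInfo(). */
static void
sketch_init_partition(SketchRel *leaf, const SketchRel *root)
{
    assert(leaf != NULL && root != NULL);
    leaf->uses_multi_insert = false;      /* InitResultRelInfo() default */
    if (root->uses_multi_insert)
        leaf->uses_multi_insert = sketch_multi_insert_allowed(leaf);
}
```

With this arrangement, the single-argument check can be reused by any caller without it having to know whether a partition root exists.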

+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+                          ResultRelInfo *resultRelInfo)
+{
...
+   if (resultRelInfo->ri_RangeTableIndex == 0)
+   {
+       ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+       rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);

It's better to add an Assert(rootResultRelInfo != NULL) here.
Apparently, there are cases where ri_RangeTableIndex == 0 without
ri_RootResultRelInfo being set. The Assert will ensure that
BeginForeignCopy() is not mistakenly called on such ResultRelInfos.
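A minimal sketch of the lookup being discussed, assuming a simplified ResultRelInfo (the hypothetical SketchRRI) that carries only a range table index and a root pointer. A routed partition has index 0 and borrows its root's entry, as in the quoted code. The assert plays the role of the suggested Assert(rootResultRelInfo != NULL).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for ResultRelInfo. */
typedef struct SketchRRI
{
    unsigned int range_table_index;   /* 0 if no RTE of its own */
    const struct SketchRRI *root;     /* like ri_RootResultRelInfo, may be NULL */
} SketchRRI;

/*
 * Pick the range table index whose entry describes this target: its own
 * if it has one, else its root's (the routed-partition case in the patch).
 */
static unsigned int
sketch_rte_index_for(const SketchRRI *rri)
{
    if (rri->range_table_index == 0)
    {
        /* Fails fast on a ResultRelInfo with neither an RTE nor a root. */
        assert(rri->root != NULL);
        return rri->root->range_table_index;
    }
    return rri->range_table_index;
}
```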

+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+   appendStringInfoString(buf, "COPY ");
+   deparseRelation(buf, rel);
+   (void) deparseRelColumnList(buf, rel, true);
+
+   appendStringInfoString(buf, " FROM STDIN ");
+}

I can't parse what the function's comment says about "using list of
parameters". Maybe it means to say "list of columns" specified in the
COPY FROM statement. How about writing this as:

/*
 * Deparse remote COPY FROM statement
 *
 * Note that this explicitly specifies the list of COPY's target columns
 * to account for the fact that the remote table's columns may not match
 * exactly with the columns declared in the local definition.
 */

I'm hoping that I'm interpreting the original note correctly. Andrey?
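Assuming that interpretation is right, the generated statement has the shape COPY schema.table (col, ...) FROM STDIN. The sketch below assembles such a string with hypothetical helpers (sketch_append, sketch_append_ident, sketch_deparse_copy_from). Unlike PostgreSQL's quote_identifier(), which quotes only when needed, it always double-quotes identifiers. A real FDW would also emit the remote column names (for example, from the column_name option) rather than the local ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append s to the NUL-terminated buf of capacity cap; false on overflow. */
static bool
sketch_append(char *buf, size_t cap, const char *s)
{
    size_t used = strlen(buf);
    size_t add = strlen(s);

    if (used + add + 1 > cap)
        return false;
    memcpy(buf + used, s, add + 1);
    return true;
}

/* Append ident wrapped in double quotes, doubling any embedded quote. */
static bool
sketch_append_ident(char *buf, size_t cap, const char *ident)
{
    if (!sketch_append(buf, cap, "\""))
        return false;
    for (const char *p = ident; *p != '\0'; p++)
    {
        char piece[3] = {*p, '\0', '\0'};

        if (*p == '"')
            piece[1] = '"';
        if (!sketch_append(buf, cap, piece))
            return false;
    }
    return sketch_append(buf, cap, "\"");
}

/* Build: COPY "nsp"."rel" ("c1", "c2", ...) FROM STDIN */
static bool
sketch_deparse_copy_from(char *buf, size_t cap, const char *nspname,
                         const char *relname, const char *const *cols,
                         int ncols)
{
    assert(cap > 0 && ncols > 0);   /* batched COPY needs >= 1 column */
    buf[0] = '\0';
    if (!sketch_append(buf, cap, "COPY ") ||
        !sketch_append_ident(buf, cap, nspname) ||
        !sketch_append(buf, cap, ".") ||
        !sketch_append_ident(buf, cap, relname) ||
        !sketch_append(buf, cap, " ("))
        return false;
    for (int i = 0; i < ncols; i++)
    {
        if (i > 0 && !sketch_append(buf, cap, ", "))
            return false;
        if (!sketch_append_ident(buf, cap, cols[i]))
            return false;
    }
    return sketch_append(buf, cap, ") FROM STDIN");
}
```

Listing the columns explicitly means the remote COPY does not depend on the remote table having the same column order, or the same set of columns, as the local foreign table definition.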

+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
...
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+                                          ResultRelInfo *rinfo);

Maybe a bit late realizing this, but why does BeginForeignCopy()
accept a ModifyTableState pointer whereas maybe just an EState pointer
will do? I can't imagine why an FDW would want to look at the
ModifyTableState. Case in point, I see that
postgresBeginForeignCopy() only uses the EState from the
ModifyTableState passed to it. I think the ResultRelInfo that's being
passed to the Copy APIs contains most of the necessary information.
Also, EndForeignCopy() seems fine with just receiving the EState.
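For reference, the work these callbacks bracket is, per the patch, one remote COPY per ExecForeignCopy() call: send COPY ... FROM STDIN, stream the already formatted rows, then signal end-of-data. The sketch below models that sequence against a fake connection. FakeConn and fake_send are hypothetical; the tags Q, d and c are borrowed from the frontend/backend protocol's Query, CopyData and CopyDone messages. In a real FDW, the steps would map onto libpq calls such as PQexec() (expecting PGRES_COPY_IN), PQputCopyData() and PQputCopyEnd(), followed by PQgetResult() to collect the final status, which the sketch omits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a PGconn: records what would be sent. */
typedef struct FakeConn
{
    char log[512];
    int  in_copy;      /* nonzero while a COPY IN is in progress */
} FakeConn;

/* Record one outgoing message as "tag:data|" (truncates if log is full). */
static void
fake_send(FakeConn *conn, const char *tag, const char *data)
{
    size_t used = strlen(conn->log);

    snprintf(conn->log + used, sizeof(conn->log) - used, "%s:%s|", tag, data);
}

/*
 * One ExecForeignCopy()-style call for a batch of already formatted rows.
 * Real code: PQexec() -> PGRES_COPY_IN, PQputCopyData() per row,
 * PQputCopyEnd(), then PQgetResult() to check for remote errors.
 */
static void
sketch_exec_foreign_copy(FakeConn *conn, const char *copy_sql,
                         const char *const *rows, int nrows)
{
    assert(!conn->in_copy);
    fake_send(conn, "Q", copy_sql);
    conn->in_copy = 1;
    for (int i = 0; i < nrows; i++)
        fake_send(conn, "d", rows[i]);
    fake_send(conn, "c", "");
    conn->in_copy = 0;
}
```

Nothing in this sequence needs the ModifyTableState: the connection and the formatting state hang off the target relation, which supports the point that the ResultRelInfo plus the EState should be enough.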

+   TupleDesc   tupDesc;        /* COPY TO will be used for manual tuple copying
+                                 * into the destination */
...
@@ -382,19 +393,24 @@ EndCopy(CopyToState cstate)
 CopyToState
 BeginCopyTo(ParseState *pstate,
            Relation rel,
+           TupleDesc srcTupDesc,

I think that either the commentary around tupDesc/srcTupDesc needs to
be improved or we should really find a way to do this without
maintaining TupleDesc separately from the CopyState.rel. IIUC, this
change is merely to allow postgres_fdw's ExecForeignCopy() to use
CopyOneRowTo() which needs to be passed a valid CopyState.
postgresBeginForeignCopy() initializes one by calling BeginCopyTo(),
but it can't just pass the target foreign Relation to it, because
generic BeginCopyTo() has this:

if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
{
    ...
    else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("cannot copy from foreign table \"%s\"",
                        RelationGetRelationName(rel)),
                 errhint("Try the COPY (SELECT ...) TO variant.")));

If the intention is to only prevent this error, maybe the condition
above could be changed as this:

/*
 * Check whether we support copying data out of the specified relation,
 * unless the caller also passed a non-NULL data_dest_cb, in which case,
 * the callback will take care of it
 */
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
    data_dest_cb == NULL)

I just checked that this works or at least doesn't break any newly added tests.
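The proposed condition can be modeled in isolation. This sketch uses a hypothetical SketchRelKind enum instead of the real RELKIND_* constants and returns an error string instead of calling ereport(); the messages are only loosely modeled on those in BeginCopyTo(). The point it shows is that a non-NULL callback bypasses the relkind check, while a plain COPY TO from a foreign table is still rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical relkind values standing in for RELKIND_* constants. */
typedef enum SketchRelKind
{
    SK_RELATION,
    SK_VIEW,
    SK_MATVIEW,
    SK_FOREIGN_TABLE,
    SK_PARTITIONED_TABLE,
    SK_SEQUENCE
} SketchRelKind;

/* Same shape as copy_data_dest_cb in the patch. */
typedef void (*sketch_dest_cb)(void *outbuf, int len);

/*
 * Returns NULL if COPY TO may read from this source, otherwise an error
 * message. have_rel == false models the COPY (query) TO case.
 */
static const char *
sketch_check_copy_to_source(bool have_rel, SketchRelKind kind,
                            sketch_dest_cb data_dest_cb)
{
    if (!have_rel || kind == SK_RELATION || data_dest_cb != NULL)
        return NULL;

    switch (kind)
    {
        case SK_VIEW:
            return "cannot copy from view";
        case SK_MATVIEW:
            return "cannot copy from materialized view";
        case SK_FOREIGN_TABLE:
            return "cannot copy from foreign table";
        case SK_PARTITIONED_TABLE:
            return "cannot copy from partitioned table";
        case SK_SEQUENCE:
            return "cannot copy from sequence";
        default:
            return "cannot copy from non-table relation";
    }
}

/* A do-nothing callback, used only to demonstrate the bypass. */
static void
sketch_discard_cb(void *outbuf, int len)
{
    (void) outbuf;
    (void) len;
}
```

One consequence of this design is that the check is skipped for every relkind once a callback is supplied, so it relies on callers such as postgres_fdw passing a callback only when they format the tuples themselves.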

--
Amit Langote
EDB: http://www.enterprisedb.com

#75 tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Amit Langote (#74)
1 attachment(s)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Amit Langote <amitlangote09@gmail.com>

Think I have mentioned upthread that this looks better as:

if (rootResultRelInfo->ri_usesMultiInsert)
    leaf_part_rri->ri_usesMultiInsert = ExecMultiInsertAllowed(leaf_part_rri);

This keeps the logic confined to ExecInitPartitionInfo() where it
belongs. No point in burdening other callers of
ExecMultiInsertAllowed() in deciding whether or not it should pass a
valid value for the 2nd parameter.

Oh, that's a good idea. (Why didn't I think of such a simple idea?)

Maybe a bit late realizing this, but why does BeginForeignCopy()
accept a ModifyTableState pointer whereas maybe just an EState pointer
will do? I can't imagine why an FDW would want to look at the
ModifyTableState. Case in point, I see that
postgresBeginForeignCopy() only uses the EState from the
ModifyTableState passed to it. I think the ResultRelInfo that's being
passed to the Copy APIs contains most of the necessary information.

You're right. COPY is not under the control of a ModifyTable plan, so it's strange to pass ModifyTableState.

Also, EndForeignCopy() seems fine with just receiving the EState.

I think it should also receive the ResultRelInfo, as EndForeignInsert() and EndForeignModify() do, so that it mirrors the Begin function: "begin/end COPYing into this relation."
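The begin/end pairing per relation matches the patch's split of CopyTo() into CopyToStart()/CopyOneRowTo()/CopyToFinish(): formatting state is set up once per target relation, and rows are emitted in between, possibly across many batches. Below is a minimal sketch assuming text format; all names are hypothetical. Each field is escaped (only backslash, tab and newline here; the real CopyAttributeOutText() handles more cases), fields are tab-separated, and each row is newline-terminated and handed to the callback. The patch emits "\r\n" on WIN32, while the sketch always uses "\n". Because the callback type, like copy_data_dest_cb, takes only a buffer and a length, the example sink keeps its state in a static buffer.

```c
#include <assert.h>
#include <string.h>

/* Same shape as copy_data_dest_cb in the patch: buffer and length only. */
typedef void (*sketch_dest_cb)(void *outbuf, int len);

/* Hypothetical, reduced CopyToState for a callback destination. */
typedef struct SketchCopyTo
{
    sketch_dest_cb cb;
    char   row[256];       /* per-row buffer, like fe_msgbuf */
    size_t len;
    int    started;
    long   rows_sent;
} SketchCopyTo;

static void
sketch_put_char(SketchCopyTo *st, char c)
{
    assert(st->len + 1 < sizeof(st->row));   /* sketch: no buffer growth */
    st->row[st->len++] = c;
    st->row[st->len] = '\0';
}

/* Once per relation: the work split out as CopyToStart(). */
static void
sketch_copy_to_start(SketchCopyTo *st, sketch_dest_cb cb)
{
    st->cb = cb;
    st->len = 0;
    st->row[0] = '\0';
    st->started = 1;
    st->rows_sent = 0;
}

/* Per tuple, like CopyOneRowTo(): escape, join with tabs, end with '\n'. */
static void
sketch_copy_one_row(SketchCopyTo *st, const char *const *fields, int nfields)
{
    assert(st->started);
    st->len = 0;
    for (int i = 0; i < nfields; i++)
    {
        if (i > 0)
            sketch_put_char(st, '\t');
        for (const char *p = fields[i]; *p != '\0'; p++)
        {
            if (*p == '\\' || *p == '\t' || *p == '\n')
            {
                sketch_put_char(st, '\\');
                sketch_put_char(st, *p == '\t' ? 't' : *p == '\n' ? 'n' : '\\');
            }
            else
                sketch_put_char(st, *p);
        }
    }
    sketch_put_char(st, '\n');
    st->cb(st->row, (int) st->len);
    st->rows_sent++;
}

/* Once per relation: the work split out as CopyToFinish(). */
static void
sketch_copy_to_finish(SketchCopyTo *st)
{
    st->started = 0;
}

/* Example sink: no context argument, so state lives in a static buffer. */
static char sketch_sink[1024];

static void
sketch_sink_cb(void *outbuf, int len)
{
    size_t used = strlen(sketch_sink);

    if (used + (size_t) len < sizeof(sketch_sink))
    {
        memcpy(sketch_sink + used, outbuf, (size_t) len);
        sketch_sink[used + (size_t) len] = '\0';
    }
}
```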

/*
* Check whether we support copying data out of the specified relation,
* unless the caller also passed a non-NULL data_dest_cb, in which case,
* the callback will take care of it
*/
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
data_dest_cb == NULL)

I just checked that this works or at least doesn't break any newly added tests.

Good idea, too. The code has become more readable.

Thank you a lot. Your other comments not mentioned above have also been addressed. The attached patch passes the postgres_fdw regression test.

Regards
Takayuki Tsunakawa

Attachments:

v16-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (application/octet-stream; name=v16-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch)
From 549db74b659606df4c6090f44a4fac4e6ba8fca4 Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 9 Feb 2021 12:50:00 +0900
Subject: [PATCH v16] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* ExecForeignCopy
* EndForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of a bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, iteratively
sends the tuples using the CopyTo() machinery, and then sends EOF
on this connection.

The code that constructed the list of columns for a given foreign
relation in deparseAnalyzeSql() is moved into a separate routine,
deparseRelColumnList(), which is reused by deparseCopyFromSql().
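A minimal sketch of what the shared column-list helper produces for the COPY case: non-dropped columns in descriptor order, wrapped in parentheses only when at least one column remains. DemoColumn, append_str(), demo_rel_column_list() and demo_copy_from_sql() are hypothetical; the real code uses StringInfo, quote_identifier() and the column_name option, and returns the attnums as a List rather than a count.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical column descriptor: remote name plus a dropped flag. */
typedef struct DemoColumn
{
	const char *remote_name;	/* column_name option, or the local name */
	bool		dropped;
} DemoColumn;

/* Append src to the NUL-terminated buf of capacity cap; must fit. */
static void
append_str(char *buf, size_t cap, const char *src)
{
	assert(strlen(buf) + strlen(src) < cap);
	strcat(buf, src);
}

/* Append the non-dropped columns and return how many were emitted. */
static int
demo_rel_column_list(char *buf, size_t cap, const DemoColumn *cols,
					 int ncols, bool enclose_in_parens)
{
	int			nemitted = 0;

	for (int i = 0; i < ncols; i++)
	{
		if (cols[i].dropped)
			continue;			/* ignore dropped columns */
		if (nemitted > 0)
			append_str(buf, cap, ", ");
		else if (enclose_in_parens)
			append_str(buf, cap, "(");
		append_str(buf, cap, cols[i].remote_name);	/* no quoting here */
		nemitted++;
	}
	if (enclose_in_parens && nemitted > 0)
		append_str(buf, cap, ")");
	return nemitted;
}

/* Build the remote statement the same way deparseCopyFromSql() does. */
static void
demo_copy_from_sql(char *buf, size_t cap, const char *relname,
				   const DemoColumn *cols, int ncols)
{
	buf[0] = '\0';
	append_str(buf, cap, "COPY ");
	append_str(buf, cap, relname);
	(void) demo_rel_column_list(buf, cap, cols, ncols, true);
	append_str(buf, cap, " FROM STDIN ");
}
```

For loc2(f1, f2) this yields "COPY public.loc2(f1, f2) FROM STDIN ", matching the remote SQL shown in the expected regression output. With every column dropped it yields a statement with no column list, which is consistent with the patch only enabling multi-insert when the COPY attribute list is non-empty.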

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure was extended
with a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field,
which is needed to avoid repeated initialization of the CopyState. For
the same reason, the CopyTo() routine was split into the set of routines
CopyToStart()/CopyTo()/CopyToFinish().

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod shared across all potential target
relations (partitions) that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Change that decision logic to the combination of ExecMultiInsertAllowed()
and its caller. The former encapsulates the common criteria to allow
multi-insert. The latter applies additional criteria and sets the new
boolean field ri_usesMultiInsert of ResultRelInfo.
This avoids repeated computation of the same information in some cases,
especially for partitions, and the new arrangement is slightly more
readable.
Enum CopyInsertMethod is removed.
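A minimal sketch of the two-level decision described above. The body of ExecMultiInsertAllowed() is not part of this excerpt, so demo_multi_insert_allowed() is a hypothetical stand-in modelled on the criteria in the removed CIM_* branches, plus the assumption that a foreign relation needs an ExecForeignCopy callback. The COPY-level checks and the partition rule (a partition batches only if the COPY target does and its own properties allow it) follow the comment added to CopyFrom().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the relation properties that matter here. */
typedef struct DemoRel
{
	bool		has_before_or_instead_row_trig;
	bool		partitioned_with_new_table_trig;	/* transition table */
	bool		is_foreign;
	bool		fdw_has_exec_foreign_copy;		/* assumption, see lead-in */
} DemoRel;

/* Relation-level criteria (stand-in for ExecMultiInsertAllowed()). */
static bool
demo_multi_insert_allowed(const DemoRel *rel)
{
	if (rel->has_before_or_instead_row_trig)
		return false;
	if (rel->partitioned_with_new_table_trig)
		return false;
	if (rel->is_foreign && !rel->fdw_has_exec_foreign_copy)
		return false;
	return true;
}

/* COPY-level criteria checked by the caller, as in CopyFrom(). */
static bool
demo_uses_multi_insert(bool volatile_defexprs, int natts_in_list,
					   bool volatile_where, const DemoRel *target)
{
	return !volatile_defexprs &&
		natts_in_list > 0 &&
		!volatile_where &&
		demo_multi_insert_allowed(target);
}

/* A partition may batch only if the COPY target itself may. */
static bool
demo_partition_uses_multi_insert(bool target_uses_multi_insert,
								 const DemoRel *part)
{
	return target_uses_multi_insert && demo_multi_insert_allowed(part);
}
```

Storing the result once per ResultRelInfo, rather than re-deriving it from a shared enum on every routed tuple, is what the ri_usesMultiInsert field buys.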

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
---
 contrib/postgres_fdw/deparse.c                 |  63 ++++--
 contrib/postgres_fdw/expected/postgres_fdw.out |  46 ++++-
 contrib/postgres_fdw/postgres_fdw.c            | 143 +++++++++++++
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  45 ++++
 doc/src/sgml/fdwhandler.sgml                   |  66 ++++++
 src/backend/commands/copy.c                    |   2 +-
 src/backend/commands/copyfrom.c                | 271 +++++++++++--------------
 src/backend/commands/copyto.c                  |  79 +++++--
 src/backend/executor/execMain.c                |  44 ++++
 src/backend/executor/execPartition.c           |  37 +++-
 src/include/commands/copy.h                    |   5 +
 src/include/commands/copyfrom_internal.h       |  10 -
 src/include/executor/execPartition.h           |   2 +
 src/include/executor/executor.h                |   1 +
 src/include/foreign/fdwapi.h                   |  15 ++
 src/include/nodes/execnodes.h                  |   8 +-
 17 files changed, 632 insertions(+), 206 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499..7e10f8b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1859,6 +1861,23 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * Deparse remote COPY FROM statement
+ *
+ * Note that this explicitly specifies the list of COPY's target columns
+ * to account for the fact that the remote table's columns may not match
+ * exactly with the columns declared in the local definition.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * 'buf' is the output buffer to append the statement to
@@ -2119,6 +2138,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 {
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
+{
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			i;
@@ -2126,10 +2169,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2138,6 +2179,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2157,18 +2201,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 60c7e11..dd310bb 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8111,8 +8111,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8123,6 +8124,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8231,6 +8245,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 368997d..7f33568 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -201,6 +202,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -373,6 +375,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(EState *estate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -558,6 +567,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2169,6 +2181,137 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(EState *estate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	if (resultRelInfo->ri_RangeTableIndex == 0)
+	{
+		ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+		Assert(rootResultRelInfo != NULL);
+		rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);
+		rte = copyObject(rte);
+		rte->relid = RelationGetRelid(rel);
+		rte->relkind = RELKIND_FOREIGN_TABLE;
+	}
+	else
+		rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, estate);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(estate,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									-1,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, rel, NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol. This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending an EOF signal, check the command result. */
+		res = PQgetResult(conn);
+		if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(OK && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we've pumped libpq back to idle state */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d..cb801c9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -165,6 +165,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 extern void rebuildInsertSql(StringInfo buf, char *orig_query,
 							 int values_end_len, int num_cols,
 							 int num_rows);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 151f4f1..e0d54ef 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2235,6 +2235,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2335,6 +2352,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 854913a..8966b84 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1068,6 +1068,72 @@ EndDirectModify(ForeignScanState *node);
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(EState *estate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of the <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>estate</literal> is global execution state for the query.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in <literal>slots</literal>.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
 RowMarkType
 GetForeignRowMarkType(RangeTblEntry *rte,
                       LockClauseStrength strength);
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c712c8..411c409 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -304,7 +304,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		CopyToState cstate;
 
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 796ca7b..7b05da7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -316,54 +316,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -537,12 +547,10 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -652,6 +660,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(target_resultRelInfo->ri_usesMultiInsert == false);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -668,10 +703,22 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Init copying process into foreign table. Initialization of copying into
+	 * foreign partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(estate,
+																	  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -700,83 +747,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -784,7 +757,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -827,7 +800,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -835,7 +808,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -894,24 +866,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -941,7 +903,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -960,9 +922,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1034,7 +993,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1112,11 +1071,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1142,14 +1098,21 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+																	target_resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e04ec1e..03c9df5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -52,6 +52,7 @@ typedef enum CopyDest
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_OLD_FE,				/* to frontend (2.0 protocol) */
 	COPY_NEW_FE,				/* to frontend (3.0 protocol) */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -87,6 +88,7 @@ typedef struct CopyToStateData
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -117,7 +119,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static uint64 CopyTo(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -289,6 +290,14 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 	}
 
 	/* Update the progress */
@@ -386,16 +395,23 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
 
-	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
+	/*
+	 * Check whether we support copying data out of the specified relation,
+	 * unless the caller also passed a non-NULL data_dest_cb, in which case,
+	 * the callback will take care of it
+	 */
+	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
+		data_dest_cb == NULL)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
 			ereport(ERROR,
@@ -704,6 +720,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -786,7 +807,7 @@ BeginCopyTo(ParseState *pstate,
 uint64
 DoCopyTo(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -795,7 +816,9 @@ DoCopyTo(CopyToState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -835,15 +858,17 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Separated to the routine to prevent duplicate operations in the case of
+ * manual mode, where tuples are copied to the destination one by one, by call of
+ * the CopyOneRowTo() routine.
  */
-static uint64
-CopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -933,6 +958,32 @@ CopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyToState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -967,23 +1018,13 @@ CopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c74ce36..6dc25b7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1233,10 +1233,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
 /*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
+/*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
  *
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b9e4f2d..68fff3e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -589,6 +589,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  estate->es_instrument);
 
 	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
+	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
 	 * required when the operation is CMD_UPDATE.
@@ -989,9 +997,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			if (partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				partRelInfo->ri_FdwRoutine->BeginForeignCopy(estate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
@@ -1210,10 +1225,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				if (resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+					resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+																 resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e..3d9d187 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -80,10 +81,14 @@ extern DestReceiver *CreateCopyDestReceiver(void);
  */
 extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37942d..c527610 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -41,16 +41,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
  *
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde..b5f73d8 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -118,6 +118,8 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
 										PartitionTupleRouting *proute,
 										TupleTableSlot *slot,
 										EState *estate);
+extern bool checkMultiInsertMode(const ResultRelInfo *rri,
+								 const ResultRelInfo *parent);
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 									PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363..754a9f5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78d..aeb8484 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -126,6 +126,16 @@ typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node)
 
 typedef void (*EndDirectModify_function) (ForeignScanState *node);
 
+typedef void (*BeginForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
@@ -230,6 +240,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b6a88ff..4ec5b34 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -508,7 +508,13 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.10.1
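The patch above repeats one dispatch pattern in CopyFrom(), ExecInitRoutingInfo() and ExecCleanupTupleRouting(). For a foreign target, ri_usesMultiInsert selects the COPY callbacks; otherwise the INSERT callbacks are used. A NULL callback is simply skipped, with no fallback from one path to the other. The following is a minimal standalone sketch of the shutdown half of that pattern. MockFdwRoutine, MockResultRelInfo and the mock_* callbacks are hypothetical stand-ins, not the real PostgreSQL structs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ResultRelInfo and FdwRoutine. */
typedef struct MockResultRelInfo MockResultRelInfo;

typedef void (*MockEndCallback) (MockResultRelInfo *rinfo);

typedef struct MockFdwRoutine
{
	MockEndCallback EndForeignInsert;	/* row-by-row path; may be NULL */
	MockEndCallback EndForeignCopy;	/* bulk COPY path; may be NULL */
} MockFdwRoutine;

typedef enum MockShutdownPath
{
	PATH_NONE,
	PATH_INSERT,
	PATH_COPY
} MockShutdownPath;

struct MockResultRelInfo
{
	const MockFdwRoutine *ri_FdwRoutine;	/* NULL for a local table */
	bool		ri_usesMultiInsert;
	int			shutdown_calls;	/* bookkeeping for the example only */
	MockShutdownPath last_path;
};

/* Same branch structure as the patched "Allow the FDW to shut down" code. */
void
shutdown_foreign_target(MockResultRelInfo *rinfo)
{
	const MockFdwRoutine *fdw = rinfo->ri_FdwRoutine;

	if (fdw == NULL)
		return;					/* local table: no FDW callback at all */

	if (rinfo->ri_usesMultiInsert)
	{
		/* Bulk mode: only EndForeignCopy is considered. */
		if (fdw->EndForeignCopy != NULL)
			fdw->EndForeignCopy(rinfo);
	}
	else if (fdw->EndForeignInsert != NULL)
		fdw->EndForeignInsert(rinfo);
}

void
mock_end_copy(MockResultRelInfo *rinfo)
{
	rinfo->shutdown_calls++;
	rinfo->last_path = PATH_COPY;
}

void
mock_end_insert(MockResultRelInfo *rinfo)
{
	rinfo->shutdown_calls++;
	rinfo->last_path = PATH_INSERT;
}
```

The Begin side in ExecInitRoutingInfo() has the same shape, choosing between BeginForeignCopy and BeginForeignInsert.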

#76Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Amit Langote (#74)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 2/15/21 1:31 PM, Amit Langote wrote:

Tsunakawa-san, Andrey,
+static void
+postgresBeginForeignCopy(ModifyTableState *mtstate,
+                          ResultRelInfo *resultRelInfo)
+{
...
+   if (resultRelInfo->ri_RangeTableIndex == 0)
+   {
+       ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+       rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);

It's better to add an Assert(rootResultRelInfo != NULL) here.
Apparently, there are cases where ri_RangeTableIndex == 0 without
ri_RootResultRelInfo being set. The Assert will ensure that
BeginForeignCopy() is not mistakenly called on such ResultRelInfos.

+1
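Here is a minimal sketch of the suggested check. It assumes a hypothetical MockRri struct that carries only the two ResultRelInfo fields involved, and it returns the range-table index instead of calling exec_rt_fetch(). A ResultRelInfo with no range-table entry of its own falls back to its root, and the Assert catches callers that set neither.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo with only the fields used here. */
typedef struct MockRri
{
	unsigned int ri_RangeTableIndex;	/* 0 means "no RTE of its own" */
	const struct MockRri *ri_RootResultRelInfo;	/* set for routed partitions */
} MockRri;

/* Return the range-table index whose RTE should be fetched for this relation. */
unsigned int
rte_index_for(const MockRri *rri)
{
	if (rri->ri_RangeTableIndex == 0)
	{
		const MockRri *root = rri->ri_RootResultRelInfo;

		/* Guards against a ResultRelInfo that has neither an RTE nor a root. */
		assert(root != NULL);
		return root->ri_RangeTableIndex;
	}
	return rri->ri_RangeTableIndex;
}
```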

I can't parse what the function's comment says about "using list of
parameters". Maybe it means to say "list of columns" specified in the
COPY FROM statement. How about writing this as:

/*
* Deparse remote COPY FROM statement
*
* Note that this explicitly specifies the list of COPY's target columns
* to account for the fact that the remote table's columns may not match
* exactly with the columns declared in the local definition.
*/

I'm hoping that I'm interpreting the original note correctly. Andrey?

Yes, this is a good option.

+    <para>
+     <literal>mtstate</literal> is the overall state of the
+     <structname>ModifyTable</structname> plan node being executed; global data about
+     the plan and execution state is available via this structure.
...
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+                                          ResultRelInfo *rinfo);

Maybe a bit late realizing this, but why does BeginForeignCopy()
accept a ModifyTableState pointer whereas maybe just an EState pointer
will do? I can't imagine why an FDW would want to look at the
ModifyTableState. Case in point, I see that
postgresBeginForeignCopy() only uses the EState from the
ModifyTableState passed to it. I think the ResultRelInfo that's being
passed to the Copy APIs contains most of the necessary information.
Also, EndForeignCopy() seems fine with just receiving the EState.

+1

If the intention is to only prevent this error, maybe the condition
above could be changed as this:

/*
* Check whether we support copying data out of the specified relation,
* unless the caller also passed a non-NULL data_dest_cb, in which case,
* the callback will take care of it
*/
if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
data_dest_cb == NULL)

Agreed. This is a vestige of the first versions, in which I did not use
the data_dest_cb routine. Now this parameter is redundant.
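The condition proposed above can be isolated as a small predicate. This sketch makes two simplifications. It uses plain pg_class.relkind letters ('r' ordinary table, 'f' foreign table, 'v' view) instead of the RELKIND_* macros. It also models the query form of COPY TO, where rel is NULL, with a separate has_rel flag. copy_to_rejects_relation() and discard_cb() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* pg_class.relkind value for an ordinary table (RELKIND_RELATION in the real code). */
#define MOCK_RELKIND_RELATION 'r'

/* Same shape as the copy_data_dest_cb typedef the patch adds to copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/*
 * True if BeginCopyTo() would enter its "cannot copy from ..." error
 * branch: a relation was given, it is not an ordinary table, and no
 * callback was supplied to take care of it.
 */
bool
copy_to_rejects_relation(bool has_rel, char relkind, copy_data_dest_cb cb)
{
	return has_rel && relkind != MOCK_RELKIND_RELATION && cb == NULL;
}

/* A callback that ignores its input; only its non-NULL-ness matters here. */
void
discard_cb(void *outbuf, int len)
{
	(void) outbuf;
	(void) len;
}
```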

--
regards,
Andrey Lepikhov
Postgres Professional

#77Justin Pryzby
pryzby@telsasoft.com
In reply to: tsunakawa.takay@fujitsu.com (#75)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Find attached some language fixes.

|/* Do this to ensure we've pumped libpq back to idle state */

I don't know what you mean by "pumped".

The CopySendEndOfRow "case COPY_CALLBACK:" should have a "break;"
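Some context on the missing break: in the patch, COPY_CALLBACK is the last case in CopySendEndOfRow(), so the omission is harmless today. Any case appended after it later would also run, however. The following is a self-contained illustration of C switch fall-through. The enum and function names are hypothetical; only the switch shape mirrors the patch.

```c
#include <assert.h>

/* Hypothetical destinations; only the switch shape mirrors CopySendEndOfRow(). */
typedef enum MockDest
{
	MOCK_DEST_CALLBACK,
	MOCK_DEST_LATER_ADDED		/* imagine a case appended in a later patch */
} MockDest;

/* Return a bitmask of the case bodies that executed. */
int
branches_run(MockDest dest, int with_break)
{
	int			ran = 0;

	switch (dest)
	{
		case MOCK_DEST_CALLBACK:
			ran |= 1;			/* the real code calls data_dest_cb here */
			if (with_break)
				break;
			/* without the break, control falls into the next case */
		case MOCK_DEST_LATER_ADDED:
			ran |= 2;
			break;
	}
	return ran;
}
```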

This touches some of the same parts as my "bulk insert" patch:
https://commitfest.postgresql.org/32/2553/

--
Justin

Attachments:

0001-language-fixen.pxtch (text/x-diff; charset=us-ascii)
From f7bb368963f5808bc5126f179b78507ca52b9cd2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 24 Feb 2021 02:23:17 -0600
Subject: [PATCH] language fixen

---
 contrib/postgres_fdw/postgres_fdw.c |  4 ++--
 doc/src/sgml/fdwhandler.sgml        | 13 +++++++------
 src/backend/commands/copyfrom.c     |  2 +-
 src/backend/commands/copyto.c       |  4 ++--
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b55f19b193..a3c394360e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2203,7 +2203,7 @@ pgfdw_copy_dest_cb(void *buf, int len)
 
 /*
  * postgresBeginForeignCopy
- *		Begin an COPY operation on a foreign table
+ *		Begin a COPY operation on a foreign table
  */
 static void
 postgresBeginForeignCopy(EState *estate,
@@ -2306,7 +2306,7 @@ postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
 
 /*
  * postgresEndForeignCopy
- *		Finish an COPY operation on a foreign table
+ *		Finish a COPY operation on a foreign table
  */
 static void
 postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index a49c17251f..666148aeb3 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -813,8 +813,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
 
      Begin executing an insert operation on a foreign table.  This routine is
      called right before the first tuple is inserted into the foreign table
-     in both cases when it is the partition chosen for tuple routing and the
-     target specified in a <command>COPY FROM</command> command.  It should
+     target specified in a <command>COPY FROM</command> command, or when
+     the foreign table is the partition chosen for tuple routing of a
+     partitioned table.  It should
      perform any initialization needed prior to the actual insertion.
      Subsequently, <function>ExecForeignInsert</function> or
      <function>ExecForeignBatchInsert</function> will be called for
@@ -1072,12 +1073,12 @@ BeginForeignCopy(EState *estate,
                    ResultRelInfo *rinfo);
 </programlisting>
 
-     Begin executing an copy operation on a foreign table. This routine is
+     Begin executing a copy operation on a foreign table. This routine is
      called right before the first call of <function>ExecForeignCopy</function>
      routine for the foreign table. It should perform any initialization needed
      prior to the actual COPY FROM operation.
      Subsequently, <function>ExecForeignCopy</function> will be called for
-     a bulk of tuples to be copied into the foreign table.
+     a batch of tuples to be copied into the foreign table.
     </para>
 
     <para>
@@ -1101,12 +1102,12 @@ ExecForeignCopy(ResultRelInfo *rinfo,
                   int nslots);
 </programlisting>
 
-     Copy a bulk of tuples into the foreign table.
+     Copy a batch of tuples into the foreign table.
       <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
      the target foreign table.
      <literal>slots</literal> contains the tuples to be inserted; it will match the
      row-type definition of the foreign table.
-     <literal>nslots</literal> is a number of tuples in the <literal>slots</literal>
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal>
     </para>
 
     <para>
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 7b05da7871..b7c912cc3f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -660,7 +660,7 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
-	Assert(target_resultRelInfo->ri_usesMultiInsert == false);
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
 
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 03c9df5084..2ac1e27acd 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -859,8 +859,8 @@ EndCopyTo(CopyToState cstate)
 
 /*
  * Start COPY TO operation.
- * Separated to the routine to prevent duplicate operations in the case of
- * manual mode, where tuples are copied to the destination one by one, by call of
+ * Separate from the main routine to prevent duplicate operations in
+ * manual mode, where tuples are copied to the destination one by one, by calling
  * the CopyOneRowTo() routine.
  */
 void
-- 
2.17.0

#78tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Justin Pryzby (#77)
1 attachment(s)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Justin Pryzby <pryzby@telsasoft.com>

Find attached some language fixes.

Thanks a lot! (I wish there were some tool like "pgEnglish" that corrects English in code comments and docs.)

|/* Do this to ensure we've pumped libpq back to idle state */

I don't know what you mean by "pumped".

I changed it to "have not gotten extra results" to match the error message.

The CopySendEndOfRow "case COPY_CALLBACK:" should have a "break;"

Added.

This touches some of the same parts as my "bulk insert" patch:
https://commitfest.postgresql.org/32/2553/

My colleague will be reviewing it.

Regards
Takayuki Tsunakawa

Attachments:

v17-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (application/octet-stream)
From bb2cb9d2fe0e1b790bb36e548105e90652d73adf Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 9 Feb 2021 12:50:00 +0900
Subject: [PATCH v17] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a nonzero number of columns.

The FDW API was extended with the following routines:
* BeginForeignCopy
* ExecForeignCopy
* EndForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of the bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, iteratively sends
the tuples through the CopyTo() machinery, and then sends EOF on this
connection.
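The call sequence described above can be modeled in a few lines. This sketch uses hypothetical names and represents the "remote server" as counters rather than a libpq connection. BeginForeignCopy runs once. ExecForeignCopy runs once per flushed batch, and each call stands for one remote COPY ... FROM STDIN. EndForeignCopy runs once at the end. The batch size is an assumed parameter, not the value copyfrom.c actually uses.

```c
#include <assert.h>

/* Hypothetical model of one foreign target; counters replace the remote server. */
typedef struct MockCopyTarget
{
	int			begun;			/* begin_copy ran */
	int			ended;			/* end_copy ran */
	int			remote_copies;	/* one "COPY ... FROM STDIN" per exec_copy call */
	int			rows_sent;
} MockCopyTarget;

void
begin_copy(MockCopyTarget *t)
{
	t->begun = 1;				/* real FDW: initialize the CopyState */
}

void
exec_copy(MockCopyTarget *t, const char **slots, int nslots)
{
	(void) slots;				/* real FDW: send COPY, each row, then EOF */
	t->remote_copies++;
	t->rows_sent += nslots;
}

void
end_copy(MockCopyTarget *t)
{
	t->ended = 1;				/* real FDW: free the CopyState */
}

/* Drive the callbacks as a buffered COPY FROM would: flush every batch_size rows. */
void
copy_rows(MockCopyTarget *t, const char **rows, int nrows, int batch_size)
{
	int			start = 0;

	assert(batch_size > 0);
	begin_copy(t);
	while (start < nrows)
	{
		int			n = (nrows - start < batch_size) ? nrows - start : batch_size;

		exec_copy(t, rows + start, n);
		start += n;
	}
	end_copy(t);
}
```

With 5 rows and a batch size of 2, this yields three remote COPY commands instead of one remote statement per row.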

The code that constructed the list of columns for a given foreign relation
in deparseAnalyzeSql() is moved into a separate routine,
deparseRelColumnList(), which is also used by deparseCopyFromSql().
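The following is a simplified standalone sketch of what deparseRelColumnList() and deparseCopyFromSql() produce. It uses a hypothetical MockColumn array in place of the TupleDesc and omits the quote_identifier() and column_name option handling that the real code performs. As in the patch, parentheses are emitted only when at least one column survives, and no space is inserted between the relation name and the opening parenthesis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical column descriptor; the real code reads TupleDesc attributes. */
typedef struct MockColumn
{
	const char *name;			/* remote column name, assumed already quoted */
	bool		dropped;
} MockColumn;

/* Append src to buf without overflowing bufsize. */
static void
append_str(char *buf, size_t bufsize, const char *src)
{
	strncat(buf, src, bufsize - strlen(buf) - 1);
}

/* Append "a, b" or "(a, b)"; returns the number of columns written. */
int
append_column_list(char *buf, size_t bufsize, const MockColumn *cols,
				   int ncols, bool enclose_in_parens)
{
	int			written = 0;

	for (int i = 0; i < ncols; i++)
	{
		if (cols[i].dropped)
			continue;			/* dropped columns are skipped */
		if (written > 0)
			append_str(buf, bufsize, ", ");
		else if (enclose_in_parens)
			append_str(buf, bufsize, "(");
		append_str(buf, bufsize, cols[i].name);
		written++;
	}
	if (enclose_in_parens && written > 0)
		append_str(buf, bufsize, ")");
	return written;
}

/* Build "COPY <rel>(<cols>) FROM STDIN " the way deparseCopyFromSql() does. */
void
build_copy_from_sql(char *buf, size_t bufsize, const char *relname,
					const MockColumn *cols, int ncols)
{
	snprintf(buf, bufsize, "COPY %s", relname);
	(void) append_column_list(buf, bufsize, cols, ncols, true);
	append_str(buf, bufsize, " FROM STDIN ");
}
```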

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure was extended
with a data_dest_cb callback, which is used to send the text
representation of a tuple to a custom destination.
The PgFdwModifyState structure is extended with a cstate field to
avoid repeated initialization of the CopyState. For the same reason,
the CopyTo() routine was split into the set of routines CopyToStart()/
CopyTo()/CopyToFinish().
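Combined with data_dest_cb, the CopyToStart()/CopyOneRowTo()/CopyToFinish() split works like this: set up once, format and hand over each row individually, then tear down once. The sketch below assumes text format with tab delimiters and a fixed-size row buffer. The MockCopyTo* functions and collect_cb() are hypothetical. Escaping, NULL markers, the binary format, and the "\r\n" line ending the patch uses on WIN32 are left out.

```c
#include <assert.h>
#include <string.h>

/* Same shape as the copy_data_dest_cb typedef the patch adds to copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical, much-reduced stand-in for CopyToStateData. */
typedef struct MockCopyToState
{
	copy_data_dest_cb data_dest_cb;
	char		rowbuf[256];	/* text of the row being built */
	int			rows;			/* rows handed to the callback */
	int			started;
} MockCopyToState;

/* Append src to the row buffer without overflowing it. */
static void
append_row_text(MockCopyToState *cstate, const char *src)
{
	size_t		used = strlen(cstate->rowbuf);

	strncat(cstate->rowbuf, src, sizeof(cstate->rowbuf) - used - 1);
}

/* One-time setup; the real CopyToStart() also sets up output functions and any header. */
void
MockCopyToStart(MockCopyToState *cstate, copy_data_dest_cb cb)
{
	cstate->data_dest_cb = cb;
	cstate->rowbuf[0] = '\0';
	cstate->rows = 0;
	cstate->started = 1;
}

/* Build one tab-separated row ending in '\n' and pass it to the callback. */
void
MockCopyOneRowTo(MockCopyToState *cstate, const char **fields, int nfields)
{
	cstate->rowbuf[0] = '\0';
	for (int i = 0; i < nfields; i++)
	{
		if (i > 0)
			append_row_text(cstate, "\t");
		append_row_text(cstate, fields[i]);
	}
	append_row_text(cstate, "\n");
	cstate->data_dest_cb(cstate->rowbuf, (int) strlen(cstate->rowbuf));
	cstate->rows++;
}

/* One-time teardown; the real CopyToFinish() writes the binary trailer if needed. */
void
MockCopyToFinish(MockCopyToState *cstate)
{
	cstate->started = 0;
}

/* Example destination that accumulates everything it receives. */
static char collected[1024];
static size_t collected_len;

void
collect_cb(void *outbuf, int len)
{
	if (collected_len + (size_t) len < sizeof(collected))
	{
		memcpy(collected + collected_len, outbuf, (size_t) len);
		collected_len += (size_t) len;
		collected[collected_len] = '\0';
	}
}
```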

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod, shared across all potential target
relations (partitions), that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Change that decision logic to the combination of ExecMultiInsertAllowed()
and its caller. The former encapsulates the common criteria for allowing
multi-insert. The latter applies additional criteria and sets the new
boolean field ri_usesMultiInsert of ResultRelInfo.
This avoids recomputing the same information in some cases,
especially for partitions, and the new arrangement is slightly
more readable.
The enum CopyInsertMethod is removed.
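The resulting decision can be summarized as a pure function. This sketch flattens the trigger and FDW checks of ExecMultiInsertAllowed() into a hypothetical MockRel struct. It also adds the rule from ExecInitPartitionInfo() that a routed partition can use multi-insert only if its root does. Any additional criteria that CopyFrom() applies to the root are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the fields ExecMultiInsertAllowed() reads. */
typedef struct MockRel
{
	bool		has_before_row_insert_trig;	/* trig_insert_before_row */
	bool		has_instead_row_insert_trig;	/* trig_insert_instead_row */
	bool		is_partitioned;		/* relkind == RELKIND_PARTITIONED_TABLE */
	bool		has_insert_new_table_trig;	/* trig_insert_new_table */
	bool		is_foreign;		/* ri_FdwRoutine != NULL */
	bool		fdw_has_exec_copy;	/* ExecForeignCopy != NULL */
} MockRel;

/* Mirrors the three rejection rules of ExecMultiInsertAllowed(). */
bool
mock_multi_insert_allowed(const MockRel *r)
{
	if (r->has_before_row_insert_trig || r->has_instead_row_insert_trig)
		return false;
	if (r->is_partitioned && r->has_insert_new_table_trig)
		return false;
	if (r->is_foreign && !r->fdw_has_exec_copy)
		return false;
	return true;
}

/* A routed partition inherits "false" from its root, as in ExecInitPartitionInfo(). */
bool
mock_partition_uses_multi_insert(bool root_uses_multi_insert, const MockRel *leaf)
{
	return root_uses_multi_insert && mock_multi_insert_allowed(leaf);
}
```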

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
---
 contrib/postgres_fdw/deparse.c                 |  63 ++++--
 contrib/postgres_fdw/expected/postgres_fdw.out |  46 ++++-
 contrib/postgres_fdw/postgres_fdw.c            | 143 +++++++++++++
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  45 ++++
 doc/src/sgml/fdwhandler.sgml                   |  71 ++++++-
 src/backend/commands/copy.c                    |   2 +-
 src/backend/commands/copyfrom.c                | 271 +++++++++++--------------
 src/backend/commands/copyto.c                  |  80 ++++++--
 src/backend/executor/execMain.c                |  44 ++++
 src/backend/executor/execPartition.c           |  37 +++-
 src/include/commands/copy.h                    |   5 +
 src/include/commands/copyfrom_internal.h       |  10 -
 src/include/executor/execPartition.h           |   2 +
 src/include/executor/executor.h                |   1 +
 src/include/foreign/fdwapi.h                   |  15 ++
 src/include/nodes/execnodes.h                  |   8 +-
 17 files changed, 636 insertions(+), 208 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499..7e10f8b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1859,6 +1861,23 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * Deparse remote COPY FROM statement
+ *
+ * Note that this explicitly specifies the list of COPY's target columns
+ * to account for the fact that the remote table's columns may not match
+ * exactly with the columns declared in the local definition.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * 'buf' is the output buffer to append the statement to
@@ -2119,6 +2138,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 {
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of the given foreign relation in the order
+ * they appear in the relation's tuple descriptor, ignoring dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.  Returns the emitted attnums.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
+{
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			i;
@@ -2126,10 +2169,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2138,6 +2179,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2157,18 +2201,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
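A standalone sketch of the SQL that deparseCopyFromSql() builds through the refactored deparseRelColumnList(). It is not the patch's code: Column, append_str(), deparse_column_list() and deparse_copy_from_sql() are hypothetical names, and both identifier quoting (quote_identifier()) and the column_name option lookup are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one tuple-descriptor entry. */
typedef struct Column
{
	const char *name;		/* remote name (column_name option or local name) */
	bool		dropped;	/* true for a dropped column */
} Column;

/* Append s to buf (size bufsz), silently truncating if it does not fit. */
static void
append_str(char *buf, size_t bufsz, const char *s)
{
	size_t		used = strlen(buf);

	if (used + 1 < bufsz)
		strncat(buf, s, bufsz - used - 1);
}

/*
 * Models deparseRelColumnList(): comma-separated names of the non-dropped
 * columns, optionally in parentheses.  Returns how many columns were emitted
 * (the patch returns the list of their attribute numbers instead).
 */
static int
deparse_column_list(char *buf, size_t bufsz, const Column *cols, int ncols,
					bool enclose_in_parens)
{
	int			nkept = 0;

	for (int i = 0; i < ncols; i++)
	{
		if (cols[i].dropped)
			continue;
		if (nkept > 0)
			append_str(buf, bufsz, ", ");
		else if (enclose_in_parens)
			append_str(buf, bufsz, "(");
		append_str(buf, bufsz, cols[i].name);
		nkept++;
	}
	/* No "()" for a relation whose columns are all dropped. */
	if (enclose_in_parens && nkept > 0)
		append_str(buf, bufsz, ")");
	return nkept;
}

/* Models deparseCopyFromSql(); identifier quoting is left out. */
static void
deparse_copy_from_sql(char *buf, size_t bufsz, const char *relname,
					  const Column *cols, int ncols)
{
	assert(bufsz > 0);
	buf[0] = '\0';
	append_str(buf, bufsz, "COPY ");
	append_str(buf, bufsz, relname);
	(void) deparse_column_list(buf, bufsz, cols, ncols, true);
	append_str(buf, bufsz, " FROM STDIN ");
}
```

With columns f1, a dropped column and f2, this yields the remote command shown in the expected regression output, "COPY public.loc2(f1, f2) FROM STDIN ". With every column dropped it yields "COPY public.loc2 FROM STDIN ", which is why the closing parenthesis is only appended when at least one column survives: an empty "()" list would not be valid syntax.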
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b..5b2d03a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8111,8 +8111,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8123,6 +8124,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8231,6 +8245,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b4857..db80dec 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -201,6 +202,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -373,6 +375,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+								   ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									TupleTableSlot **slots,
+									int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -558,6 +567,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2178,6 +2190,137 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(EState *estate,
+						 ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	if (resultRelInfo->ri_RangeTableIndex == 0)
+	{
+		ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+		Assert(rootResultRelInfo != NULL);
+		rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);
+		rte = copyObject(rte);
+		rte->relid = RelationGetRelid(rel);
+		rte->relkind = RELKIND_FOREIGN_TABLE;
+	}
+	else
+		rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, estate);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(estate,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									-1,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, rel, NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol.  This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending end-of-data, check the command status. */
+		res = PQgetResult(conn);
+		if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||
+			(OK && PQresultStatus(res) != PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we have not gotten extra results */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
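The batching in postgresExecForeignCopy() can be modelled without a server. In this sketch MockConn, wire_append(), put_copy_data() and exec_foreign_copy() are hypothetical: the mock records what would be sent instead of calling libpq's PQexec(), PQputCopyData(), PQputCopyEnd() and PQgetResult(), rows are assumed to be (int, text) pairs, and values are not escaped as real COPY text format requires.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a PGconn: it just records the outgoing traffic. */
typedef struct MockConn
{
	char		wire[1024];		/* everything "sent" so far */
	int			ncommands;		/* COPY commands started */
} MockConn;

static void
wire_append(MockConn *conn, const char *s)
{
	size_t		used = strlen(conn->wire);

	if (used + 1 < sizeof(conn->wire))
		strncat(conn->wire, s, sizeof(conn->wire) - used - 1);
}

/* Plays the role of pgfdw_copy_dest_cb(): forward one encoded row. */
static void
put_copy_data(MockConn *conn, const char *data)
{
	wire_append(conn, data);
}

/*
 * Models one postgresExecForeignCopy() call: start a COPY, stream every
 * row of the batch as a text-format line, then finish the COPY.
 */
static void
exec_foreign_copy(MockConn *conn, const char *query,
				  const int *f1, const char **f2, int nrows)
{
	char		row[64];

	assert(nrows > 0);
	conn->ncommands++;
	wire_append(conn, query);	/* real code: PQexec(), expect PGRES_COPY_IN */
	wire_append(conn, "|");
	for (int i = 0; i < nrows; i++)
	{
		/* tab between columns, newline terminating the row (no escaping) */
		snprintf(row, sizeof(row), "%d\t%s\n", f1[i], f2[i]);
		put_copy_data(conn, row);
	}
	wire_append(conn, "<end>|");	/* real code: PQputCopyEnd(), PQgetResult() */
}
```

Each flush opens one COPY ... FROM STDIN and ends it after the last row of the batch, so the remote server sees one command per buffered batch rather than one INSERT per tuple.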
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d..cb801c9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -165,6 +165,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 extern void rebuildInsertSql(StringInfo buf, char *orig_query,
 							 int values_end_len, int num_cols,
 							 int num_rows);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea..02efe2f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2235,6 +2235,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2335,6 +2352,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 04bc052..666148a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -813,8 +813,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
 
      Begin executing an insert operation on a foreign table.  This routine is
      called right before the first tuple is inserted into the foreign table
-     in both cases when it is the partition chosen for tuple routing and the
-     target specified in a <command>COPY FROM</command> command.  It should
+     when it is the target specified in a <command>COPY FROM</command> command, or when
+     the foreign table is the partition chosen for tuple routing of a
+     partitioned table.  It should
      perform any initialization needed prior to the actual insertion.
      Subsequently, <function>ExecForeignInsert</function> or
      <function>ExecForeignBatchInsert</function> will be called for
@@ -1067,6 +1068,72 @@ EndDirectModify(ForeignScanState *node);
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table.  This routine is
+     called right before the first call of the <function>ExecForeignCopy</function>
+     routine for the foreign table.  It should perform any initialization needed
+     prior to the actual <command>COPY FROM</command> operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>estate</literal> is global execution state for the query.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                TupleTableSlot **slots,
+                int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+               ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
 RowMarkType
 GetForeignRowMarkType(RangeTblEntry *rte,
                       LockClauseStrength strength);
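The calling sequence documented above (BeginForeignCopy once, ExecForeignCopy once per batch, EndForeignCopy once, with a NULL pointer meaning "no action" or, for ExecForeignCopy, "fall back to ExecForeignInsert") can be sketched with plain function pointers. MiniFdwRoutine, CallCounts, copy_into_foreign() and the count_* callbacks are hypothetical, a tuple is reduced to an int, and BeginForeignInsert/EndForeignInsert are omitted from the fallback path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified FdwRoutine: a "tuple" is just an int. */
typedef struct MiniFdwRoutine
{
	void		(*BeginForeignCopy) (void *state);
	void		(*ExecForeignCopy) (void *state, const int *rows, int nrows);
	void		(*EndForeignCopy) (void *state);
	void		(*ExecForeignInsert) (void *state, int row);
} MiniFdwRoutine;

/*
 * Begin once, Exec once per batch, End once.  A NULL Begin/End means
 * "no action"; a NULL ExecForeignCopy means one ExecForeignInsert per tuple.
 */
static void
copy_into_foreign(const MiniFdwRoutine *fdw, void *state,
				  const int *rows, int nrows, int batch_size)
{
	assert(batch_size > 0);

	if (fdw->ExecForeignCopy == NULL)
	{
		assert(fdw->ExecForeignInsert != NULL);
		for (int i = 0; i < nrows; i++)
			fdw->ExecForeignInsert(state, rows[i]);
		return;
	}

	if (fdw->BeginForeignCopy != NULL)
		fdw->BeginForeignCopy(state);
	for (int start = 0; start < nrows; start += batch_size)
	{
		int			n = (nrows - start < batch_size) ? nrows - start : batch_size;

		fdw->ExecForeignCopy(state, rows + start, n);
	}
	if (fdw->EndForeignCopy != NULL)
		fdw->EndForeignCopy(state);
}

/* Example FDW that only counts the calls it receives. */
typedef struct CallCounts
{
	int			begins, copies, ends, inserts, rows;
} CallCounts;

static void count_begin(void *s) { ((CallCounts *) s)->begins++; }
static void count_end(void *s) { ((CallCounts *) s)->ends++; }

static void
count_copy(void *s, const int *rows, int nrows)
{
	(void) rows;
	((CallCounts *) s)->copies++;
	((CallCounts *) s)->rows += nrows;
}

static void
count_insert(void *s, int row)
{
	(void) row;
	((CallCounts *) s)->inserts++;
	((CallCounts *) s)->rows++;
}
```

With five tuples and a batch size of two, an FDW that provides all three copy callbacks sees one Begin, three ExecForeignCopy calls and one End, while an FDW that leaves ExecForeignCopy NULL receives five ExecForeignInsert calls.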
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c712c8..411c409 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -304,7 +304,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		CopyToState cstate;
 
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 796ca7b..b7c912c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -316,54 +316,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There are no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -537,12 +547,10 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -652,6 +660,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion and insert them in bulk, for example with one
+	 * table_multi_insert() call, than to call table_tuple_insert() for
+	 * every tuple.  However, there are a number of reasons why we might not
+	 * be able to do this.  For example, if there are any volatile expressions
+	 * in the table's default values or in the statement's WHERE clause, which
+	 * may query the table we are inserting into, buffering tuples might
+	 * produce wrong results.  Also, the relation we are inserting into may
+	 * itself not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set based on both the target table's
+	 * flag set here and the partition's own properties, which are checked by
+	 * calling ExecMultiInsertAllowed().  It does not matter whether
+	 * partitions have any volatile default expressions, as we use the
+	 * defaults from the target of the COPY command.  Also, a COPY command
+	 * requires a non-empty list of input attributes, so the length of the
+	 * attribute list is checked here as well.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -668,10 +703,22 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Initialize copying into a foreign target table.  Copying into foreign
+	 * partitions is initialized later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(estate,
+																	  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -700,83 +747,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -784,7 +757,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -827,7 +800,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -835,7 +808,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -894,24 +866,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -941,7 +903,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -960,9 +922,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1034,7 +993,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1112,11 +1071,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1142,14 +1098,21 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+																	target_resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
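A minimal sketch of how ri_usesMultiInsert is decided for the COPY target. The statement-level conditions mirror the new check in CopyFrom(); the relation-level part is an assumption, since ExecMultiInsertAllowed() is not part of this excerpt, and is modelled loosely on the checks removed from CopyFrom() plus the ExecForeignCopy fallback described in fdwhandler.sgml. CopyStmtInfo, RelInfo, rel_multi_insert_allowed() and uses_multi_insert() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the statement-level facts CopyFrom() checks. */
typedef struct CopyStmtInfo
{
	bool		volatile_defexprs;	/* cstate->volatile_defexprs */
	int			nattrs;				/* list_length(cstate->attnumlist) */
	bool		volatile_where;		/* contain_volatile_functions(whereClause) */
} CopyStmtInfo;

/* Hypothetical summary of one target relation or partition. */
typedef struct RelInfo
{
	bool		before_row_trig;	/* BEFORE ROW INSERT triggers */
	bool		instead_row_trig;	/* INSTEAD OF ROW INSERT triggers */
	bool		is_foreign;
	bool		fdw_has_copy;		/* ExecForeignCopy != NULL */
} RelInfo;

/*
 * Assumed stand-in for ExecMultiInsertAllowed(): row triggers rule out
 * buffering, and a foreign table needs an FDW that provides ExecForeignCopy.
 */
static bool
rel_multi_insert_allowed(const RelInfo *rel)
{
	if (rel->before_row_trig || rel->instead_row_trig)
		return false;
	if (rel->is_foreign && !rel->fdw_has_copy)
		return false;
	return true;
}

/* The gate that sets ri_usesMultiInsert for the COPY target. */
static bool
uses_multi_insert(const CopyStmtInfo *stmt, const RelInfo *rel)
{
	assert(stmt->nattrs >= 0);
	if (stmt->volatile_defexprs || stmt->nattrs == 0 || stmt->volatile_where)
		return false;
	return rel_multi_insert_allowed(rel);
}
```

Splitting the test this way is what lets the patch replace the CIM_SINGLE / CIM_MULTI / CIM_MULTI_CONDITIONAL enum with one flag per ResultRelInfo: the statement-level part is evaluated once for the target, while the relation-level part can be evaluated again for each partition.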
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e04ec1e..bcd5d87 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -52,6 +52,7 @@ typedef enum CopyDest
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_OLD_FE,				/* to frontend (2.0 protocol) */
 	COPY_NEW_FE,				/* to frontend (3.0 protocol) */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -87,6 +88,7 @@ typedef struct CopyToStateData
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -117,7 +119,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static uint64 CopyTo(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -289,6 +290,15 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
+			break;
 	}
 
 	/* Update the progress */
@@ -386,16 +396,23 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
 
-	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
+	/*
+	 * Check whether we support copying data out of the specified relation,
+	 * unless the caller also passed a non-NULL data_dest_cb, in which case
+	 * the callback will take care of it.
+	 */
+	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
+		data_dest_cb == NULL)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
 			ereport(ERROR,
@@ -704,6 +721,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -786,7 +808,7 @@ BeginCopyTo(ParseState *pstate,
 uint64
 DoCopyTo(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -795,7 +817,9 @@ DoCopyTo(CopyToState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -835,15 +859,17 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Kept separate from the main routine so that the setup is not repeated in
+ * manual mode, where the caller copies tuples to the destination one by one
+ * by calling the CopyOneRowTo() routine.
  */
-static uint64
-CopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -933,6 +959,32 @@ CopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyToState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -967,23 +1019,13 @@ CopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c74ce36..6dc25b7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1233,10 +1233,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
 /*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
+/*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
  *
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b8da4c5..13aef41 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -589,6 +589,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  estate->es_instrument);
 
 	/*
+	 * If a partition's root parent isn't allowed to use multi-insert mode,
+	 * neither is the partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
+	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
 	 * required when the operation is CMD_UPDATE.
@@ -989,9 +997,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			if (partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				partRelInfo->ri_FdwRoutine->BeginForeignCopy(estate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
@@ -1211,10 +1226,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				if (resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+					resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+																 resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e..3d9d187 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -80,10 +81,14 @@ extern DestReceiver *CreateCopyDestReceiver(void);
  */
 extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37942d..c527610 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -41,16 +41,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
  *
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde..b5f73d8 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -118,6 +118,8 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
 										PartitionTupleRouting *proute,
 										TupleTableSlot *slot,
 										EState *estate);
+extern bool checkMultiInsertMode(const ResultRelInfo *rri,
+								 const ResultRelInfo *parent);
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 									PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363..754a9f5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78d..aeb8484 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -126,6 +126,16 @@ typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node)
 
 typedef void (*EndDirectModify_function) (ForeignScanState *node);
 
+typedef void (*BeginForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
@@ -230,6 +240,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad62..f32dcf6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -511,7 +511,13 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.10.1

#79Zhihong Yu
zyu@yugabyte.com
In reply to: tsunakawa.takay@fujitsu.com (#78)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi,

This feature enables bulk COPY into foreign table in the case of
multi inserts is possible

'is possible' -> 'if possible'

FDWAPI was extended by next routines:

next routines -> the following routines

For postgresExecForeignCopy():

+ if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||

Is PGRES_FATAL_ERROR handled somewhere else? I don't seem to find that in
the patch.

Cheers

On Wed, Mar 3, 2021 at 6:24 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:


From: Justin Pryzby <pryzby@telsasoft.com>

Find attached some language fixes.

Thanks a lot! (I wish there were some tool like "pgEnglish" that
corrects English in code comments and docs.)

|/* Do this to ensure we've pumped libpq back to idle state */

I don't know what you mean by "pumped".

I changed it to "have not gotten extra results" to match the error message.

The CopySendEndOfRow "case COPY_CALLBACK:" should have a "break;"

Added.
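The `break` point can be illustrated with a small, self-contained C sketch of a callback destination in the spirit of the patch's COPY_CALLBACK case. Only the copy_data_dest_cb signature is taken from the patch (copy.h); DemoCopyState, demo_append(), demo_send_end_of_row() and collect_cb() are hypothetical simplifications, and the WIN32 "\r\n" line ending handled by the real CopySendEndOfRow() is omitted. When the callback case happens to be the last one in the switch, a missing break is harmless, but it would silently fall through into any case added after it later.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the copy_data_dest_cb typedef the patch adds to copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical, trimmed-down stand-ins for CopyDest and CopyToStateData. */
typedef enum DemoCopyDest
{
	DEMO_COPY_FILE,				/* write rows to a FILE * */
	DEMO_COPY_CALLBACK			/* hand rows to data_dest_cb */
} DemoCopyDest;

typedef struct DemoCopyState
{
	DemoCopyDest dest;
	copy_data_dest_cb data_dest_cb;
	FILE	   *copy_file;
	char		row[256];		/* text of the row being built */
	int			len;			/* bytes used in row[] */
} DemoCopyState;

/* Append field text to the current row (hypothetical helper). */
static void
demo_append(DemoCopyState *cstate, const char *text)
{
	int			n = (int) strlen(text);

	/* keep one byte free for the end-of-row newline */
	assert(cstate->len + n + 1 <= (int) sizeof(cstate->row));
	memcpy(cstate->row + cstate->len, text, n);
	cstate->len += n;
}

/* Terminate and emit the current row, loosely modeled on CopySendEndOfRow(). */
static void
demo_send_end_of_row(DemoCopyState *cstate)
{
	cstate->row[cstate->len++] = '\n';	/* the real code uses \r\n on WIN32 */

	switch (cstate->dest)
	{
		case DEMO_COPY_FILE:
			fwrite(cstate->row, 1, cstate->len, cstate->copy_file);
			break;
		case DEMO_COPY_CALLBACK:
			cstate->data_dest_cb(cstate->row, cstate->len);
			break;				/* prevents fall-through into any case added later */
	}
	cstate->len = 0;			/* start the next row from an empty buffer */
}

/* Example callback: accumulate everything it receives. */
static char collected[1024];
static int	collected_len = 0;

static void
collect_cb(void *buf, int len)
{
	assert(collected_len + len <= (int) sizeof(collected));
	memcpy(collected + collected_len, buf, len);
	collected_len += len;
}
```

For example, with dest set to DEMO_COPY_CALLBACK and data_dest_cb set to collect_cb, appending "1\tfoo" and flushing, then "2\tbar" and flushing, leaves collected holding "1\tfoo\n2\tbar\n" (12 bytes).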

This touches some of the same parts as my "bulk insert" patch:
https://commitfest.postgresql.org/32/2553/

My colleague will be reviewing it.

Regards
Takayuki Tsunakawa

#80tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Zhihong Yu (#79)
1 attachment(s)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Zhihong Yu <zyu@yugabyte.com>

This feature enables bulk COPY into foreign table in the case of
multi inserts is possible

'is possible' -> 'if possible'

FDWAPI was extended by next routines:

next routines -> the following routines

Thank you, fixed slightly differently. (I feel the need for pgEnglish again.)

+ if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||

Is PGRES_FATAL_ERROR handled somewhere else? I don't seem to find that in the patch.

Good catch. The OK variable doesn't need to be consulted here, because a failure during row transmission causes PQputCopyEnd() to receive a non-NULL second argument, which in turn makes PQgetResult() return a status other than PGRES_COMMAND_OK.

Regards
Takayuki Tsunakawa

Attachments:

v18-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (application/octet-stream)
From 4441cf9dffe4dd5a1a7f495f5b16a4ab655f7979 Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 9 Feb 2021 12:50:00 +0900
Subject: [PATCH v18] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The following routines are added to the FDW interface:
* BeginForeignCopy
* ExecForeignCopy
* EndForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyToState used for bulk COPY. The ExecForeignCopy routine runs a
'COPY ... FROM STDIN' command on the foreign server and sends the
tuples iteratively using the CopyTo() machinery.

The code in deparseAnalyzeSql() that constructs the list of columns
for a given foreign relation is split out into deparseRelColumnList(),
which is reused by deparseCopyFromSql().

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyToState structure is extended
with a data_dest_cb callback, which is used to send the text
representation of a tuple to a custom destination.
The PgFdwModifyState structure is extended with a cstate field,
which is needed to avoid repeated initialization of the CopyToState.
For the same reason, the CopyTo() routine is split into the set of
routines CopyToStart()/CopyTo()/CopyToFinish().

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod shared across all potential target
relations (partitions) that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Replace that decision logic with the combination of ExecMultiInsertAllowed()
and its caller. The former encapsulates the common criteria for allowing
multi-insert. The latter applies additional criteria and sets the new
boolean field ri_usesMultiInsert of ResultRelInfo.
This avoids recomputing the same information in some cases, especially
for partitions, and makes the code slightly more readable.
Enum CopyInsertMethod is removed.

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
---
 contrib/postgres_fdw/deparse.c                 |  63 ++++--
 contrib/postgres_fdw/expected/postgres_fdw.out |  46 ++++-
 contrib/postgres_fdw/postgres_fdw.c            | 142 +++++++++++++
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  45 ++++
 doc/src/sgml/fdwhandler.sgml                   |  71 ++++++-
 src/backend/commands/copy.c                    |   2 +-
 src/backend/commands/copyfrom.c                | 271 +++++++++++--------------
 src/backend/commands/copyto.c                  |  80 ++++++--
 src/backend/executor/execMain.c                |  44 ++++
 src/backend/executor/execPartition.c           |  37 +++-
 src/include/commands/copy.h                    |   5 +
 src/include/commands/copyfrom_internal.h       |  10 -
 src/include/executor/execPartition.h           |   2 +
 src/include/executor/executor.h                |   1 +
 src/include/foreign/fdwapi.h                   |  15 ++
 src/include/nodes/execnodes.h                  |   8 +-
 17 files changed, 635 insertions(+), 208 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499..7e10f8b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1859,6 +1861,23 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * Deparse remote COPY FROM statement
+ *
+ * Note that this explicitly specifies the list of COPY's target columns
+ * to account for the fact that the remote table's columns may not match
+ * exactly with the columns declared in the local definition.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * 'buf' is the output buffer to append the statement to
@@ -2119,6 +2138,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 {
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of the given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
+{
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			i;
@@ -2126,10 +2169,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2138,6 +2179,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2157,18 +2201,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b..5b2d03a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8111,8 +8111,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8123,6 +8124,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8231,6 +8245,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b4857..b45e001 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -201,6 +202,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -373,6 +375,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(EState *estate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -558,6 +567,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2178,6 +2190,136 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(EState *estate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	if (resultRelInfo->ri_RangeTableIndex == 0)
+	{
+		ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+		Assert(rootResultRelInfo != NULL);
+		rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);
+		rte = copyObject(rte);
+		rte->relid = RelationGetRelid(rel);
+		rte->relkind = RELKIND_FOREIGN_TABLE;
+	}
+	else
+		rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, estate);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(estate,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									-1,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, rel, NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol. This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0 ||
+			PQflush(conn))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending the EOF signal, check the command status. */
+		res = PQgetResult(conn);
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we have not gotten extra results */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d..cb801c9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -165,6 +165,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 extern void rebuildInsertSql(StringInfo buf, char *orig_query,
 							 int values_end_len, int num_cols,
 							 int num_rows);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea..02efe2f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2235,6 +2235,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2335,6 +2352,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 04bc052..666148a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -813,8 +813,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
 
      Begin executing an insert operation on a foreign table.  This routine is
      called right before the first tuple is inserted into the foreign table
-     in both cases when it is the partition chosen for tuple routing and the
-     target specified in a <command>COPY FROM</command> command.  It should
+     target specified in a <command>COPY FROM</command> command, or when
+     the foreign table is the partition chosen for tuple routing of a
+     partitioned table.  It should
      perform any initialization needed prior to the actual insertion.
      Subsequently, <function>ExecForeignInsert</function> or
      <function>ExecForeignBatchInsert</function> will be called for
@@ -1067,6 +1068,72 @@ EndDirectModify(ForeignScanState *node);
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table.  This routine is
+     called right before the first call of the <function>ExecForeignCopy</function>
+     routine for the foreign table.  It should perform any initialization needed
+     prior to the actual <command>COPY FROM</command> operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>estate</literal> is global execution state for the query.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                TupleTableSlot **slots,
+                int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to <literal>NULL</literal>,
+     <command>COPY FROM</command> falls back to inserting tuples one at a time with <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+               ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
 RowMarkType
 GetForeignRowMarkType(RangeTblEntry *rte,
                       LockClauseStrength strength);
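The fdwhandler.sgml text in this patch defines the contract for the new callbacks: a NULL BeginForeignCopy or EndForeignCopy is simply skipped, and a NULL ExecForeignCopy makes COPY FROM fall back to the row-by-row ExecForeignInsert path. A minimal, self-contained sketch of that dispatch follows. MockFdwRoutine, mock_multi_insert_allowed() and mock_begin_foreign_target() are hypothetical stand-ins, not PostgreSQL APIs; they model only the FDW-related part of ExecMultiInsertAllowed() and the begin-callback branches in CopyFrom() and ExecInitRoutingInfo().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the FdwRoutine callbacks touched by the patch. */
typedef struct MockFdwRoutine
{
	void		(*BeginForeignInsert) (void);
	void		(*BeginForeignCopy) (void);
	void		(*ExecForeignInsert) (void);
	void		(*ExecForeignCopy) (void);
} MockFdwRoutine;

/*
 * Mirrors only the FDW test in ExecMultiInsertAllowed(): a foreign table can
 * use the multi-insert buffers only if its FDW provides ExecForeignCopy.
 * The trigger and volatile-expression checks are left out of this sketch.
 */
static bool
mock_multi_insert_allowed(const MockFdwRoutine *fdw)
{
	assert(fdw != NULL);
	return fdw->ExecForeignCopy != NULL;
}

/*
 * Mirrors the begin-callback dispatch: BeginForeignCopy on the buffered
 * path, BeginForeignInsert otherwise; a NULL callback means "nothing to do".
 */
static void
mock_begin_foreign_target(const MockFdwRoutine *fdw, bool uses_multi_insert)
{
	if (uses_multi_insert)
	{
		if (fdw->BeginForeignCopy != NULL)
			fdw->BeginForeignCopy();
	}
	else if (fdw->BeginForeignInsert != NULL)
		fdw->BeginForeignInsert();
}

/* Counting callbacks, so the chosen path can be observed. */
static int	begin_copy_calls = 0;
static int	begin_insert_calls = 0;

static void count_begin_copy(void) { begin_copy_calls++; }
static void count_begin_insert(void) { begin_insert_calls++; }
static void do_nothing(void) {}
```

With these mocks, an FDW that fills in ExecForeignCopy gets its BeginForeignCopy callback invoked, while one that leaves it NULL gets BeginForeignInsert instead. The real code additionally refuses multi-insert for BEFORE/INSTEAD OF row triggers, transition tables on a partitioned target, volatile defaults, volatile WHERE clauses and empty attribute lists.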
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c712c8..411c409 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -304,7 +304,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		CopyToState cstate;
 
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 796ca7b..b7c912c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -316,54 +316,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There are no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -537,12 +547,10 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -652,6 +660,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion and insert them in bulk, for example with one
+	 * table_multi_insert() call, than to call table_tuple_insert() separately
+	 * for every tuple.  However, there are a number of reasons why we might
+	 * not be able to do this.  For example, if there are any volatile
+	 * expressions in the table's default values or in the statement's WHERE
+	 * clause, which may query the table we are inserting into, buffering
+	 * tuples might produce wrong results.  Also, the relation we are trying
+	 * to insert into may itself not be amenable to buffered inserts.
+	 *
+	 * Note: for partitions, this flag is derived from the target table's
+	 * flag, which is set here, and from the partition's own properties,
+	 * which are checked by ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions, as we use
+	 * the defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-empty list of input attributes,
+	 * so the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -668,10 +703,22 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Initialize copying into a foreign target table.  Copying into foreign
+	 * partitions is initialized later, by ExecInitRoutingInfo().
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(estate,
+																	  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -700,83 +747,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -784,7 +757,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -827,7 +800,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -835,7 +808,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -894,24 +866,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -941,7 +903,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -960,9 +922,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1034,7 +993,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1112,11 +1071,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1142,14 +1098,21 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+																	target_resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e04ec1e..bcd5d87 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -52,6 +52,7 @@ typedef enum CopyDest
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_OLD_FE,				/* to frontend (2.0 protocol) */
 	COPY_NEW_FE,				/* to frontend (3.0 protocol) */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -87,6 +88,7 @@ typedef struct CopyToStateData
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -117,7 +119,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static uint64 CopyTo(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -289,6 +290,15 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
+			break;
 	}
 
 	/* Update the progress */
@@ -386,16 +396,23 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
 
-	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
+	/*
+	 * Check whether we support copying data out of the specified relation,
+	 * unless the caller also passed a non-NULL data_dest_cb, in which case
+	 * the callback will take care of it.
+	 */
+	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
+		data_dest_cb == NULL)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
 			ereport(ERROR,
@@ -704,6 +721,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -786,7 +808,7 @@ BeginCopyTo(ParseState *pstate,
 uint64
 DoCopyTo(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	uint64		processed;
 
@@ -795,7 +817,9 @@ DoCopyTo(CopyToState cstate)
 		if (fe_copy)
 			SendCopyBegin(cstate);
 
+		CopyToStart(cstate);
 		processed = CopyTo(cstate);
+		CopyToFinish(cstate);
 
 		if (fe_copy)
 			SendCopyEnd(cstate);
@@ -835,15 +859,17 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start a COPY TO operation.
+ * This is separate from CopyTo() so that callers in manual mode, which send
+ * tuples to the destination one at a time with CopyOneRowTo(), perform the
+ * setup only once.
  */
-static uint64
-CopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -933,6 +959,32 @@ CopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+static uint64
+CopyTo(CopyToState cstate)
+{
+	uint64		processed;
 
 	if (cstate->rel)
 	{
@@ -967,23 +1019,13 @@ CopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
-
 	return processed;
 }
 
 /*
  * Emit one row during CopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
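The split of CopyTo() into CopyToStart(), CopyOneRowTo() and CopyToFinish(), combined with the COPY_CALLBACK destination, lets a caller drive COPY TO formatting row by row and receive each finished text row through data_dest_cb. In the patch that caller is postgresExecForeignCopy(); we assume its callback forwards the bytes to the remote COPY FROM STDIN stream, but that code is outside this excerpt. The sketch below models only the call sequence, with hypothetical types and functions (MockCopyToState, mock_copy_to_start() and friends), integer columns, no escaping, and always "\n" as the row terminator (the patch uses "\r\n" on WIN32).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the copy_data_dest_cb typedef added to copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical, heavily simplified stand-in for CopyToStateData. */
typedef struct MockCopyToState
{
	copy_data_dest_cb data_dest_cb;	/* receives each finished row */
	char		rowbuf[256];	/* plays the role of fe_msgbuf */
	int			len;
	int			rows_sent;
} MockCopyToState;

/* Like CopyToStart(): one-time setup (text format sends no header). */
static void
mock_copy_to_start(MockCopyToState *cstate, copy_data_dest_cb cb)
{
	cstate->data_dest_cb = cb;
	cstate->len = 0;
	cstate->rows_sent = 0;
}

/* Like CopyOneRowTo() + COPY_CALLBACK: format, terminate, hand off, reset. */
static void
mock_copy_one_row_to(MockCopyToState *cstate, const int *values, int natts)
{
	for (int i = 0; i < natts; i++)
	{
		cstate->len += snprintf(cstate->rowbuf + cstate->len,
								sizeof(cstate->rowbuf) - cstate->len,
								i == 0 ? "%d" : "\t%d", values[i]);
		/* keep room for the row terminator */
		assert(cstate->len < (int) sizeof(cstate->rowbuf) - 1);
	}
	cstate->rowbuf[cstate->len++] = '\n';
	cstate->data_dest_cb(cstate->rowbuf, cstate->len);
	cstate->len = 0;			/* the next row starts with an empty buffer */
	cstate->rows_sent++;
}

/* Like CopyToFinish(): only binary format would need a trailer here. */
static void
mock_copy_to_finish(MockCopyToState *cstate)
{
	(void) cstate;
}

/* Example destination that just accumulates everything it receives. */
static char received[1024];
static int	received_len = 0;

static void
collect_rows(void *outbuf, int len)
{
	assert(received_len + len < (int) sizeof(received));
	memcpy(received + received_len, outbuf, len);
	received_len += len;
	received[received_len] = '\0';
}
```

Because the buffer is reset after every row in this sketch, each callback invocation receives exactly one complete row, so a streaming consumer never sees a partial row.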
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c74ce36..6dc25b7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1233,10 +1233,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
 /*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow the caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	/*
+	 * Foreign tables don't support multi-inserts, unless their FDW
+	 * provides the necessary COPY interface.
+	 */
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
+/*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
  *
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b8da4c5..13aef41 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -589,6 +589,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  estate->es_instrument);
 
 	/*
+	 * If a partition's root parent isn't allowed to use multi-insert mode,
+	 * neither is the partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
+	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
 	 * required when the operation is CMD_UPDATE.
@@ -989,9 +997,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			if (partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				partRelInfo->ri_FdwRoutine->BeginForeignCopy(estate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
@@ -1211,10 +1226,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				if (resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+					resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+																 resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e..3d9d187 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -80,10 +81,14 @@ extern DestReceiver *CreateCopyDestReceiver(void);
  */
 extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37942d..c527610 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -41,16 +41,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
  *
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde..b5f73d8 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -118,6 +118,8 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
 										PartitionTupleRouting *proute,
 										TupleTableSlot *slot,
 										EState *estate);
+extern bool checkMultiInsertMode(const ResultRelInfo *rri,
+								 const ResultRelInfo *parent);
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 									PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363..754a9f5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78d..aeb8484 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -126,6 +126,16 @@ typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node)
 
 typedef void (*EndDirectModify_function) (ForeignScanState *node);
 
+typedef void (*BeginForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
@@ -230,6 +240,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad62..f32dcf6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -511,7 +511,13 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.10.1

#81Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#80)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Thu, Mar 4, 2021 at 12:40 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:

From: Zhihong Yu <zyu@yugabyte.com>

This feature enables bulk COPY into foreign table in the case of
multi inserts is possible

'is possible' -> 'if possible'

FDWAPI was extended by next routines:

next routines -> the following routines

Thank you, fixed slightly differently. (I feel the need for pgEnglish
again.)

+ if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||

Is PGRES_FATAL_ERROR handled somewhere else ? I don't seem to find that

in the patch.

Good catch. The OK flag doesn't need to be consulted here, because a failure during
row transmission causes PQputCopyEnd() to receive a non-NULL second
argument, which in turn makes PQgetResult() return a status other than PGRES_COMMAND_OK.
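
The reasoning above can be checked with a small model. The sketch below is a hypothetical, libpq-free stand-in for the end-of-COPY logic in postgresExecForeignCopy(). It assumes, as stated above, that a non-NULL second argument to PQputCopyEnd() makes the server abort the COPY, so the final result status alone distinguishes success from failure. The enum and function names are invented for illustration and are not libpq API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two libpq result statuses that matter here. */
typedef enum
{
	MODEL_COMMAND_OK,			/* plays the role of PGRES_COMMAND_OK */
	MODEL_FATAL_ERROR			/* plays the role of PGRES_FATAL_ERROR */
} ModelResultStatus;

/*
 * Error message passed to PQputCopyEnd(): NULL ends the COPY normally,
 * non-NULL asks the server to abort it.
 */
static const char *
copy_end_errmsg(bool rows_sent_ok)
{
	return rows_sent_ok ? NULL : "canceled by server";
}

/*
 * Assumed server behaviour: a COPY aborted by the client, or one whose rows
 * fail remotely (e.g. a CHECK constraint), never reports COMMAND_OK.
 */
static ModelResultStatus
final_status(const char *errmsg, bool remote_rows_valid)
{
	if (errmsg != NULL || !remote_rows_valid)
		return MODEL_FATAL_ERROR;
	return MODEL_COMMAND_OK;
}

/* The only check the caller needs: did the final result report COMMAND_OK? */
static bool
copy_succeeded(bool rows_sent_ok, bool remote_rows_valid)
{
	const char *errmsg = copy_end_errmsg(rows_sent_ok);

	return final_status(errmsg, remote_rows_valid) == MODEL_COMMAND_OK;
}
```

Under this model, a separate test of the local OK flag adds nothing: every path where OK is false already produces a non-COMMAND_OK status.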

Regards
Takayuki Tsunakawa

This patch set no longer applies
http://cfbot.cputube.org/patch_32_2601.log

Can we get a rebase?

I am marking the patch "Waiting on Author"

--
Ibrar Ahmed

#82Justin Pryzby
pryzby@telsasoft.com
In reply to: tsunakawa.takay@fujitsu.com (#80)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

I think this change to the regression tests is suspicious:

-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1 xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1       xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2

I think it shouldn't say "COPY rem2, line 2" but rather a remote version of the
same:
|COPY loc2, line 1: "-1 xyzzy"

I have rebased this on my side over yesterday's libpq changes - I'll send it if
you want, but it's probably just as easy if you do it.

--
Justin

#83tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Justin Pryzby (#82)
1 attachment(s)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Justin Pryzby <pryzby@telsasoft.com>

I think this change to the regression tests is suspicious:

-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1 xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1       xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN
+COPY rem2, line 2

I think it shouldn't say "COPY rem2, line 2" but rather a remote version of the
same:
|COPY loc2, line 1: "-1 xyzzy"

No, the output is OK. The remote message is included as the first line of the CONTEXT field. The last line of the CONTEXT field was added by the local COPY command. (In any case, the message includes enough useful information: the constraint violation and the data that caused it.)

I have rebased this on my side over yesterday's libpq changes - I'll send it if
you want, but it's probably just as easy if you do it.

I've managed to rebase it, although it took unexpectedly long. The patch is attached. It passes make check against core and postgres_fdw. I'll turn the CF status back to Ready for Committer shortly.

Regards
Takayuki Tsunakawa

Attachments:

v19-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchapplication/octet-stream; name=v19-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchDownload
From 98c02fe0f4ed5ec0f5903612b5b9aca3e44c4ece Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 9 Feb 2021 12:50:00 +0900
Subject: [PATCH v19] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The following routines are added to the FDW interface:
* BeginForeignCopy
* ExecForeignCopy
* EndForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free the CopyState
of a bulk COPY. For each batch, the ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server and then transmits
the tuples using the CopyTo() machinery.
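
As a rough illustration of the calling sequence described above, here is a hypothetical, self-contained model in C. It assumes, following the fdwhandler.sgml text in this patch, that NULL Begin/End callbacks are simply skipped and that a missing ExecForeignCopy means rows go through ExecForeignInsert one at a time. The struct and function names are simplified stand-ins, not the real FdwRoutine or CopyFrom() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for TupleTableSlot. */
typedef struct { int value; } Slot;

typedef struct Rel Rel;

/* Hypothetical subset of FdwRoutine; any pointer may be NULL. */
typedef struct
{
	void		(*BeginForeignCopy) (Rel *rel);
	void		(*ExecForeignCopy) (Rel *rel, Slot **slots, int nslots);
	void		(*EndForeignCopy) (Rel *rel);
	void		(*ExecForeignInsert) (Rel *rel, Slot *slot);	/* per-row path */
} Routine;

/* Hypothetical stand-in for ResultRelInfo, with call counters. */
struct Rel
{
	const Routine *fdw;
	int			begin_calls;
	int			batch_calls;
	int			row_calls;
	int			end_calls;
};

static void
copy_into_foreign(Rel *rel, Slot **slots, int nslots, int batch_size)
{
	const Routine *r = rel->fdw;

	assert(batch_size > 0);

	if (r->ExecForeignCopy == NULL)
	{
		/* No bulk support: one ExecForeignInsert call per row. */
		for (int i = 0; i < nslots; i++)
			r->ExecForeignInsert(rel, slots[i]);
		return;
	}

	if (r->BeginForeignCopy != NULL)
		r->BeginForeignCopy(rel);

	/* Hand the buffered rows over in batches of at most batch_size. */
	for (int start = 0; start < nslots; start += batch_size)
	{
		int			n = nslots - start;

		if (n > batch_size)
			n = batch_size;
		r->ExecForeignCopy(rel, slots + start, n);
	}

	if (r->EndForeignCopy != NULL)
		r->EndForeignCopy(rel);
}

/* Example callbacks that only count how often they are invoked. */
static void count_begin(Rel *rel) { rel->begin_calls++; }
static void count_batch(Rel *rel, Slot **slots, int nslots)
{
	(void) slots;
	(void) nslots;
	rel->batch_calls++;
}
static void count_row(Rel *rel, Slot *slot) { (void) slot; rel->row_calls++; }
static void count_end(Rel *rel) { rel->end_calls++; }
```

With five rows and a batch size of two, the bulk path calls ExecForeignCopy three times, bracketed by a single Begin/End pair.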

The code in deparseAnalyzeSql() that constructs the column list of a
given foreign relation is moved into a new function,
deparseRelColumnList(), which is also reused by deparseCopyFromSql().
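
A simplified, hypothetical model of what deparseRelColumnList() does: skip dropped columns, prefer the column_name option over the local name, emit the parentheses only when at least one column survives, and return the 1-based attribute numbers. It uses a fixed-size char buffer instead of StringInfo and omits quote_identifier(), so it is a sketch of the logic, not the postgres_fdw code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified column descriptor. */
typedef struct
{
	const char *local_name;
	const char *remote_name;	/* column_name option, or NULL if unset */
	bool		dropped;
} Column;

/*
 * Append the remote column list to the NUL-terminated string in buf and
 * store the 1-based attribute numbers of the emitted columns in attnums.
 * Returns the number of columns emitted.  Identifier quoting is omitted.
 */
static int
build_column_list(char *buf, size_t bufsize, const Column *cols, int ncols,
				  bool enclose_in_parens, int *attnums)
{
	size_t		len = strlen(buf);
	int			n = 0;

	for (int i = 0; i < ncols; i++)
	{
		const char *sep;
		const char *name;

		/* Ignore dropped columns. */
		if (cols[i].dropped)
			continue;

		/* '(' (if requested) before the first column, ", " before the rest. */
		if (n > 0)
			sep = ", ";
		else
			sep = enclose_in_parens ? "(" : "";

		/* Use the column_name option if present, else the local name. */
		name = cols[i].remote_name != NULL ? cols[i].remote_name
			: cols[i].local_name;

		len += (size_t) snprintf(buf + len, bufsize - len, "%s%s", sep, name);
		assert(len < bufsize);	/* the sketch assumes buf is large enough */
		attnums[n++] = i + 1;
	}

	/* Close the parenthesis only if one was opened. */
	if (enclose_in_parens && n > 0)
	{
		len += (size_t) snprintf(buf + len, bufsize - len, ")");
		assert(len < bufsize);
	}

	return n;
}
```

For a zero-column relation nothing is appended, which is why the ANALYZE caller still has to add "NULL" itself.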

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure was extended
with a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field,
which is needed to avoid repeated initialization of CopyState. For the
same reason, the CopyTo() routine is split into the set of routines
CopyToStart()/CopyTo()/CopyToFinish().
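
The sketch below illustrates, under simplifying assumptions, how a destination callback of the copy_data_dest_cb shape receives each row's text representation. Because the callback takes no context argument, the sink lives in a file-level variable, much as pgfdw_copy_dest_cb() relies on the static copy_fmstate in this patch. ToyCopyToState and toy_copy_one_row() are hypothetical, and COPY text-format escaping of tabs, newlines and backslashes is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the copy_data_dest_cb typedef this patch adds to copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical, minimal COPY TO state: just the destination callback. */
typedef struct
{
	copy_data_dest_cb dest_cb;
	int			rows_sent;
} ToyCopyToState;

/*
 * The callback has no context argument, so this sink is file-level state,
 * similar to the static copy_fmstate used by pgfdw_copy_dest_cb().
 */
static char sink[256];
static size_t sink_len = 0;

/* Destination callback: append the bytes (stands in for PQputCopyData). */
static void
append_to_sink(void *outbuf, int len)
{
	assert(len >= 0 && sink_len + (size_t) len < sizeof(sink));
	memcpy(sink + sink_len, outbuf, (size_t) len);
	sink_len += (size_t) len;
	sink[sink_len] = '\0';
}

/* Format one row as tab-separated text ending in a newline (no escaping). */
static void
toy_copy_one_row(ToyCopyToState *cstate, const char *const *fields, int nfields)
{
	char		line[128];
	int			len = 0;

	for (int i = 0; i < nfields; i++)
	{
		len += snprintf(line + len, sizeof(line) - (size_t) len, "%s%s",
						i > 0 ? "\t" : "", fields[i]);
		assert(len < (int) sizeof(line));
	}
	len += snprintf(line + len, sizeof(line) - (size_t) len, "\n");
	assert(len < (int) sizeof(line));

	/* Hand the finished row to whatever destination the caller chose. */
	cstate->dest_cb(line, len);
	cstate->rows_sent++;
}
```

Swapping the callback changes where rows go without touching the formatting code, which is the point of threading data_dest_cb through the COPY TO state.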

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod, shared across all potential target
relations (partitions), which, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Change that decision logic to the combination of ExecMultiInsertAllowed()
and its caller. The former encapsulates the common criteria for allowing
multi-insert. The latter applies additional criteria and sets the new
boolean field ri_usesMultiInsert of ResultRelInfo.
That avoids repeated computation of the same information in some cases,
especially for partitions, and the new arrangement makes the code
slightly more readable.
Enum CopyInsertMethod is removed.
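
To make the new decision logic concrete, here is a hypothetical model of it. The statement-level conditions mirror the check that CopyFrom() performs before calling ExecMultiInsertAllowed() on the COPY target. The per-relation criteria in multi_insert_allowed() (no BEFORE ROW or INSTEAD OF ROW insert triggers, and ExecForeignCopy available for foreign tables) are assumptions, since the body of ExecMultiInsertAllowed() is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the relation properties the decision uses. */
typedef struct
{
	bool		has_before_row_insert_trigger;
	bool		has_instead_row_insert_trigger;
	bool		is_foreign;
	bool		fdw_has_exec_foreign_copy;
} RelProps;

/* Statement-level facts that apply to every target relation. */
typedef struct
{
	bool		volatile_defexprs;
	int			nattrs;
	bool		volatile_where_clause;
} CopyStmtProps;

/* Assumed per-relation criteria, standing in for ExecMultiInsertAllowed(). */
static bool
multi_insert_allowed(const RelProps *rel)
{
	if (rel->has_before_row_insert_trigger || rel->has_instead_row_insert_trigger)
		return false;
	if (rel->is_foreign && !rel->fdw_has_exec_foreign_copy)
		return false;
	return true;
}

/* Decision for the COPY target itself (becomes its ri_usesMultiInsert). */
static bool
target_uses_multi_insert(const CopyStmtProps *stmt, const RelProps *target)
{
	return !stmt->volatile_defexprs &&
		stmt->nattrs > 0 &&
		!stmt->volatile_where_clause &&
		multi_insert_allowed(target);
}

/* A partition combines the target's verdict with its own properties. */
static bool
partition_uses_multi_insert(bool target_flag, const RelProps *partition)
{
	return target_flag && multi_insert_allowed(partition);
}
```

Under these assumptions, a foreign partition with a BEFORE ROW trigger (like ffoo1 in the regression test) falls back to row-at-a-time inserts, while a sibling without one can still be buffered.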

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
---
 contrib/postgres_fdw/deparse.c                 |  63 ++++--
 contrib/postgres_fdw/expected/postgres_fdw.out |  46 ++++-
 contrib/postgres_fdw/postgres_fdw.c            | 141 +++++++++++++
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  45 ++++
 doc/src/sgml/fdwhandler.sgml                   |  71 ++++++-
 src/backend/commands/copy.c                    |   2 +-
 src/backend/commands/copyfrom.c                | 271 +++++++++++--------------
 src/backend/commands/copyto.c                  |  88 ++++++--
 src/backend/executor/execMain.c                |  44 ++++
 src/backend/executor/execPartition.c           |  37 +++-
 src/include/commands/copy.h                    |   5 +
 src/include/commands/copyfrom_internal.h       |  10 -
 src/include/executor/executor.h                |   1 +
 src/include/foreign/fdwapi.h                   |  15 ++
 src/include/nodes/execnodes.h                  |   8 +-
 16 files changed, 637 insertions(+), 211 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499..7e10f8b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1859,6 +1861,23 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * Deparse remote COPY FROM statement
+ *
+ * Note that this explicitly specifies the list of COPY's target columns
+ * to account for the fact that the remote table's columns may not match
+ * exactly with the columns declared in the local definition.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * 'buf' is the output buffer to append the statement to
@@ -2119,6 +2138,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 {
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
+{
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			i;
@@ -2126,10 +2169,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2138,6 +2179,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2157,18 +2201,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b..5b2d03a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8111,8 +8111,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8123,6 +8124,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8231,6 +8245,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b4857..98fe339 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -201,6 +202,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -373,6 +375,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(EState *estate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -558,6 +567,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2178,6 +2190,135 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(EState *estate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	if (resultRelInfo->ri_RangeTableIndex == 0)
+	{
+		ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+		Assert(rootResultRelInfo != NULL);
+		rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);
+		rte = copyObject(rte);
+		rte->relid = RelationGetRelid(rel);
+		rte->relkind = RELKIND_FOREIGN_TABLE;
+	}
+	else
+		rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, estate);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(estate,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									-1,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, rel, NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol. This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0)
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending an EOF signal, check command OK. */
+		res = PQgetResult(conn);
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we have not gotten extra results */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d..cb801c9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -165,6 +165,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 extern void rebuildInsertSql(StringInfo buf, char *orig_query,
 							 int values_end_len, int num_cols,
 							 int num_rows);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea..02efe2f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2235,6 +2235,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2335,6 +2352,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 04bc052..666148a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -813,8 +813,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
 
      Begin executing an insert operation on a foreign table.  This routine is
      called right before the first tuple is inserted into the foreign table
-     in both cases when it is the partition chosen for tuple routing and the
-     target specified in a <command>COPY FROM</command> command.  It should
+     target specified in a <command>COPY FROM</command> command, or when
+     the foreign table is the partition chosen for tuple routing of a
+     partitioned table.  It should
      perform any initialization needed prior to the actual insertion.
      Subsequently, <function>ExecForeignInsert</function> or
      <function>ExecForeignBatchInsert</function> will be called for
@@ -1067,6 +1068,72 @@ EndDirectModify(ForeignScanState *node);
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(EState *estate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table.  This routine is
+     called right before the first call of the <function>ExecForeignCopy</function>
+     routine for the foreign table.  It should perform any initialization needed
+     prior to the actual <command>COPY FROM</command> operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     a batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>estate</literal> is global execution state for the query.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in <literal>slots</literal>.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
 RowMarkType
 GetForeignRowMarkType(RangeTblEntry *rte,
                       LockClauseStrength strength);
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c712c8..411c409 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -304,7 +304,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		CopyToState cstate;
 
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f05e2d2..af76294 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -316,54 +316,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -537,12 +547,10 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -652,6 +660,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -668,10 +703,22 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Init copying process into foreign table. Initialization of copying into
+	 * foreign partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(estate,
+																	  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -700,83 +747,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -784,7 +757,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -827,7 +800,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -835,7 +808,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -894,24 +866,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -941,7 +903,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -960,9 +922,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1034,7 +993,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1112,11 +1071,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1135,14 +1091,21 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+																	target_resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 4615501..9ff2b6e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -51,6 +51,7 @@ typedef enum CopyDest
 {
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_FRONTEND,				/* to frontend */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -86,6 +87,7 @@ typedef struct CopyToStateData
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -115,7 +117,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -248,6 +249,15 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
+			break;
 	}
 
 	/* Update the progress */
@@ -345,16 +355,23 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
 
-	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
+	/*
+	 * Check whether we support copying data out of the specified relation,
+	 * unless the caller also passed a non-NULL data_dest_cb, in which case
+	 * the callback will take care of it.
+	 */
+	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
+		data_dest_cb == NULL)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
 			ereport(ERROR,
@@ -663,6 +680,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -758,20 +780,17 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Separate from the main routine to prevent duplicate operations in
+ * manual mode, where tuples are copied to the destination one by one, by calling
+ * the CopyOneRowTo() routine.
  */
-uint64
-DoCopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
-	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
-
-	if (fe_copy)
-		SendCopyBegin(cstate);
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -861,6 +880,39 @@ DoCopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+uint64
+DoCopyTo(CopyToState cstate)
+{
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
+	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
+	uint64		processed;
+
+	if (fe_copy)
+		SendCopyBegin(cstate);
+
+	CopyToStart(cstate);
 
 	if (cstate->rel)
 	{
@@ -895,15 +947,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
+	CopyToFinish(cstate);
 
 	if (fe_copy)
 		SendCopyEnd(cstate);
@@ -914,7 +958,7 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c74ce36..6dc25b7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1233,10 +1233,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
 /*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
+/*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
  *
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b8da4c5..13aef41 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -589,6 +589,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  estate->es_instrument);
 
 	/*
+	 * If a partition's root parent isn't allowed to use multi-insert,
+	 * neither is the partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
+	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
 	 * required when the operation is CMD_UPDATE.
@@ -989,9 +997,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			if (partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				partRelInfo->ri_FdwRoutine->BeginForeignCopy(estate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
@@ -1211,10 +1226,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				if (resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+					resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+																 resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e..3d9d187 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -80,10 +81,14 @@ extern DestReceiver *CreateCopyDestReceiver(void);
  */
 extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 705f5b6..c23f631 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -40,16 +40,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
  *
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363..754a9f5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78d..aeb8484 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -126,6 +126,16 @@ typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node)
 
 typedef void (*EndDirectModify_function) (ForeignScanState *node);
 
+typedef void (*BeginForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
@@ -230,6 +240,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad62..f32dcf6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -511,7 +511,13 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.10.1

#84tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Justin Pryzby (#72)
1 attachment(s)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Justin Pryzby <pryzby@telsasoft.com>

Could you rebase again and send an updated patch ?
I could do it if you want.

Rebased and attached. Fortunately, there was no rebase conflict this time. make check passed for PG core and postgres_fdw.

Regards
Takayuki Tsunakawa

Attachments:

v20-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (application/octet-stream)
From 197de772c13d6f2ec75948bb2454494e2d936845 Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 9 Feb 2021 12:50:00 +0900
Subject: [PATCH v20] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The following routines are added to the FDW interface:
* BeginForeignCopy
* ExecForeignCopy
* EndForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of bulk COPY. The ExecForeignCopy routine runs a
'COPY ... FROM STDIN' command on the foreign server, sending the
tuples to it iteratively using the CopyTo() machinery.
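A minimal sketch of the calling pattern described above, assuming heavily simplified stand-in types: MockRelInfo, MockFdwRoutine, copy_rows() and BATCH_SIZE are hypothetical and are not the executor's or postgres_fdw's real structures. Rows are buffered and each full batch (plus the final partial one) is handed to ExecForeignCopy, bracketed by a single BeginForeignCopy and EndForeignCopy call. As in the patch, the Begin/End callbacks are treated as optional, while ExecForeignCopy is assumed present because multi-insert is only enabled when the FDW provides it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a foreign ResultRelInfo. */
typedef struct MockRelInfo
{
    int begin_calls;    /* times BeginForeignCopy ran */
    int exec_calls;     /* times ExecForeignCopy ran (one per flush) */
    int end_calls;      /* times EndForeignCopy ran */
    int rows_sent;      /* rows handed to the remote COPY stream */
} MockRelInfo;

/* Hypothetical mirror of the three COPY callbacks added to FdwRoutine. */
typedef struct MockFdwRoutine
{
    void (*BeginForeignCopy) (MockRelInfo *rinfo);
    void (*ExecForeignCopy) (MockRelInfo *rinfo, const int *slots, int nslots);
    void (*EndForeignCopy) (MockRelInfo *rinfo);
} MockFdwRoutine;

static void mock_begin(MockRelInfo *r) { r->begin_calls++; }

static void mock_exec(MockRelInfo *r, const int *slots, int nslots)
{
    (void) slots;       /* a real FDW would serialize each tuple as text here */
    r->exec_calls++;
    r->rows_sent += nslots;
}

static void mock_end(MockRelInfo *r) { r->end_calls++; }

#define BATCH_SIZE 4    /* assumption: stands in for the multi-insert buffer limit */

/* Buffer rows and flush each full batch through ExecForeignCopy. */
static void copy_rows(const MockFdwRoutine *fdw, MockRelInfo *rinfo,
                      const int *rows, int nrows)
{
    int buf[BATCH_SIZE];
    int nused = 0;

    if (fdw->BeginForeignCopy != NULL)
        fdw->BeginForeignCopy(rinfo);

    for (int i = 0; i < nrows; i++)
    {
        buf[nused++] = rows[i];
        if (nused == BATCH_SIZE)
        {
            fdw->ExecForeignCopy(rinfo, buf, nused);
            nused = 0;
        }
    }
    if (nused > 0)      /* flush any remaining buffered rows */
        fdw->ExecForeignCopy(rinfo, buf, nused);

    if (fdw->EndForeignCopy != NULL)
        fdw->EndForeignCopy(rinfo);
}

static const MockFdwRoutine mock_fdw = {mock_begin, mock_exec, mock_end};
```

With 10 rows and a batch size of 4, the sketch issues three ExecForeignCopy calls (4 + 4 + 2 rows) instead of ten per-row INSERTs.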

The code in deparseAnalyzeSql() that constructs the list of columns for
a given foreign relation is split out into deparseRelColumnList(), which
is also reused by deparseCopyFromSql().
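The column-list logic can be sketched outside the server as follows. MockColumn, append_column_list() and build_copy_from_sql() are hypothetical helpers; identifier quoting (quote_identifier() in the real code) and the relation-name deparsing are deliberately omitted. The sketch skips dropped columns, adds parentheses only when at least one column is emitted, and reproduces the command text seen in the expected regression output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical column descriptor: remote column name plus a dropped flag. */
typedef struct MockColumn
{
    const char *name;
    bool        dropped;
} MockColumn;

/*
 * Append the non-dropped column names to buf, optionally wrapped in
 * parentheses; returns the number of columns emitted.  No quoting is done.
 */
static int append_column_list(char *buf, size_t bufsize,
                              const MockColumn *cols, int ncols,
                              bool enclose_in_parens)
{
    int emitted = 0;

    for (int i = 0; i < ncols; i++)
    {
        if (cols[i].dropped)
            continue;
        if (emitted > 0)
            strncat(buf, ", ", bufsize - strlen(buf) - 1);
        else if (enclose_in_parens)
            strncat(buf, "(", bufsize - strlen(buf) - 1);
        strncat(buf, cols[i].name, bufsize - strlen(buf) - 1);
        emitted++;
    }
    /* Close the list only if it was opened, so zero columns yields no "()". */
    if (enclose_in_parens && emitted > 0)
        strncat(buf, ")", bufsize - strlen(buf) - 1);
    return emitted;
}

/* Build the remote "COPY <rel>(<cols>) FROM STDIN " command text. */
static void build_copy_from_sql(char *buf, size_t bufsize, const char *relname,
                                const MockColumn *cols, int ncols)
{
    snprintf(buf, bufsize, "COPY %s", relname);
    (void) append_column_list(buf, bufsize, cols, ncols, true);
    strncat(buf, " FROM STDIN ", bufsize - strlen(buf) - 1);
}
```

Listing the columns explicitly means the remote COPY does not depend on the remote table having exactly the same column order as the local foreign table definition.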

Added regression tests for specific corner cases of the COPY FROM STDIN
operation.

By analogy with the data_source_cb callback used by COPY FROM, the
CopyToState structure is extended with a data_dest_cb callback, which is
used to send the text representation of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field, which
is needed to avoid repeated initialization of the COPY state. For the
same reason, the setup and teardown parts of DoCopyTo() are split out
into CopyToStart() and CopyToFinish(), and CopyOneRowTo() is exported,
so that callers can copy tuples one at a time.
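A sketch of the callback destination and the start/row/finish split, under stated assumptions: only the copy_data_dest_cb typedef matches the patch's copy.h; MockCopyToState, the mock_copy_to_* functions and collect_cb() are hypothetical. Rows are terminated with "\n" only (the patch uses "\r\n" on WIN32). Because the callback signature carries no context pointer, the receiver in this sketch keeps its state in static variables.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the copy_data_dest_cb typedef the patch adds to copy.h. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical, simplified COPY TO state: a row buffer plus the callback. */
typedef struct MockCopyToState
{
    char              rowbuf[256];
    copy_data_dest_cb data_dest_cb;
    int               rows_sent;
} MockCopyToState;

/* One-time setup, done once rather than per batch of rows. */
static void mock_copy_to_start(MockCopyToState *cstate, copy_data_dest_cb cb)
{
    cstate->data_dest_cb = cb;
    cstate->rows_sent = 0;
}

/* Format one row as tab-separated text and hand it to the callback. */
static void mock_copy_one_row_to(MockCopyToState *cstate, int a, const char *b)
{
    int len = snprintf(cstate->rowbuf, sizeof cstate->rowbuf, "%d\t%s\n", a, b);

    assert(len > 0 && (size_t) len < sizeof cstate->rowbuf);
    cstate->data_dest_cb(cstate->rowbuf, len);
    cstate->rows_sent++;
}

/* A binary COPY would emit its trailer here; text mode needs nothing. */
static void mock_copy_to_finish(MockCopyToState *cstate)
{
    (void) cstate;
}

/* The callback has no context argument, so the receiver uses static state. */
static char   received[1024];
static size_t received_len = 0;

static void collect_cb(void *outbuf, int len)
{
    assert(received_len + (size_t) len < sizeof received);
    memcpy(received + received_len, outbuf, (size_t) len);
    received_len += (size_t) len;
    received[received_len] = '\0';
}
```

Separating start and finish from the per-row step lets a caller such as an FDW keep one COPY state alive across many calls and push rows through it as they arrive.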

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod, shared across all potential target
relations (partitions), that along with some target relation
properties dictated whether to engage multi-insert mode for a given
target relation.

This patch moves that decision logic into ExecMultiInsertAllowed() and
its callers. The former encapsulates the common criteria for allowing
multi-insert; the latter apply additional criteria and set the new
boolean field ri_usesMultiInsert of ResultRelInfo. This avoids
recomputing the same information in some cases, especially for
partitions, and makes the code slightly more readable.
The CopyInsertMethod enum is removed.
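The resulting decision can be modelled with plain booleans. This is a sketch only: MockRel and the three functions are hypothetical flattenings of the fields that ExecMultiInsertAllowed(), CopyFrom() and ExecInitPartitionInfo() consult, not the real executor structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the relation properties that are checked. */
typedef struct MockRel
{
    bool has_before_or_instead_row_trig;  /* BEFORE/INSTEAD OF ROW INSERT */
    bool is_partitioned;                  /* RELKIND_PARTITIONED_TABLE */
    bool has_transition_table_trig;       /* trig_insert_new_table */
    bool is_foreign;                      /* ri_FdwRoutine != NULL */
    bool fdw_has_exec_foreign_copy;       /* ExecForeignCopy != NULL */
} MockRel;

/* Per-relation criteria, mirroring ExecMultiInsertAllowed(). */
static bool multi_insert_allowed(const MockRel *rel)
{
    if (rel->has_before_or_instead_row_trig)
        return false;
    if (rel->is_partitioned && rel->has_transition_table_trig)
        return false;
    if (rel->is_foreign && !rel->fdw_has_exec_foreign_copy)
        return false;
    return true;
}

/* Extra criteria applied by CopyFrom() to the COPY target itself. */
static bool target_uses_multi_insert(const MockRel *target,
                                     bool volatile_defexprs, int n_attrs,
                                     bool volatile_where)
{
    if (volatile_defexprs || n_attrs == 0 || volatile_where)
        return false;
    return multi_insert_allowed(target);
}

/* A partition may use multi-insert only if its root does and it passes too. */
static bool partition_uses_multi_insert(bool root_uses, const MockRel *part)
{
    return root_uses && multi_insert_allowed(part);
}
```

For example, a foreign partition whose FDW provides ExecForeignCopy is buffered, while one with a BEFORE ROW trigger (like ffoo1 in the regression test) falls back to row-by-row insertion.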

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
---
 contrib/postgres_fdw/deparse.c                 |  63 ++++--
 contrib/postgres_fdw/expected/postgres_fdw.out |  46 ++++-
 contrib/postgres_fdw/postgres_fdw.c            | 141 +++++++++++++
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  45 ++++
 doc/src/sgml/fdwhandler.sgml                   |  71 ++++++-
 src/backend/commands/copy.c                    |   2 +-
 src/backend/commands/copyfrom.c                | 271 +++++++++++--------------
 src/backend/commands/copyto.c                  |  88 ++++++--
 src/backend/executor/execMain.c                |  44 ++++
 src/backend/executor/execPartition.c           |  37 +++-
 src/include/commands/copy.h                    |   5 +
 src/include/commands/copyfrom_internal.h       |  10 -
 src/include/executor/executor.h                |   1 +
 src/include/foreign/fdwapi.h                   |  15 ++
 src/include/nodes/execnodes.h                  |   8 +-
 16 files changed, 637 insertions(+), 211 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499..7e10f8b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1859,6 +1861,23 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * Deparse remote COPY FROM statement
+ *
+ * Note that this explicitly specifies the list of COPY's target columns
+ * to account for the fact that the remote table's columns may not match
+ * exactly with the columns declared in the local definition.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * 'buf' is the output buffer to append the statement to
@@ -2119,6 +2138,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 {
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
+{
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			i;
@@ -2126,10 +2169,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2138,6 +2179,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2157,18 +2201,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
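The column-list refactoring in deparse.c can be illustrated outside the backend. This is a minimal standalone sketch, not the patch's code. It assumes a plain array of hypothetical SketchColumn entries in place of a TupleDesc. It also skips the quote_identifier() call and the column_name option lookup that the real deparseRelColumnList() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one attribute of a tuple descriptor. */
typedef struct SketchColumn
{
	const char *remote_name;	/* column name to use on the foreign server */
	bool		is_dropped;		/* dropped columns are skipped */
} SketchColumn;

/*
 * Append the names of the non-dropped columns to the NUL-terminated string
 * in buf (total capacity bufsize), separated by ", " and optionally wrapped
 * in parentheses.  The 1-based attribute numbers of the emitted columns are
 * stored in attnums; the return value is how many columns were emitted.
 * When no live columns remain, nothing at all is appended (not even "()").
 */
static int
sketch_column_list(char *buf, size_t bufsize, const SketchColumn *cols,
				   int ncols, bool enclose_in_parens, int *attnums)
{
	size_t		len = strlen(buf);
	int			nretrieved = 0;

	for (int i = 0; i < ncols; i++)
	{
		const char *prefix;

		if (cols[i].is_dropped)
			continue;

		/* The first live column may open the parenthesis; later ones get ", ". */
		if (nretrieved > 0)
			prefix = ", ";
		else
			prefix = enclose_in_parens ? "(" : "";

		len += (size_t) snprintf(buf + len, bufsize - len, "%s%s",
								 prefix, cols[i].remote_name);
		assert(len < bufsize);	/* the sketch assumes buf is large enough */
		attnums[nretrieved++] = i + 1;
	}

	if (enclose_in_parens && nretrieved > 0)
	{
		len += (size_t) snprintf(buf + len, bufsize - len, ")");
		assert(len < bufsize);
	}
	return nretrieved;
}
```

Consider columns f1, a dropped column, and f2, with a buffer that already holds "COPY public.loc2". The sketch yields "COPY public.loc2(f1, f2)" and attribute numbers 1 and 3. This is the same column list that appears in the remote SQL of the expected regression output.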
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b..5b2d03a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8111,8 +8111,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8123,6 +8124,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8231,6 +8245,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b4857..98fe339 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -201,6 +202,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -373,6 +375,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(EState *estate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -558,6 +567,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2178,6 +2190,135 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;	/* used by pgfdw_copy_dest_cb */
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(EState *estate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	if (resultRelInfo->ri_RangeTableIndex == 0)
+	{
+		ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+		Assert(rootResultRelInfo != NULL);
+		rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);
+		rte = copyObject(rte);
+		rte->relid = RelationGetRelid(rel);
+		rte->relkind = RELKIND_FOREIGN_TABLE;
+	}
+	else
+		rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, estate);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(estate,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									-1,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, rel, NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol.  This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0)
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending an EOF signal, check that the command succeeded. */
+		res = PQgetResult(conn);
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we have not gotten extra results */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
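postgresExecForeignCopy() drives one COPY ... FROM STDIN exchange per flushed batch:

- PQexec() must report PGRES_COPY_IN.
- Rows go out through PQputCopyData().
- PQputCopyEnd() closes the stream. If the batch failed, it is given an error message, which makes the remote COPY fail.
- The first PQgetResult() must be PGRES_COMMAND_OK.
- A second PQgetResult() must return NULL.

The toy model below does not call libpq. Its SketchConn type and sk_* functions are hypothetical and only enforce that call order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client-side states of one COPY ... FROM STDIN exchange. */
typedef enum SketchCopyState
{
	SK_IDLE,					/* no command in progress */
	SK_COPY_IN,					/* PQexec() returned PGRES_COPY_IN */
	SK_AWAIT_RESULT,			/* PQputCopyEnd() sent, status pending */
	SK_AWAIT_NULL				/* status read, expecting a NULL result */
} SketchCopyState;

typedef struct SketchConn
{
	SketchCopyState state;
	int			rows_sent;
	bool		aborted;		/* PQputCopyEnd() got an error message */
} SketchConn;

/* Models PQexec("COPY ... FROM STDIN") succeeding with PGRES_COPY_IN. */
static bool
sk_begin_copy(SketchConn *c)
{
	if (c->state != SK_IDLE)
		return false;
	c->state = SK_COPY_IN;
	c->rows_sent = 0;
	c->aborted = false;
	return true;
}

/* Models PQputCopyData(): legal only while in the COPY_IN state. */
static bool
sk_put_row(SketchConn *c)
{
	if (c->state != SK_COPY_IN)
		return false;
	c->rows_sent++;
	return true;
}

/* Models PQputCopyEnd(conn, errormsg): a non-NULL errormsg fails the COPY. */
static bool
sk_end_copy(SketchConn *c, const char *errormsg)
{
	if (c->state != SK_COPY_IN)
		return false;
	c->aborted = (errormsg != NULL);
	c->state = SK_AWAIT_RESULT;
	return true;
}

/*
 * Models PQgetResult().  Returns true when a result is available and sets
 * *command_ok; returns false for "no more results" (NULL in libpq).
 */
static bool
sk_get_result(SketchConn *c, bool *command_ok)
{
	if (c->state == SK_AWAIT_RESULT)
	{
		*command_ok = !c->aborted;
		c->state = SK_AWAIT_NULL;
		return true;
	}
	if (c->state == SK_AWAIT_NULL)
		c->state = SK_IDLE;		/* NULL result: connection is idle again */
	return false;
}

/* One batch, in the order postgresExecForeignCopy() uses; true on success. */
static bool
sk_copy_batch(SketchConn *c, int nrows, int fail_at_row)
{
	bool		ok;
	bool		command_ok = false;
	int			i;

	if (!sk_begin_copy(c))
		return false;
	for (i = 0; i < nrows && i != fail_at_row; i++)
		sk_put_row(c);
	ok = (i == nrows);			/* false if we stopped early ("error") */

	/* The end-of-data message is sent on both the success and error paths. */
	sk_end_copy(c, ok ? NULL : "canceled by server");
	if (!sk_get_result(c, &command_ok))
		return false;
	/* Any further result would be unexpected. */
	if (sk_get_result(c, &command_ok))
		return false;
	return command_ok;
}
```

Passing -1 as fail_at_row models a clean batch. Any other row index models an error raised while that row is being formatted.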
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d..cb801c9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -165,6 +165,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 extern void rebuildInsertSql(StringInfo buf, char *orig_query,
 							 int values_end_len, int num_cols,
 							 int num_rows);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
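Inside postgresExecForeignCopy(), each slot passes through CopyOneRowTo(). With the new COPY_CALLBACK destination, the formatted row is then handed to pgfdw_copy_dest_cb(), which forwards it with PQputCopyData(). The following is a hedged standalone sketch of that text-format path. It assumes:

- string values, with a NULL pointer standing for SQL NULL;
- the default tab delimiter and \N null marker;
- a hypothetical capture callback in place of PQputCopyData().

```c
#include <assert.h>
#include <string.h>

/* Same shape as the patch's copy_data_dest_cb: (data, length). */
typedef void (*sketch_dest_cb) (void *buf, int len);

/* Hypothetical callback standing in for PQputCopyData(): keeps the last row. */
static char sketch_captured[1024];
static int	sketch_captured_len;

static void
sketch_capture_cb(void *buf, int len)
{
	memcpy(sketch_captured, buf, (size_t) len);
	sketch_captured_len = len;
}

/* Append one value in COPY text format at out[pos]; returns the new length. */
static size_t
sketch_out_text(char *out, size_t pos, const char *value)
{
	if (value == NULL)
	{
		memcpy(out + pos, "\\N", 2);	/* default NULL marker */
		return pos + 2;
	}
	for (const char *p = value; *p; p++)
	{
		char		esc = 0;

		switch (*p)
		{
			case '\b': esc = 'b'; break;
			case '\f': esc = 'f'; break;
			case '\n': esc = 'n'; break;
			case '\r': esc = 'r'; break;
			case '\t': esc = 't'; break;	/* also the default delimiter */
			case '\v': esc = 'v'; break;
			case '\\': esc = '\\'; break;
		}
		if (esc != 0)
		{
			out[pos++] = '\\';
			out[pos++] = esc;
		}
		else
			out[pos++] = *p;
	}
	return pos;
}

/* Format one row (tab-separated, '\n'-terminated) and pass it to cb. */
static void
sketch_send_row(const char *const *values, int nvalues, sketch_dest_cb cb)
{
	char		row[1024];
	size_t		pos = 0;

	for (int i = 0; i < nvalues; i++)
	{
		/* Worst case every byte is escaped; the sketch just asserts room. */
		assert(pos + 2 * (values[i] ? strlen(values[i]) : 1) + 2 < sizeof(row));
		if (i > 0)
			row[pos++] = '\t';
		pos = sketch_out_text(row, pos, values[i]);
	}
	row[pos++] = '\n';
	cb(row, (int) pos);
}
```

Sending the values "-1" and "xyzzy" produces the nine bytes -1, TAB, xyzzy, LF. This is the row text shown in the regression test's error context.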
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea..02efe2f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2235,6 +2235,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2335,6 +2352,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 04bc052..666148a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -813,8 +813,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
 
      Begin executing an insert operation on a foreign table.  This routine is
      called right before the first tuple is inserted into the foreign table
-     in both cases when it is the partition chosen for tuple routing and the
-     target specified in a <command>COPY FROM</command> command.  It should
+     when it is the target specified in a <command>COPY FROM</command> command, or when
+     the foreign table is the partition chosen for tuple routing of a
+     partitioned table.  It should
      perform any initialization needed prior to the actual insertion.
      Subsequently, <function>ExecForeignInsert</function> or
      <function>ExecForeignBatchInsert</function> will be called for
@@ -1067,6 +1068,72 @@ EndDirectModify(ForeignScanState *node);
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(EState *estate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of the <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     a batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>estate</literal> is global execution state for the query.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
 RowMarkType
 GetForeignRowMarkType(RangeTblEntry *rte,
                       LockClauseStrength strength);
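The three callbacks documented above can be sketched as a small dispatcher, together with the documented fallback to ExecForeignInsert when ExecForeignCopy is NULL. SketchFdwRoutine is a hypothetical, trimmed-down stand-in for FdwRoutine that uses opaque void * state and integer "rows". The sketch also omits the BeginForeignInsert/EndForeignInsert calls that surround the per-row path in the real executor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the FdwRoutine callbacks. */
typedef struct SketchFdwRoutine
{
	void		(*BeginForeignCopy) (void *state);	/* optional */
	void		(*ExecForeignCopy) (void *state, const int *rows, int nrows);
	void		(*EndForeignCopy) (void *state);	/* optional */
	void		(*ExecForeignInsert) (void *state, int row);
} SketchFdwRoutine;

/* Example FDW "state" that just counts how it was called. */
typedef struct SketchCounters
{
	int			begins, batches, rows, ends, single_inserts;
} SketchCounters;

static void
sketch_count_begin(void *s)
{
	((SketchCounters *) s)->begins++;
}

static void
sketch_count_copy(void *s, const int *rows, int nrows)
{
	(void) rows;
	((SketchCounters *) s)->batches++;
	((SketchCounters *) s)->rows += nrows;
}

static void
sketch_count_end(void *s)
{
	((SketchCounters *) s)->ends++;
}

static void
sketch_count_insert(void *s, int row)
{
	(void) row;
	((SketchCounters *) s)->single_inserts++;
}

/* Deliver nrows rows to the FDW in batches of at most batch_size. */
static void
sketch_copy_into_foreign(const SketchFdwRoutine *fdw, void *state,
						 const int *rows, int nrows, int batch_size)
{
	assert(batch_size > 0);

	if (fdw->ExecForeignCopy == NULL)
	{
		/* Documented fallback: one ExecForeignInsert call per row. */
		for (int i = 0; i < nrows; i++)
			fdw->ExecForeignInsert(state, rows[i]);
		return;
	}

	if (fdw->BeginForeignCopy != NULL)	/* NULL means "no initialization" */
		fdw->BeginForeignCopy(state);
	for (int start = 0; start < nrows; start += batch_size)
	{
		int			n = nrows - start < batch_size ? nrows - start : batch_size;

		fdw->ExecForeignCopy(state, rows + start, n);
	}
	if (fdw->EndForeignCopy != NULL)	/* NULL means "no termination" */
		fdw->EndForeignCopy(state);
}
```

With five rows and a batch size of two, a copy-capable FDW sees one begin call, three ExecForeignCopy batches (2, 2, 1), and one end call. An FDW without ExecForeignCopy sees five single inserts instead.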
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c712c8..411c409 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -304,7 +304,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		CopyToState cstate;
 
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
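With the new data_dest_cb argument, BeginCopyTo() chooses the COPY TO destination as follows:

- No file name and no callback: the client, or stdout when the client is not remote.
- A callback: COPY_CALLBACK.
- Otherwise: a file.

BeginCopyTo() also skips its "must be a plain table" check when a callback is supplied, which is what lets postgres_fdw pass a foreign table. The sketch below models just that decision for a COPY with a relation. The names SketchCopyDest, sketch_choose_dest and sketch_source_allowed are hypothetical, and a boolean stands in for whereToSendOutput == DestRemote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the CopyDest values visible in the patch (names are hypothetical). */
typedef enum SketchCopyDest
{
	SKETCH_COPY_FILE,			/* to a file, a piped program, or stdout */
	SKETCH_COPY_FRONTEND,		/* to the client over the protocol */
	SKETCH_COPY_CALLBACK		/* to a caller-supplied callback */
} SketchCopyDest;

typedef void (*sketch_dest_cb) (void *buf, int len);

/* A callback that ignores its input; only its non-NULL-ness matters here. */
static void
sketch_noop_cb(void *buf, int len)
{
	(void) buf;
	(void) len;
}

static SketchCopyDest
sketch_choose_dest(const char *filename, sketch_dest_cb cb,
				   bool client_is_remote)
{
	bool		pipe = (filename == NULL) && (cb == NULL);

	if (pipe)
		return client_is_remote ? SKETCH_COPY_FRONTEND : SKETCH_COPY_FILE;
	if (cb != NULL)
		return SKETCH_COPY_CALLBACK;	/* checked before the file branch */
	return SKETCH_COPY_FILE;
}

/* The relkind check is skipped when a callback is supplied. */
static bool
sketch_source_allowed(bool is_plain_table, sketch_dest_cb cb)
{
	return is_plain_table || cb != NULL;
}
```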
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2ed696d..7a31633 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -316,54 +316,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There are no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -537,13 +547,11 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -653,6 +661,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion and insert them in bulk, for example with one
+	 * table_multi_insert() call, than to call table_tuple_insert() separately
+	 * for every tuple.  However, there are a number of reasons why we might
+	 * not be able to do this.  For example, if there are any volatile
+	 * expressions in the table's default values or in the statement's WHERE
+	 * clause, which may query the table we are inserting into, buffering
+	 * tuples might produce wrong results.  Also, the relation we are trying
+	 * to insert into may itself not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set based on both the target table's
+	 * flag, which is set here, and the partition's own properties, which are
+	 * checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions, as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-empty list of input attributes,
+	 * so the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -669,10 +704,22 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Initialize copying into a foreign target table.  Copying into foreign
+	 * partitions is initialized later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(estate,
+																	  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -701,83 +748,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -785,7 +758,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -828,7 +801,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -836,7 +809,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -903,24 +875,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -950,7 +912,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -969,9 +931,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1043,7 +1002,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1122,11 +1081,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1145,14 +1101,21 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+																	target_resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
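The copyfrom.c changes replace the CIM_* insert-method enum with a per-ResultRelInfo ri_usesMultiInsert flag. They also flush pending buffered rows before a row that must be inserted on its own. The following is a simplified model of that control flow, under these assumptions:

- The SketchTarget fields summarize the conditions the patch checks.
- rel_allows_multi stands in for ExecMultiInsertAllowed(), whose body is not shown here.
- One shared buffer replaces the per-partition buffers.
- SKETCH_BUFFER_ROWS is an arbitrary small limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the patch checks for the COPY target. */
typedef struct SketchTarget
{
	bool		volatile_defexprs;	/* volatile DEFAULT expressions */
	int			nattrs;				/* length of the COPY column list */
	bool		volatile_where;		/* volatile functions in the WHERE clause */
	bool		rel_allows_multi;	/* stand-in for ExecMultiInsertAllowed() */
} SketchTarget;

/* Same conjunction as the new ri_usesMultiInsert initialization. */
static bool
sketch_uses_multi_insert(const SketchTarget *t)
{
	return !t->volatile_defexprs && t->nattrs > 0 && !t->volatile_where &&
		t->rel_allows_multi;
}

#define SKETCH_BUFFER_ROWS 3	/* arbitrary; the real limits are larger */

typedef struct SketchBuffer
{
	int			nbuffered;		/* rows waiting for the next flush */
	int			flushes;		/* batch flushes performed so far */
	int			single_rows;	/* rows inserted one at a time */
} SketchBuffer;

/* One flush = one table_multi_insert() or ExecForeignCopy() call. */
static void
sketch_flush(SketchBuffer *b)
{
	if (b->nbuffered > 0)
	{
		b->flushes++;
		b->nbuffered = 0;
	}
}

/* Route one row to a partition whose ri_usesMultiInsert flag is `multi`. */
static void
sketch_route_row(SketchBuffer *b, bool multi)
{
	if (multi)
	{
		b->nbuffered++;
		if (b->nbuffered == SKETCH_BUFFER_ROWS)
			sketch_flush(b);	/* buffer full */
	}
	else
	{
		sketch_flush(b);		/* flush pending rows before a single insert */
		b->single_rows++;
	}
	assert(b->nbuffered < SKETCH_BUFFER_ROWS);
}
```

After the input is exhausted, one final sketch_flush() plays the role of the "Flush any remaining buffered tuples" step in CopyFrom().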
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 7257a54..3782336 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -51,6 +51,7 @@ typedef enum CopyDest
 {
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_FRONTEND,				/* to frontend */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -86,6 +87,7 @@ typedef struct CopyToStateData
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -115,7 +117,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -248,6 +249,15 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
+			break;
 	}
 
 	/* Update the progress */
@@ -345,11 +355,12 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
@@ -362,7 +373,13 @@ BeginCopyTo(ParseState *pstate,
 		0
 	};
 
-	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
+	/*
+	 * Check whether we support copying data out of the specified relation,
+	 * unless the caller also passed a non-NULL data_dest_cb, in which case
+	 * the callback will take care of it.
+	 */
+	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
+		data_dest_cb == NULL)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
 			ereport(ERROR,
@@ -673,6 +690,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -773,20 +795,17 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Split out of DoCopyTo() so that callers using manual mode, which send
+ * tuples to the destination one by one by calling CopyOneRowTo(), do not
+ * repeat the start-up work for every batch.
  */
-uint64
-DoCopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
-	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
-
-	if (fe_copy)
-		SendCopyBegin(cstate);
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -876,6 +895,39 @@ DoCopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+uint64
+DoCopyTo(CopyToState cstate)
+{
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
+	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
+	uint64		processed;
+
+	if (fe_copy)
+		SendCopyBegin(cstate);
+
+	CopyToStart(cstate);
 
 	if (cstate->rel)
 	{
@@ -914,15 +966,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
+	CopyToFinish(cstate);
 
 	if (fe_copy)
 		SendCopyEnd(cstate);
@@ -933,7 +977,7 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0648dd8..c5ed67a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1233,10 +1233,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
 /*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
+/*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
  *
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b8da4c5..13aef41 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -589,6 +589,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  estate->es_instrument);
 
 	/*
+	 * If the partition's root parent isn't allowed to use multi-insert mode,
+	 * neither is the partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
+	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
 	 * required when the operation is CMD_UPDATE.
@@ -989,9 +997,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			if (partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				partRelInfo->ri_FdwRoutine->BeginForeignCopy(estate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
@@ -1211,10 +1226,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				if (resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+					resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+																 resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e..3d9d187 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -80,10 +81,14 @@ extern DestReceiver *CreateCopyDestReceiver(void);
  */
 extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 705f5b6..c23f631 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -40,16 +40,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
  *
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363..754a9f5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78d..aeb8484 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -126,6 +126,16 @@ typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node)
 
 typedef void (*EndDirectModify_function) (ForeignScanState *node);
 
+typedef void (*BeginForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
@@ -230,6 +240,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad62..f32dcf6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -511,7 +511,13 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.10.1

#85Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: tsunakawa.takay@fujitsu.com (#83)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On 5/3/21 21:54, tsunakawa.takay@fujitsu.com wrote:

I've managed to rebase it, although it took unexpectedly long. The patch is attached. It passes make check against core and postgres_fdw. I'll turn the CF status back to ready for committer shortly.

The _() macro in the postgresExecForeignCopy routine:
if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0)

calls gettext. On Linux this builds fine because, as I understand it,
the standard glibc implementation of gettext is used:
objdump -t contrib/postgres_fdw/postgres_fdw.so | grep 'gettext'
gettext@@GLIBC_2.2.5

but on macOS (and maybe elsewhere) we need to explicitly link the
libintl library in the Makefile:
SHLIB_LINK += $(filter -lintl, $(LIBS))

Alternatively, maybe we should not use gettext in this part of the code at all.

--
regards,
Andrey Lepikhov
Postgres Professional

#86tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Andrey Lepikhov (#85)
1 attachment(s)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

I'm afraid so, because no extension in contrib/ has a po/ directory. I just removed _() and rebased the patch on HEAD.

Regards
Takayuki Tsunakawa

Attachments:

v21-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchapplication/octet-stream; name=v21-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchDownload
From 8e660acfacb7d9866d2bcf28bc0b4d367ad3ce14 Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 9 Feb 2021 12:50:00 +0900
Subject: [PATCH v21] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The following routines are added to the FDW interface:
* BeginForeignCopy
* ExecForeignCopy
* EndForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyToState used for bulk COPY. For each batch of tuples,
ExecForeignCopy runs a 'COPY ... FROM STDIN' command on the foreign
server and streams the tuples to it using the COPY TO machinery.
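As a rough illustration of the calling sequence described above, the following self-contained C sketch models the three callbacks as function pointers: Begin once, Exec once per flushed batch, End once, with a per-row fallback when no Exec callback is provided. All names here (SketchFdwRoutine, sketch_copy_rows, the batch size of 4, ints standing in for tuple slots) are hypothetical simplifications, not the executor's real types or buffer limits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for FdwRoutine: ints play the role of tuple slots. */
typedef struct SketchFdwRoutine
{
	void		(*begin_copy) (void *state);
	void		(*exec_copy) (void *state, const int *slots, int nslots);
	void		(*end_copy) (void *state);
	void		(*exec_insert) (void *state, int slot);	/* per-row fallback */
} SketchFdwRoutine;

#define SKETCH_BATCH_SIZE 4		/* arbitrary, for illustration only */

/*
 * Route nrows rows to one foreign table.  With exec_copy, rows are buffered
 * and flushed a batch at a time (one remote COPY per batch in the patch);
 * without it, every row goes through the per-row insert callback.
 */
static void
sketch_copy_rows(const SketchFdwRoutine *fdw, void *state,
				 const int *rows, int nrows)
{
	int			buffer[SKETCH_BATCH_SIZE];
	int			nbuffered = 0;

	assert(fdw != NULL);

	if (fdw->exec_copy == NULL)
	{
		for (int i = 0; i < nrows; i++)
			fdw->exec_insert(state, rows[i]);
		return;
	}

	if (fdw->begin_copy != NULL)	/* optional, like BeginForeignCopy */
		fdw->begin_copy(state);

	for (int i = 0; i < nrows; i++)
	{
		buffer[nbuffered++] = rows[i];
		if (nbuffered == SKETCH_BATCH_SIZE)
		{
			fdw->exec_copy(state, buffer, nbuffered);
			nbuffered = 0;
		}
	}
	if (nbuffered > 0)			/* flush the partial last batch */
		fdw->exec_copy(state, buffer, nbuffered);

	if (fdw->end_copy != NULL)
		fdw->end_copy(state);
}

/* A counting "FDW" used to observe the call pattern. */
typedef struct CountingState
{
	int			begins, batches, rows, ends, single_inserts;
} CountingState;

static void count_begin(void *s) { ((CountingState *) s)->begins++; }
static void count_end(void *s) { ((CountingState *) s)->ends++; }

static void
count_exec(void *s, const int *slots, int nslots)
{
	(void) slots;
	((CountingState *) s)->batches++;
	((CountingState *) s)->rows += nslots;
}

static void
count_insert(void *s, int slot)
{
	(void) slot;
	((CountingState *) s)->single_inserts++;
}
```

With 10 rows and a batch size of 4, the counting FDW sees one Begin, three Exec calls (4 + 4 + 2 rows) and one End; with exec_copy left NULL it sees ten single-row inserts instead.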

Code that constructs a list of columns for a given foreign relation
in the deparseAnalyzeSql() routine is split into deparseRelColumnList().
It is reused in deparseCopyFromSql().
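The column-list logic is small enough to sketch on its own. This C sketch skips dropped columns, separates names with ", ", and adds parentheses only when at least one column was emitted, so a zero-column relation gets no "()" list. The types and function names are hypothetical, identifier quoting is omitted, and the relation name is passed in already schema-qualified instead of being deparsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical column descriptor: only the remote name and "dropped" matter. */
typedef struct SketchColumn
{
	const char *name;
	bool		dropped;
} SketchColumn;

/*
 * Append the non-dropped column names to buf (which must already hold a
 * NUL-terminated string).  Parentheses are added only around a non-empty
 * list.  Returns the number of columns emitted.
 */
static int
sketch_column_list(char *buf, size_t bufsize, const SketchColumn *cols,
				   int ncols, bool enclose_in_parens)
{
	int			emitted = 0;

	assert(buf != NULL && bufsize > 0);
	for (int i = 0; i < ncols; i++)
	{
		if (cols[i].dropped)
			continue;
		if (emitted > 0)
			strncat(buf, ", ", bufsize - strlen(buf) - 1);
		else if (enclose_in_parens)
			strncat(buf, "(", bufsize - strlen(buf) - 1);
		strncat(buf, cols[i].name, bufsize - strlen(buf) - 1);
		emitted++;
	}
	if (enclose_in_parens && emitted > 0)
		strncat(buf, ")", bufsize - strlen(buf) - 1);
	return emitted;
}

/* Same shape as the statement deparseCopyFromSql() produces. */
static void
sketch_copy_from_sql(char *buf, size_t bufsize, const char *relname,
					 const SketchColumn *cols, int ncols)
{
	snprintf(buf, bufsize, "COPY %s", relname);
	(void) sketch_column_list(buf, bufsize, cols, ncols, true);
	strncat(buf, " FROM STDIN ", bufsize - strlen(buf) - 1);
}
```

For columns f1, a dropped column, and f2 on public.loc2 this yields "COPY public.loc2(f1, f2) FROM STDIN ", matching the remote SQL shown in the expected regression output; if every column is dropped it yields "COPY public.loc2 FROM STDIN ".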

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with the data_source_cb callback of COPY FROM, the CopyToState
structure is extended with a data_dest_cb callback, which is used to send
the text representation of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field,
which avoids repeated initialization of the CopyToState. For the same
reason, the start and finish steps of DoCopyTo() are split out into
CopyToStart() and CopyToFinish(), and CopyOneRowTo() is exported.
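The data_dest_cb idea can be sketched in isolation: each text row is accumulated in a buffer, terminated with a newline, and the whole row is handed to a callback instead of being written to a file or the frontend. In this C sketch the buffer type and function names are hypothetical; only the callback signature mirrors the patch's copy_data_dest_cb typedef. The example sink just concatenates rows, where postgres_fdw's pgfdw_copy_dest_cb would forward them with PQputCopyData(). Unlike the patch, the sketch always uses "\n", never "\r\n".

```c
#include <assert.h>
#include <string.h>

/* Same shape as copy_data_dest_cb in the patch. */
typedef void (*sketch_dest_cb) (void *outbuf, int len);

/* Hypothetical per-row buffer, standing in for the COPY TO message buffer. */
typedef struct SketchRowBuf
{
	char		data[256];
	int			len;
	sketch_dest_cb dest_cb;
} SketchRowBuf;

static void
sketch_send_string(SketchRowBuf *rb, const char *s)
{
	int			n = (int) strlen(s);

	assert(rb->len + n < (int) sizeof(rb->data));
	memcpy(rb->data + rb->len, s, n);
	rb->len += n;
}

/*
 * Analogue of the COPY_CALLBACK branch of CopySendEndOfRow(): terminate the
 * row, hand the complete row to the callback, then reset the buffer.
 */
static void
sketch_send_end_of_row(SketchRowBuf *rb)
{
	sketch_send_string(rb, "\n");
	rb->dest_cb(rb->data, rb->len);
	rb->len = 0;
}

/* Example destination: collect all rows into one NUL-terminated string. */
static char sketch_sink[1024];
static int	sketch_sink_len = 0;

static void
sketch_collect_cb(void *outbuf, int len)
{
	assert(sketch_sink_len + len < (int) sizeof(sketch_sink));
	memcpy(sketch_sink + sketch_sink_len, outbuf, len);
	sketch_sink_len += len;
	sketch_sink[sketch_sink_len] = '\0';
}
```

Sending the rows "1\tfoo" and "2\tbar" through this buffer leaves "1\tfoo\n2\tbar\n" in the sink, which is the text-format COPY stream the remote server expects.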

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod, shared across all potential target
relations (partitions), that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Change that decision logic to the combination of ExecMultiInsertAllowed()
and its caller. The former encapsulates the common criteria for allowing
multi-insert. The latter applies additional criteria and sets the new
boolean field ri_usesMultiInsert of ResultRelInfo.
This avoids recomputing the same information in some cases,
especially for partitions, and the new arrangement is slightly
more readable.
Enum CopyInsertMethod is removed.

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
---
 contrib/postgres_fdw/deparse.c                 |  63 ++++--
 contrib/postgres_fdw/expected/postgres_fdw.out |  46 ++++-
 contrib/postgres_fdw/postgres_fdw.c            | 141 +++++++++++++
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  45 ++++
 doc/src/sgml/fdwhandler.sgml                   |  71 ++++++-
 src/backend/commands/copy.c                    |   2 +-
 src/backend/commands/copyfrom.c                | 271 +++++++++++--------------
 src/backend/commands/copyto.c                  |  88 ++++++--
 src/backend/executor/execMain.c                |  44 ++++
 src/backend/executor/execPartition.c           |  37 +++-
 src/include/commands/copy.h                    |   5 +
 src/include/commands/copyfrom_internal.h       |  10 -
 src/include/executor/executor.h                |   1 +
 src/include/foreign/fdwapi.h                   |  15 ++
 src/include/nodes/execnodes.h                  |   8 +-
 16 files changed, 637 insertions(+), 211 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499..7e10f8b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1859,6 +1861,23 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * Deparse remote COPY FROM statement
+ *
+ * Note that this explicitly specifies the list of COPY's target columns
+ * to account for the fact that the remote table's columns may not match
+ * exactly with the columns declared in the local definition.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * 'buf' is the output buffer to append the statement to
@@ -2119,6 +2138,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 {
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
+{
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			i;
@@ -2126,10 +2169,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2138,6 +2179,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2157,18 +2201,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b..5b2d03a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8111,8 +8111,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8123,6 +8124,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8231,6 +8245,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b4857..237bc5f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -201,6 +202,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -373,6 +375,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(EState *estate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -558,6 +567,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2178,6 +2190,135 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(EState *estate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	if (resultRelInfo->ri_RangeTableIndex == 0)
+	{
+		ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+		Assert(rootResultRelInfo != NULL);
+		rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);
+		rte = copyObject(rte);
+		rte->relid = RelationGetRelid(rel);
+		rte->relkind = RELKIND_FOREIGN_TABLE;
+	}
+	else
+		rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, estate);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(estate,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									-1,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, rel, NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol. This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : "canceled by server") <= 0)
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* Having sent end-of-data, check that COPY succeeded. */
+		res = PQgetResult(conn);
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we have not gotten extra results */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d..cb801c9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -165,6 +165,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 extern void rebuildInsertSql(StringInfo buf, char *orig_query,
 							 int values_end_len, int num_cols,
 							 int num_rows);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea..02efe2f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2235,6 +2235,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2335,6 +2352,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 04bc052..666148a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -813,8 +813,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
 
      Begin executing an insert operation on a foreign table.  This routine is
      called right before the first tuple is inserted into the foreign table
-     in both cases when it is the partition chosen for tuple routing and the
-     target specified in a <command>COPY FROM</command> command.  It should
+     when it is the target specified in a <command>COPY FROM</command>
+     command or the partition chosen for tuple routing of a partitioned
+     table.  It should
      perform any initialization needed prior to the actual insertion.
      Subsequently, <function>ExecForeignInsert</function> or
      <function>ExecForeignBatchInsert</function> will be called for
@@ -1067,6 +1068,72 @@ EndDirectModify(ForeignScanState *node);
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(EState *estate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     a batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>estate</literal> is global execution state for the query.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; it will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
 RowMarkType
 GetForeignRowMarkType(RangeTblEntry *rte,
                       LockClauseStrength strength);
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c712c8..411c409 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -304,7 +304,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		CopyToState cstate;
 
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2ed696d..7a31633 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -316,54 +316,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -537,13 +547,11 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -653,6 +661,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -669,10 +704,22 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Init copying process into foreign table. Initialization of copying into
+	 * foreign partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(estate,
+																	  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -701,83 +748,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -785,7 +758,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -828,7 +801,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -836,7 +809,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -903,24 +875,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -950,7 +912,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -969,9 +931,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1043,7 +1002,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1122,11 +1081,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1145,14 +1101,21 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+																	target_resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 7257a54..3782336 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -51,6 +51,7 @@ typedef enum CopyDest
 {
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_FRONTEND,				/* to frontend */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -86,6 +87,7 @@ typedef struct CopyToStateData
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -115,7 +117,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -248,6 +249,15 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
+			break;
 	}
 
 	/* Update the progress */
@@ -345,11 +355,12 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
@@ -362,7 +373,13 @@ BeginCopyTo(ParseState *pstate,
 		0
 	};
 
-	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
+	/*
+	 * Check whether we support copying data out of the specified relation,
+	 * unless the caller also passed a non-NULL data_dest_cb, in which case,
+	 * the callback will take care of it
+	 */
+	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
+		data_dest_cb == NULL)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
 			ereport(ERROR,
@@ -673,6 +690,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -773,20 +795,17 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Split out of DoCopyTo() so that callers in manual mode, which send tuples
+ * to the destination one at a time by calling CopyOneRowTo(), can reuse it
+ * without duplicating code.
  */
-uint64
-DoCopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
-	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
-
-	if (fe_copy)
-		SendCopyBegin(cstate);
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -876,6 +895,39 @@ DoCopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+uint64
+DoCopyTo(CopyToState cstate)
+{
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
+	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
+	uint64		processed;
+
+	if (fe_copy)
+		SendCopyBegin(cstate);
+
+	CopyToStart(cstate);
 
 	if (cstate->rel)
 	{
@@ -914,15 +966,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
+	CopyToFinish(cstate);
 
 	if (fe_copy)
 		SendCopyEnd(cstate);
@@ -933,7 +977,7 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0648dd8..c5ed67a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1233,10 +1233,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
 /*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
+/*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
  *
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b8da4c5..13aef41 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -589,6 +589,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  estate->es_instrument);
 
 	/*
+	 * If a partition's root parent isn't allowed to use multi-insert, neither
+	 * is the partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
+	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
 	 * required when the operation is CMD_UPDATE.
@@ -989,9 +997,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			if (partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				partRelInfo->ri_FdwRoutine->BeginForeignCopy(estate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
@@ -1211,10 +1226,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				if (resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+					resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+																 resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e..3d9d187 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -80,10 +81,14 @@ extern DestReceiver *CreateCopyDestReceiver(void);
  */
 extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 705f5b6..c23f631 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -40,16 +40,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
  *
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363..754a9f5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78d..aeb8484 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -126,6 +126,16 @@ typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node)
 
 typedef void (*EndDirectModify_function) (ForeignScanState *node);
 
+typedef void (*BeginForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
@@ -230,6 +240,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad62..f32dcf6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -511,7 +511,13 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.10.1
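To make the control flow of this version easier to review, the multi-insert decision it centralizes in ExecMultiInsertAllowed(), together with the COPY-level gates in CopyFrom() and the root-to-partition rule in ExecInitPartitionInfo(), can be modeled in plain C. This is a sketch, not PostgreSQL code: Rel, FdwModel, BeginCallback and every helper name below are hypothetical simplifications, and only the boolean conditions are meant to mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-ins for Relation/ResultRelInfo and
 * FdwRoutine.  Only the fields the patch's decision logic reads are modeled.
 */
typedef enum { REL_PLAIN, REL_PARTITIONED, REL_FOREIGN } RelKind;

typedef struct Rel Rel;

typedef struct FdwModel
{
    void        (*BeginForeignCopy) (Rel *rel);
    void        (*ExecForeignCopy) (Rel *rel, int nslots);
    void        (*BeginForeignInsert) (Rel *rel);
} FdwModel;

struct Rel
{
    RelKind     kind;
    bool        trig_insert_before_row;
    bool        trig_insert_instead_row;
    bool        trig_insert_new_table;  /* INSERT trigger with transition table */
    const FdwModel *fdw;                /* non-NULL only for foreign tables */
    bool        usesMultiInsert;        /* models ri_usesMultiInsert */
};

/* Same checks, in the same order, as ExecMultiInsertAllowed() in the patch. */
static bool
multi_insert_allowed(const Rel *r)
{
    if (r->trig_insert_before_row || r->trig_insert_instead_row)
        return false;
    if (r->kind == REL_PARTITIONED && r->trig_insert_new_table)
        return false;
    if (r->fdw != NULL && r->fdw->ExecForeignCopy == NULL)
        return false;
    return true;
}

/* COPY-level gates that CopyFrom() applies to the target before the above. */
static bool
copy_target_uses_multi_insert(const Rel *target, bool volatile_defexprs,
                              int natts, bool volatile_where)
{
    if (volatile_defexprs || natts == 0 || volatile_where)
        return false;
    return multi_insert_allowed(target);
}

/* A partition may batch only if its root may (cf. ExecInitPartitionInfo). */
static void
init_partition(const Rel *root, Rel *part)
{
    part->usesMultiInsert = root->usesMultiInsert && multi_insert_allowed(part);
}

typedef enum { BEGIN_NONE, BEGIN_FOREIGN_COPY, BEGIN_FOREIGN_INSERT } BeginCallback;

/* Which FDW start-up callback runs for a relation (cf. ExecInitRoutingInfo). */
static BeginCallback
begin_callback_for(const Rel *r)
{
    assert(r != NULL);
    if (r->fdw == NULL)
        return BEGIN_NONE;
    if (r->usesMultiInsert)
        return r->fdw->BeginForeignCopy ? BEGIN_FOREIGN_COPY : BEGIN_NONE;
    return r->fdw->BeginForeignInsert ? BEGIN_FOREIGN_INSERT : BEGIN_NONE;
}

/* No-op callbacks so the two sample FDWs below have something to point at. */
static void noop_begin(Rel *r) { (void) r; }
static void noop_exec(Rel *r, int n) { (void) r; (void) n; }

/* An FDW implementing the new COPY API, and one with only the insert API. */
static const FdwModel copy_capable_fdw = {noop_begin, noop_exec, noop_begin};
static const FdwModel insert_only_fdw = {NULL, NULL, noop_begin};
```

In this model a foreign partition whose FDW provides ExecForeignCopy is batched and started through BeginForeignCopy, while one whose FDW lacks it falls back to BeginForeignInsert. In the patch, the shutdown side (EndForeignCopy versus EndForeignInsert) keys off the same ri_usesMultiInsert flag.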

#87Zhihong Yu
zyu@yugabyte.com
In reply to: tsunakawa.takay@fujitsu.com (#86)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

Hi,
In the description:

with data_dest_cb callback. It is used for send text representation of a
tuple to a custom destination.

send text -> sending text

struct PgFdwModifyState *aux_fmstate; /* foreign-insert state, if
* created */
+ CopyToState cstate; /* foreign COPY state, if used */

Since foreign COPY is optional, should cstate be a pointer? That would be
in line with aux_fmstate.

Cheers

On Mon, Mar 22, 2021 at 7:02 PM tsunakawa.takay@fujitsu.com <
tsunakawa.takay@fujitsu.com> wrote:


From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>

The _() macro in the postgresExecForeignCopy routine:
if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0)

uses gettext. Under Linux it compiles fine because (as I understand it)
the standard glibc implementation of gettext is used:
objdump -t contrib/postgres_fdw/postgres_fdw.so | grep 'gettext'
gettext@@GLIBC_2.2.5

but on macOS (and maybe elsewhere) we need to explicitly link the
libintl library in the Makefile:
SHLIB_LINK += $(filter -lintl, $(LIBS))

Alternatively, we could avoid using gettext in this part of the code entirely.

I'm afraid so, because no extension in contrib/ has a po/ directory. I just
removed _() and rebased the patch on HEAD.

Regards
Takayuki Tsunakawa

#88Justin Pryzby
pryzby@telsasoft.com
In reply to: Zhihong Yu (#87)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Mon, Mar 22, 2021 at 08:18:56PM -0700, Zhihong Yu wrote:

with data_dest_cb callback. It is used for send text representation of a
tuple to a custom destination.

send text -> sending text

I would say "It is used to send the text representation ..."

struct PgFdwModifyState *aux_fmstate; /* foreign-insert state, if
* created */
+ CopyToState cstate; /* foreign COPY state, if used */

Since foreign COPY is optional, should cstate be a pointer? That would be
in line with aux_fmstate.

It's actually a pointer:
src/include/commands/copy.h:typedef struct CopyToStateData *CopyToState;

There are many data structures like this, where a structure is typedefed with a
"Data" suffix and the pointer is typedefed without the "Data" suffix.

--
Justin
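To make the naming convention concrete, here is a minimal C sketch. `ExampleStateData`, `ExampleState` and `example_create()` are hypothetical names, not PostgreSQL code; only the pattern mirrors `CopyToStateData`/`CopyToState`.

```c
#include <stdlib.h>

/* Hypothetical example type: the struct typedef carries the "Data" suffix. */
typedef struct ExampleStateData
{
	int			nrows;			/* rows processed so far */
} ExampleStateData;

/* The pointer typedef drops the suffix, as CopyToState does for CopyToStateData. */
typedef ExampleStateData *ExampleState;

/* Allocate a zeroed state; callers only ever handle the pointer type. */
static ExampleState
example_create(void)
{
	ExampleState state = calloc(1, sizeof(ExampleStateData));

	return state;
}
```

Because `ExampleState` is already a pointer type, copying it copies the pointer, not the struct. By the same reasoning, a field declared as `CopyToState cstate;` in `PgFdwModifyState` already holds a pointer and needs no extra `*`.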

#89tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Justin Pryzby (#88)
1 attachment(s)
RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: Justin Pryzby <pryzby@telsasoft.com>

On Mon, Mar 22, 2021 at 08:18:56PM -0700, Zhihong Yu wrote:

with data_dest_cb callback. It is used for send text representation of a
tuple to a custom destination.

send text -> sending text

I would say "It is used to send the text representation ..."

I took Justin-san's suggestion. (It feels like I'm in a junior English class...)

It's actually a pointer:
src/include/commands/copy.h:typedef struct CopyToStateData *CopyToState;

There are many data structures like this, where a structure is typedefed with a
"Data" suffix and the pointer is typedefed without the "Data" suffix.

Yes. Thank you for the good explanation, Justin-san.

Regards
Takayuki Tsunakawa

Attachments:

v22-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchapplication/octet-stream; name=v22-0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patchDownload
From 65bce19e3b826eb74a6e8326c04874a593f00280 Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 9 Feb 2021 12:50:00 +0900
Subject: [PATCH v22] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The following routines are added to the FDW interface:
* BeginForeignCopy
* ExecForeignCopy
* EndForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of bulk COPY. The ExecForeignCopy routine runs a
'COPY ... FROM STDIN' command on the foreign server and iteratively
sends the tuples using the CopyTo() machinery.
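The calling pattern of the three routines can be modeled roughly as follows. This is a simplified sketch, not the executor code: `Row`, `CopyTarget`, `CopyRoutine`, `run_copy()` and `BATCH_SIZE` are hypothetical stand-ins, and the real callbacks take `EState`, `ResultRelInfo` and `TupleTableSlot` arguments. It shows Begin running once, Exec once per flushed batch (in postgres_fdw each such call issues its own remote `COPY ... FROM STDIN`), and End once.

```c
#include <stddef.h>

/* Hypothetical stand-ins for executor types; not the real PostgreSQL structs. */
typedef struct Row
{
	int			value;
} Row;

typedef struct CopyTarget
{
	int			begin_calls;	/* times BeginCopy ran */
	int			exec_calls;		/* times ExecCopy ran (one per flushed batch) */
	int			end_calls;		/* times EndCopy ran */
	int			rows_sent;		/* total rows handed to ExecCopy */
} CopyTarget;

/* Simplified analogue of BeginForeignCopy/ExecForeignCopy/EndForeignCopy. */
typedef struct CopyRoutine
{
	void		(*BeginCopy) (CopyTarget *target);
	void		(*ExecCopy) (CopyTarget *target, Row **rows, int nrows);
	void		(*EndCopy) (CopyTarget *target);
} CopyRoutine;

static void demo_begin(CopyTarget *t) { t->begin_calls++; }
static void demo_exec(CopyTarget *t, Row **rows, int nrows)
{
	(void) rows;				/* a real FDW would send these rows */
	t->exec_calls++;
	t->rows_sent += nrows;
}
static void demo_end(CopyTarget *t) { t->end_calls++; }

#define BATCH_SIZE 4			/* hypothetical buffer capacity */

/* Begin once, Exec whenever the buffer fills and for the remainder, End once. */
static void
run_copy(const CopyRoutine *routine, CopyTarget *target, Row *input, int ninput)
{
	Row		   *buffer[BATCH_SIZE];
	int			nused = 0;

	routine->BeginCopy(target);
	for (int i = 0; i < ninput; i++)
	{
		buffer[nused++] = &input[i];
		if (nused == BATCH_SIZE)
		{
			routine->ExecCopy(target, buffer, nused);
			nused = 0;
		}
	}
	if (nused > 0)
		routine->ExecCopy(target, buffer, nused);
	routine->EndCopy(target);
}
```

With 10 input rows and a batch size of 4, ExecCopy runs three times (4, 4 and 2 rows), so the per-command overhead is paid per batch rather than per tuple.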

Code that constructs a list of columns for a given foreign relation
in the deparseAnalyzeSql() routine is split into deparseRelColumnList().
It is reused in deparseCopyFromSql().
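A standalone sketch of the column-list logic that deparseCopyFromSql() relies on: dropped columns are skipped, and the parentheses are emitted only when at least one column remains. `Column` and `build_copy_from_sql()` are hypothetical; identifier quoting (quote_identifier()) and the column_name option are omitted, and the caller must supply a large enough buffer.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical simplified column descriptor; real code reads the TupleDesc. */
typedef struct Column
{
	const char *name;			/* remote column name */
	bool		dropped;		/* true if the column was dropped locally */
} Column;

/*
 * Build "COPY <rel>(<cols>) FROM STDIN " into buf, skipping dropped columns
 * and omitting the parentheses when no columns remain.  Assumes buf is large
 * enough; identifier quoting is left out of this sketch.
 */
static void
build_copy_from_sql(char *buf, size_t buflen, const char *relname,
					const Column *cols, int ncols)
{
	bool		first = true;
	size_t		len;

	snprintf(buf, buflen, "COPY %s", relname);
	for (int i = 0; i < ncols; i++)
	{
		if (cols[i].dropped)
			continue;
		len = strlen(buf);
		/* open the list before the first live column, else add a separator */
		snprintf(buf + len, buflen - len, "%s%s", first ? "(" : ", ", cols[i].name);
		first = false;
	}
	len = strlen(buf);
	snprintf(buf + len, buflen - len, "%s FROM STDIN ", first ? "" : ")");
}
```

For columns f1 and f2 of public.loc2 this yields `COPY public.loc2(f1, f2) FROM STDIN ` (with the trailing space), the same shape as the remote SQL in the expected regression output; with no live columns it yields `COPY public.loc2 FROM STDIN `.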

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure was extended
with a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with the cstate field.
It is needed to avoid repeated initialization of CopyState. For the same
reason, the CopyTo() routine is split into the set of routines CopyToStart()/
CopyTo()/CopyToFinish().
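For illustration, a minimal sketch of what a data_dest_cb-style callback receives: one row already encoded in COPY's text format. `send_row_text()` and `copy_data_dest_cb` are hypothetical; the callback signature mirrors `pgfdw_copy_dest_cb(void *buf, int len)` from the patch, which forwards the bytes with PQputCopyData(). The real encoding is done by the CopyTo() machinery, which also handles character-set conversion and more escapes.

```c
#include <string.h>

/* Same shape as the patch's callback: void (*)(void *buf, int len). */
typedef void (*copy_data_dest_cb) (void *buf, int len);

/*
 * Encode one row in COPY text format and hand it to dest in a single call.
 * NULL values become \N; backslash, tab, newline and carriage return are
 * backslash-escaped (PostgreSQL escapes a few more control characters).
 * Assumes the encoded row fits in the fixed 1024-byte buffer.
 */
static void
send_row_text(const char *const *values, int nvalues, copy_data_dest_cb dest)
{
	char		line[1024];
	int			len = 0;

	for (int i = 0; i < nvalues; i++)
	{
		if (i > 0)
			line[len++] = '\t';		/* column delimiter */
		if (values[i] == NULL)
		{
			memcpy(line + len, "\\N", 2);	/* default NULL marker */
			len += 2;
			continue;
		}
		for (const char *p = values[i]; *p; p++)
		{
			switch (*p)
			{
				case '\\': line[len++] = '\\'; line[len++] = '\\'; break;
				case '\t': line[len++] = '\\'; line[len++] = 't'; break;
				case '\n': line[len++] = '\\'; line[len++] = 'n'; break;
				case '\r': line[len++] = '\\'; line[len++] = 'r'; break;
				default: line[len++] = *p; break;
			}
		}
	}
	line[len++] = '\n';			/* row terminator */
	dest(line, len);
}
```

In this sketch a row (-1, xyzzy) is delivered as the 9 bytes `-1\txyzzy\n`, and a NULL column is sent as the two characters `\N`.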

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod shared across all potential target
relations (partitions) that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Change that decision logic to the combination of ExecMultiInsertAllowed()
and its caller. The former encapsulates the common criteria to allow
multi-insert. The latter uses additional criteria and sets the new
boolean field ri_usesMultiInsert of ResultRelInfo.
That prevents repeated computation of the same information in some cases,
especially for partitions, and the new arrangement is slightly more
readable.
Enum CopyInsertMethod is removed.
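The split between relation-level and statement-level checks can be sketched as below. This is a hypothetical simplified model: `RelProps`, `CopyProps` and both functions are invented for illustration. The statement-level criteria follow the condition in the CopyFrom() hunk of this patch, the trigger criteria follow the removed CIM_SINGLE branches (the partitioned-table transition-table case is omitted), and treating a foreign table whose FDW lacks ExecForeignCopy as ineligible is an assumption.

```c
#include <stdbool.h>

/* Hypothetical per-relation properties; the real code inspects ResultRelInfo. */
typedef struct RelProps
{
	bool		has_before_row_trig;	/* BEFORE ROW INSERT trigger */
	bool		has_instead_row_trig;	/* INSTEAD OF ROW INSERT trigger */
	bool		is_foreign;				/* foreign table or partition */
	bool		fdw_has_exec_copy;		/* FDW provides ExecForeignCopy (assumed check) */
} RelProps;

/* Hypothetical per-statement properties of the COPY command. */
typedef struct CopyProps
{
	bool		volatile_defexprs;		/* volatile column defaults */
	bool		volatile_where;			/* volatile functions in WHERE */
	int			nattrs;					/* length of the input column list */
} CopyProps;

/* Relation-level criteria, in the spirit of ExecMultiInsertAllowed(). */
static bool
rel_multi_insert_allowed(const RelProps *rel)
{
	if (rel->has_before_row_trig || rel->has_instead_row_trig)
		return false;
	if (rel->is_foreign && !rel->fdw_has_exec_copy)
		return false;
	return true;
}

/* Statement-level criteria checked by the caller; the result is cached per relation. */
static bool
uses_multi_insert(const CopyProps *copy, const RelProps *rel)
{
	if (copy->volatile_defexprs || copy->volatile_where || copy->nattrs == 0)
		return false;
	return rel_multi_insert_allowed(rel);
}
```

Computing the result once and storing it in a per-relation flag (ri_usesMultiInsert in the patch) avoids re-evaluating the same conditions for every routed tuple.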

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
---
 contrib/postgres_fdw/deparse.c                 |  63 ++++--
 contrib/postgres_fdw/expected/postgres_fdw.out |  46 ++++-
 contrib/postgres_fdw/postgres_fdw.c            | 141 +++++++++++++
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  45 ++++
 doc/src/sgml/fdwhandler.sgml                   |  71 ++++++-
 src/backend/commands/copy.c                    |   2 +-
 src/backend/commands/copyfrom.c                | 271 +++++++++++--------------
 src/backend/commands/copyto.c                  |  88 ++++++--
 src/backend/executor/execMain.c                |  44 ++++
 src/backend/executor/execPartition.c           |  37 +++-
 src/include/commands/copy.h                    |   5 +
 src/include/commands/copyfrom_internal.h       |  10 -
 src/include/executor/executor.h                |   1 +
 src/include/foreign/fdwapi.h                   |  15 ++
 src/include/nodes/execnodes.h                  |   8 +-
 16 files changed, 637 insertions(+), 211 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499..7e10f8b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1859,6 +1861,23 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * Deparse remote COPY FROM statement
+ *
+ * Note that this explicitly specifies the list of COPY's target columns
+ * to account for the fact that the remote table's columns may not match
+ * exactly with the columns declared in the local definition.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * 'buf' is the output buffer to append the statement to
@@ -2119,6 +2138,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 {
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
+{
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			i;
@@ -2126,10 +2169,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2138,6 +2179,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2157,18 +2201,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b..5b2d03a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8111,8 +8111,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8123,6 +8124,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8231,6 +8245,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b4857..237bc5f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -201,6 +202,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -373,6 +375,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(EState *estate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -558,6 +567,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2178,6 +2190,135 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(EState *estate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	if (resultRelInfo->ri_RangeTableIndex == 0)
+	{
+		ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+		Assert(rootResultRelInfo != NULL);
+		rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);
+		rte = copyObject(rte);
+		rte->relid = RelationGetRelid(rel);
+		rte->relkind = RELKIND_FOREIGN_TABLE;
+	}
+	else
+		rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, estate);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(estate,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									-1,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, rel, NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol. This must be done both after a
+		 * successful copy and after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : "canceled by server") <= 0)
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending an EOF signal, check command OK. */
+		res = PQgetResult(conn);
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we have not gotten extra results */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1f67b4d..cb801c9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -165,6 +165,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 extern void rebuildInsertSql(StringInfo buf, char *orig_query,
 							 int values_end_len, int num_cols,
 							 int num_rows);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea..02efe2f 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2235,6 +2235,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2335,6 +2352,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 04bc052..666148a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -813,8 +813,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
 
      Begin executing an insert operation on a foreign table.  This routine is
      called right before the first tuple is inserted into the foreign table
-     in both cases when it is the partition chosen for tuple routing and the
-     target specified in a <command>COPY FROM</command> command.  It should
+     target specified in a <command>COPY FROM</command> command, or when
+     the foreign table is the partition chosen for tuple routing of a
+     partitioned table.  It should
      perform any initialization needed prior to the actual insertion.
      Subsequently, <function>ExecForeignInsert</function> or
      <function>ExecForeignBatchInsert</function> will be called for
@@ -1067,6 +1068,72 @@ EndDirectModify(ForeignScanState *node);
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(EState *estate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table. This routine is
+     called right before the first call of the <function>ExecForeignCopy</function>
+     routine for the foreign table. It should perform any initialization needed
+     prior to the actual COPY FROM operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     a batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>estate</literal> is global execution state for the query.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; it will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine will be used to run COPY on the foreign table.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
 RowMarkType
 GetForeignRowMarkType(RangeTblEntry *rte,
                       LockClauseStrength strength);
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c712c8..411c409 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -304,7 +304,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		CopyToState cstate;
 
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2ed696d..7a31633 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -316,54 +316,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There are no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -537,13 +547,11 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -653,6 +661,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion and insert them in bulk, for example with one
+	 * table_multi_insert() call, rather than calling table_tuple_insert()
+	 * separately for every tuple.  However, there are a number of reasons why
+	 * we might not be able to do this.  For example, if there are any volatile
+	 * expressions in the table's default values or in the statement's WHERE
+	 * clause, which may query the table we are inserting into, buffering
+	 * tuples might produce wrong results.  Also, the relation we are trying to
+	 * insert into may itself not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and the partition's own properties, which
+	 * are checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -669,10 +704,22 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Init copying process into foreign table. Initialization of copying into
+	 * foreign partitions will be done later.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(estate,
+																	  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -701,83 +748,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -785,7 +758,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -828,7 +801,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -836,7 +809,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -903,24 +875,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -950,7 +912,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -969,9 +931,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1043,7 +1002,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1122,11 +1081,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1145,14 +1101,21 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+																	target_resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 7257a54..3782336 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -51,6 +51,7 @@ typedef enum CopyDest
 {
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_FRONTEND,				/* to frontend */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -86,6 +87,7 @@ typedef struct CopyToStateData
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -115,7 +117,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -248,6 +249,15 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
+			break;
 	}
 
 	/* Update the progress */
@@ -345,11 +355,12 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
@@ -362,7 +373,13 @@ BeginCopyTo(ParseState *pstate,
 		0
 	};
 
-	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
+	/*
+	 * Check whether we support copying data out of the specified relation,
+	 * Check whether we support copying data out of the specified relation,
+	 * unless the caller also passed a non-NULL data_dest_cb, in which case
+	 * the callback will take care of it.
+	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
+		data_dest_cb == NULL)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
 			ereport(ERROR,
@@ -673,6 +690,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -773,20 +795,17 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start COPY TO operation.
+ * Split out of DoCopyTo() so that callers that copy tuples to the
+ * destination one by one, by calling CopyOneRowTo(), can reuse the setup
+ * without duplicating it.
  */
-uint64
-DoCopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
-	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
-
-	if (fe_copy)
-		SendCopyBegin(cstate);
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -876,6 +895,39 @@ DoCopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+uint64
+DoCopyTo(CopyToState cstate)
+{
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
+	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
+	uint64		processed;
+
+	if (fe_copy)
+		SendCopyBegin(cstate);
+
+	CopyToStart(cstate);
 
 	if (cstate->rel)
 	{
@@ -914,15 +966,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
+	CopyToFinish(cstate);
 
 	if (fe_copy)
 		SendCopyEnd(cstate);
@@ -933,7 +977,7 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0648dd8..c5ed67a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1233,10 +1233,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
 /*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
+/*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
  *
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b8da4c5..13aef41 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -589,6 +589,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  estate->es_instrument);
 
 	/*
+	 * If the partition's root parent can't use multi-insert, neither can the
+	 * partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
+	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
 	 * required when the operation is CMD_UPDATE.
@@ -989,9 +997,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			if (partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				partRelInfo->ri_FdwRoutine->BeginForeignCopy(estate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
@@ -1211,10 +1226,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				if (resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+					resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+																 resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Check if this result rel is one belonging to the node's subplans,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e..3d9d187 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -80,10 +81,14 @@ extern DestReceiver *CreateCopyDestReceiver(void);
  */
 extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 705f5b6..c23f631 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -40,16 +40,6 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-/*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
  *
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363..754a9f5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -193,6 +193,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78d..aeb8484 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -126,6 +126,16 @@ typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node)
 
 typedef void (*EndDirectModify_function) (ForeignScanState *node);
 
+typedef void (*BeginForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
@@ -230,6 +240,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad62..f32dcf6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -511,7 +511,13 @@ typedef struct ResultRelInfo
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.10.1

#90Justin Pryzby
pryzby@telsasoft.com
In reply to: tsunakawa.takay@fujitsu.com (#89)
1 attachment(s)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

I rebased this patch to resolve a trivial one-line conflict with commit c5b7ba4e6.

--
Justin

Attachments:

0001-Fast-COPY-FROM-into-the-foreign-or-sharded-table.patch (text/x-diff; charset=us-ascii)
From 0987ca4f62fb8c9b43a3fe142d955d8a9cb6f36f Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 9 Feb 2021 12:50:00 +0900
Subject: [PATCH] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert
is possible and the foreign table has a non-zero number of columns.

The following routines are added to the FDW interface:
* BeginForeignCopy
* ExecForeignCopy
* EndForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free the
CopyToState used for bulk COPY. The ExecForeignCopy routine runs a
'COPY ... FROM STDIN' command on the foreign server and sends it the
given batch of tuples, row by row, using the COPY TO machinery.
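
For illustration only, a minimal C sketch of the Begin/Exec/End lifecycle described above. It is not the postgres_fdw code: MockSlot, MockRemote, MockFdwRoutine and copy_rows() are hypothetical stand-ins for TupleTableSlot, the remote connection and FdwRoutine, and the sketch assumes the core COPY code calls ExecForeignCopy once per flushed multi-insert buffer, with a final, possibly partial, buffer flushed at the end.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tuple slot: one row in COPY text format. */
typedef struct MockSlot { const char *row; } MockSlot;

/* Hypothetical stand-in for the remote server's COPY FROM STDIN stream. */
typedef struct MockRemote {
    char   data[256];
    size_t len;
    int    copy_active;   /* non-zero between Begin and End */
    int    batches;       /* number of ExecForeignCopy calls */
} MockRemote;

typedef void (*BeginCopyFn)(MockRemote *r);
typedef void (*ExecCopyFn)(MockRemote *r, MockSlot **slots, int nslots);
typedef void (*EndCopyFn)(MockRemote *r);

/* Hypothetical subset of an FDW routine table with the three COPY hooks. */
typedef struct MockFdwRoutine {
    BeginCopyFn BeginForeignCopy;
    ExecCopyFn  ExecForeignCopy;
    EndCopyFn   EndForeignCopy;
} MockFdwRoutine;

static void mock_begin(MockRemote *r)
{
    r->len = 0;
    r->data[0] = '\0';
    r->batches = 0;
    r->copy_active = 1;
}

/* Send every row of the batch, newline-terminated, to the remote stream. */
static void mock_exec(MockRemote *r, MockSlot **slots, int nslots)
{
    assert(r->copy_active);
    for (int i = 0; i < nslots; i++) {
        int n = snprintf(r->data + r->len, sizeof(r->data) - r->len,
                         "%s\n", slots[i]->row);
        assert(n >= 0 && (size_t) n < sizeof(r->data) - r->len);
        r->len += (size_t) n;
    }
    r->batches++;
}

static void mock_end(MockRemote *r) { r->copy_active = 0; }

/* One Begin, one Exec per full batch plus the remainder, one End. */
static void copy_rows(const MockFdwRoutine *fdw, MockRemote *r,
                      MockSlot *rows, int nrows, int batch_size)
{
    MockSlot *buf[16];
    int       nbuf = 0;

    assert(batch_size > 0 && batch_size <= 16);
    fdw->BeginForeignCopy(r);
    for (int i = 0; i < nrows; i++) {
        buf[nbuf++] = &rows[i];
        if (nbuf == batch_size) {
            fdw->ExecForeignCopy(r, buf, nbuf);
            nbuf = 0;
        }
    }
    if (nbuf > 0)
        fdw->ExecForeignCopy(r, buf, nbuf);   /* flush the remainder */
    fdw->EndForeignCopy(r);
}
```

With five rows and a batch size of 2, the sketch issues three ExecForeignCopy calls (2 + 2 + 1 rows) between a single Begin/End pair.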

The code in deparseAnalyzeSql() that constructs the column list of a
given foreign relation is split out into deparseRelColumnList(), which
is reused by deparseCopyFromSql().
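
A simplified C sketch of how the column list might be built and reused for the remote COPY statement, mirroring the shape of deparseRelColumnList() and deparseCopyFromSql() in the diff below: dropped columns are skipped, a column_name option overrides the local name, and parentheses are emitted only when at least one column remains. MockColumn, append() and the fixed-size buffers are hypothetical; unlike the real code, identifiers are not passed through quote_identifier().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified column descriptor. */
typedef struct MockColumn {
    const char *local_name;
    const char *remote_name;   /* NULL: no column_name option, use local name */
    bool        dropped;
} MockColumn;

/* Append s to a NUL-terminated buffer of capacity cap; assert on overflow. */
static void append(char *buf, size_t cap, const char *s)
{
    size_t used = strlen(buf);

    assert(used + strlen(s) < cap);
    strcat(buf, s);
}

/* Emit "a, b, c" (optionally "(a, b, c)"); return the number of columns. */
static int deparse_column_list(char *buf, size_t cap, const MockColumn *cols,
                               int ncols, bool enclose_in_parens)
{
    bool first = true;
    int  nretrieved = 0;

    for (int i = 0; i < ncols; i++) {
        if (cols[i].dropped)
            continue;               /* ignore dropped columns */
        if (!first)
            append(buf, cap, ", ");
        else if (enclose_in_parens)
            append(buf, cap, "(");  /* open only once a column exists */
        first = false;
        append(buf, cap, cols[i].remote_name ? cols[i].remote_name
                                             : cols[i].local_name);
        nretrieved++;
    }
    if (enclose_in_parens && nretrieved > 0)
        append(buf, cap, ")");
    return nretrieved;
}

/* Build "COPY rel(cols) FROM STDIN " (note the trailing space, as in the patch). */
static void deparse_copy_from(char *buf, size_t cap, const char *relname,
                              const MockColumn *cols, int ncols)
{
    buf[0] = '\0';
    append(buf, cap, "COPY ");
    append(buf, cap, relname);
    (void) deparse_column_list(buf, cap, cols, ncols, true);
    append(buf, cap, " FROM STDIN ");
}
```

For a zero-column relation the sketch produces "COPY public.loc2 FROM STDIN " with no empty "()", which is why the parentheses are emitted lazily.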

Added regression tests for specific corner cases of the COPY FROM STDIN operation.

By analogy with the data source callback of CopyFrom(), the COPY TO
state (CopyToStateData) is extended with a data_dest_cb callback. It
is used to send the text representation of a tuple to a custom
destination.
The PgFdwModifyState structure is extended with the cstate field,
which avoids repeated initialization of the CopyToState. For the same
reason, DoCopyTo() is split into CopyToStart(), CopyOneRowTo() (now
exported) and CopyToFinish().
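
A hypothetical C sketch of the callback-based destination and the start/one-row/finish split. MockCopyToState, collect_cb() and the mock_copy_to_* functions are illustrative stand-ins, not PostgreSQL APIs; only the copy_data_dest_cb signature is taken from the patch. In postgres_fdw the callback would forward the bytes with PQputCopyData(), and the sketch always terminates rows with "\n", whereas the patch uses "\r\n" on WIN32.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the patch's typedef. */
typedef void (*copy_data_dest_cb) (void *outbuf, int len);

/* Hypothetical, heavily simplified COPY TO state. */
typedef struct MockCopyToState {
    copy_data_dest_cb data_dest_cb;
    char              rowbuf[128];
    int               rows_sent;
} MockCopyToState;

/* Set up once; later rows reuse this state instead of re-initializing it. */
static void mock_copy_to_start(MockCopyToState *cstate, copy_data_dest_cb cb)
{
    cstate->data_dest_cb = cb;
    cstate->rows_sent = 0;
}

/* Format one row as tab-separated text and hand it to the callback. */
static void mock_copy_one_row_to(MockCopyToState *cstate,
                                 const char *const *values, int nvalues)
{
    size_t len = 0;

    cstate->rowbuf[0] = '\0';
    for (int i = 0; i < nvalues; i++) {
        int n = snprintf(cstate->rowbuf + len, sizeof(cstate->rowbuf) - len,
                         "%s%s", i > 0 ? "\t" : "", values[i]);
        assert(n >= 0 && (size_t) n < sizeof(cstate->rowbuf) - len);
        len += (size_t) n;
    }
    assert(len + 1 < sizeof(cstate->rowbuf));
    cstate->rowbuf[len++] = '\n';          /* end of row */
    cstate->rowbuf[len] = '\0';
    cstate->data_dest_cb(cstate->rowbuf, (int) len);
    cstate->rows_sent++;
}

/* Text format has no trailer; a binary COPY would emit one here. */
static void mock_copy_to_finish(MockCopyToState *cstate) { (void) cstate; }

/* Example destination: accumulate everything the callback receives. */
static char   collected[256];
static size_t collected_len;

static void collect_cb(void *outbuf, int len)
{
    assert(collected_len + (size_t) len < sizeof(collected));
    memcpy(collected + collected_len, outbuf, (size_t) len);
    collected_len += (size_t) len;
    collected[collected_len] = '\0';
}
```

The split matters because the FDW can call the start routine once per COPY and then feed rows as batches arrive, rather than rebuilding the state for every batch.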

When 0d5f05cde introduced support for using multi-insert mode when
copying into partitioned tables, it introduced a single variable of
enum type CopyInsertMethod, shared across all potential target
relations (partitions), that, along with some target relation
properties, dictated whether to engage multi-insert mode for a given
target relation.

Replace that decision logic with the combination of
ExecMultiInsertAllowed() and its caller. The former encapsulates the
common criteria for allowing multi-insert; the latter applies
additional criteria and sets the new boolean field ri_usesMultiInsert
of ResultRelInfo. This avoids recomputing the same information in some
cases, especially for partitions, and makes the code slightly more
readable. The CopyInsertMethod enum is removed.
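
A simplified C sketch of the resulting decision logic, assuming the ResultRelInfo properties the patch inspects are flattened into booleans (MockRel and its fields are hypothetical). multi_insert_allowed() holds the common criteria of ExecMultiInsertAllowed(), init_target() adds the COPY-specific checks done once in CopyFrom(), and init_partition() lets a leaf partition use multi-insert only if its root may.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattening of the ResultRelInfo properties the patch inspects. */
typedef struct MockRel {
    bool has_before_or_instead_row_trig; /* trig_insert_before_row || trig_insert_instead_row */
    bool is_partitioned_table;           /* relkind == RELKIND_PARTITIONED_TABLE */
    bool has_insert_transition_table;    /* trig_insert_new_table */
    bool is_foreign;                     /* ri_FdwRoutine != NULL */
    bool fdw_has_exec_foreign_copy;      /* ri_FdwRoutine->ExecForeignCopy != NULL */
    bool uses_multi_insert;              /* ri_usesMultiInsert */
} MockRel;

/* Common per-relation criteria. */
static bool multi_insert_allowed(const MockRel *rel)
{
    if (rel->has_before_or_instead_row_trig)
        return false;
    if (rel->is_partitioned_table && rel->has_insert_transition_table)
        return false;
    if (rel->is_foreign && !rel->fdw_has_exec_foreign_copy)
        return false;
    return true;
}

/* COPY-specific criteria, evaluated once for the COPY target relation. */
static void init_target(MockRel *target, bool volatile_defexprs,
                        int nattrs, bool volatile_where)
{
    target->uses_multi_insert = !volatile_defexprs && nattrs > 0 &&
                                !volatile_where &&
                                multi_insert_allowed(target);
}

/* A leaf stays disabled if the root is; otherwise it is checked on its own. */
static void init_partition(MockRel *leaf, const MockRel *root)
{
    leaf->uses_multi_insert = root->uses_multi_insert &&
                              multi_insert_allowed(leaf);
}
```

Each relation's flag is computed once, when its ResultRelInfo is set up, so the per-tuple loop only has to test uses_multi_insert.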

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa
Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
---
 contrib/postgres_fdw/deparse.c                |  63 +++-
 .../postgres_fdw/expected/postgres_fdw.out    |  46 ++-
 contrib/postgres_fdw/postgres_fdw.c           | 141 +++++++++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 +++
 doc/src/sgml/fdwhandler.sgml                  |  71 ++++-
 src/backend/commands/copy.c                   |   2 +-
 src/backend/commands/copyfrom.c               | 271 ++++++++----------
 src/backend/commands/copyto.c                 |  88 ++++--
 src/backend/executor/execMain.c               |  44 +++
 src/backend/executor/execPartition.c          |  37 ++-
 src/include/commands/copy.h                   |   5 +
 src/include/commands/copyfrom_internal.h      |  10 -
 src/include/executor/executor.h               |   1 +
 src/include/foreign/fdwapi.h                  |  15 +
 src/include/nodes/execnodes.h                 |   8 +-
 16 files changed, 637 insertions(+), 211 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bdc4c3620d..bf93c1d091 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -185,6 +185,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+								  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1859,6 +1861,23 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse remote COPY FROM statement
+ *
+ * Note that this explicitly specifies the list of COPY's target columns
+ * to account for the fact that the remote table's columns may not match
+ * exactly with the columns declared in the local definition.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2120,6 +2139,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2128,10 +2171,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	List	   *options;
 	ListCell   *lc;
 	bool		first = true;
+	List	   *retrieved_attrs = NIL;
 
-	*retrieved_attrs = NIL;
-
-	appendStringInfoString(buf, "SELECT ");
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		/* Ignore dropped columns. */
@@ -2140,6 +2181,9 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (enclose_in_parens)
+			appendStringInfoChar(buf, '(');
+
 		first = false;
 
 		/* Use attribute name or column_name option. */
@@ -2159,18 +2203,13 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 
 		appendStringInfoString(buf, quote_identifier(colname));
 
-		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+		retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
 	}
 
-	/* Don't generate bad syntax for zero-column relation. */
-	if (first)
-		appendStringInfoString(buf, "NULL");
+	if (enclose_in_parens && list_length(retrieved_attrs) > 0)
+		appendStringInfoChar(buf, ')');
 
-	/*
-	 * Construct FROM clause
-	 */
-	appendStringInfoString(buf, " FROM ");
-	deparseRelation(buf, rel);
+	return retrieved_attrs;
 }
 
 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 7f69fa0054..b214395a78 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8078,8 +8078,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8090,6 +8091,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8198,6 +8212,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c590f374c6..c615cafd8f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -209,6 +210,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
+	CopyToState cstate; /* foreign COPY state, if used */
 } PgFdwModifyState;
 
 /*
@@ -383,6 +385,13 @@ static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 									   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 									 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(EState *estate,
+									   ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(EState *estate,
+									 ResultRelInfo *resultRelInfo);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot **slots,
+									  int nslots);
 static int	postgresIsForeignRelUpdatable(Relation rel);
 static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
@@ -579,6 +588,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->EndForeignModify = postgresEndForeignModify;
 	routine->BeginForeignInsert = postgresBeginForeignInsert;
 	routine->EndForeignInsert = postgresEndForeignInsert;
+	routine->BeginForeignCopy = postgresBeginForeignCopy;
+	routine->ExecForeignCopy = postgresExecForeignCopy;
+	routine->EndForeignCopy = postgresEndForeignCopy;
 	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
 	routine->PlanDirectModify = postgresPlanDirectModify;
 	routine->BeginDirectModify = postgresBeginDirectModify;
@@ -2209,6 +2221,135 @@ postgresEndForeignInsert(EState *estate,
 	finish_foreign_modify(fmstate);
 }
 
+static PgFdwModifyState *copy_fmstate = NULL;	/* COPY in progress, for pgfdw_copy_dest_cb() */
+
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+	PGconn *conn = copy_fmstate->conn;
+
+	if (PQputCopyData(conn, (char *) buf, len) <= 0)
+		pgfdw_report_error(ERROR, NULL, conn, false, copy_fmstate->query);
+}
+
+/*
+ * postgresBeginForeignCopy
+ *		Begin a COPY operation on a foreign table
+ */
+static void
+postgresBeginForeignCopy(EState *estate,
+						   ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate;
+	StringInfoData sql;
+	RangeTblEntry *rte;
+	Relation rel = resultRelInfo->ri_RelationDesc;
+
+	if (resultRelInfo->ri_RangeTableIndex == 0)
+	{
+		ResultRelInfo *rootResultRelInfo = resultRelInfo->ri_RootResultRelInfo;
+
+		Assert(rootResultRelInfo != NULL);
+		rte = exec_rt_fetch(rootResultRelInfo->ri_RangeTableIndex, estate);
+		rte = copyObject(rte);
+		rte->relid = RelationGetRelid(rel);
+		rte->relkind = RELKIND_FOREIGN_TABLE;
+	}
+	else
+		rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex, estate);
+
+	initStringInfo(&sql);
+	deparseCopyFromSql(&sql, rel);
+
+	fmstate = create_foreign_modify(estate,
+									rte,
+									resultRelInfo,
+									CMD_INSERT,
+									NULL,
+									sql.data,
+									NIL,
+									-1,
+									false,
+									NIL);
+
+	fmstate->cstate = BeginCopyTo(NULL, rel, NULL,
+								  InvalidOid, NULL, false, pgfdw_copy_dest_cb,
+								  NIL, NIL);
+	CopyToStart(fmstate->cstate);
+	resultRelInfo->ri_FdwState = fmstate;
+}
+
+/*
+ * postgresExecForeignCopy
+ *		Send a number of tuples to the foreign relation.
+ */
+static void
+postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots, int nslots)
+{
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState;
+	PGresult *res;
+	PGconn *conn = fmstate->conn;
+	bool OK = false;
+	int i;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+
+	res = PQexec(conn, fmstate->query);
+	if (PQresultStatus(res) != PGRES_COPY_IN)
+		pgfdw_report_error(ERROR, res, conn, true, fmstate->query);
+	PQclear(res);
+
+	PG_TRY();
+	{
+		copy_fmstate = fmstate;
+		for (i = 0; i < nslots; i++)
+			CopyOneRowTo(fmstate->cstate, slots[i]);
+
+		OK = true;
+	}
+	PG_FINALLY();
+	{
+		/*
+		 * Finish the COPY IN protocol.  This must be done after a successful
+		 * copy as well as after an error.
+		 */
+		if (PQputCopyEnd(conn, OK ? NULL : "canceled by server") <= 0)
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/* After successfully sending the end-of-data signal, check the result. */
+		res = PQgetResult(conn);
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		PQclear(res);
+		/* Do this to ensure we have not gotten extra results */
+		if (PQgetResult(conn) != NULL)
+			ereport(ERROR,
+					(errmsg("unexpected extra results during COPY of table: %s",
+							PQerrorMessage(conn))));
+	}
+	PG_END_TRY();
+}
+
+/*
+ * postgresEndForeignCopy
+ *		Finish a COPY operation on a foreign table
+ */
+static void
+postgresEndForeignCopy(EState *estate, ResultRelInfo *resultRelInfo)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+
+	/* Check correct use of CopyIn FDW API. */
+	Assert(fmstate->cstate != NULL);
+	CopyToFinish(fmstate->cstate);
+	pfree(fmstate->cstate);
+	fmstate->cstate = NULL;
+	finish_foreign_modify(fmstate);
+}
+
 /*
  * postgresIsForeignRelUpdatable
  *		Determine whether a foreign table supports INSERT, UPDATE and/or
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 5d44b75314..10392f6ec2 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -179,6 +179,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 extern void rebuildInsertSql(StringInfo buf, char *orig_query,
 							 int values_end_len, int num_cols,
 							 int num_rows);
+extern void deparseCopyFromSql(StringInfo buf, Relation rel);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7487096eac..32062b4a55 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2237,6 +2237,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2337,6 +2354,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 98882ddab8..fad2ff6161 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -822,8 +822,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
 
      Begin executing an insert operation on a foreign table.  This routine is
      called right before the first tuple is inserted into the foreign table
-     in both cases when it is the partition chosen for tuple routing and the
-     target specified in a <command>COPY FROM</command> command.  It should
+     when it is the target specified in a <command>COPY FROM</command> command,
+     or when it is the partition chosen for tuple routing of a partitioned
+     table.  It should
      perform any initialization needed prior to the actual insertion.
      Subsequently, <function>ExecForeignInsert</function> or
      <function>ExecForeignBatchInsert</function> will be called for
@@ -1137,6 +1138,72 @@ ExecForeignTruncate(List *rels, List *rels_extra,
 
     <para>
 <programlisting>
+void
+BeginForeignCopy(EState *estate,
+                   ResultRelInfo *rinfo);
+</programlisting>
+
+     Begin executing a copy operation on a foreign table.  This routine is
+     called right before the first call of the <function>ExecForeignCopy</function>
+     routine for the foreign table.  It should perform any initialization needed
+     prior to the actual <command>COPY FROM</command> operation.
+     Subsequently, <function>ExecForeignCopy</function> will be called for
+     each batch of tuples to be copied into the foreign table.
+    </para>
+
+    <para>
+     <literal>estate</literal> is global execution state for the query.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.  (The <structfield>ri_FdwState</structfield> field of
+     <structname>ResultRelInfo</structname> is available for the FDW to store any
+     private state it needs for this operation.)
+    </para>
+
+    <para>
+     If the <function>BeginForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the initialization.
+    </para>
+
+    <para>
+<programlisting>
+void
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a batch of tuples into the foreign table.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     <literal>slots</literal> contains the tuples to be inserted; they will match the
+     row-type definition of the foreign table.
+     <literal>nslots</literal> is the number of tuples in the <literal>slots</literal> array.
+    </para>
+
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, the <function>ExecForeignInsert</function> routine is used instead to insert the tuples one at a time.
+    </para>
+
+    <para>
+<programlisting>
+void
+EndForeignCopy(EState *estate,
+                 ResultRelInfo *rinfo);
+</programlisting>
+
+     End the copy operation and release resources.  It is normally not important
+     to release palloc'd memory, but for example open files and connections
+     to remote servers should be cleaned up.
+    </para>
+
+    <para>
+     If the <function>EndForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, no action is taken for the termination.
+    </para>
+
+    <para>
+<programlisting>
 RowMarkType
 GetForeignRowMarkType(RangeTblEntry *rte,
                       LockClauseStrength strength);
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8265b981eb..f646770767 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -304,7 +304,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		CopyToState cstate;
 
 		cstate = BeginCopyTo(pstate, rel, query, relid,
-							 stmt->filename, stmt->is_program,
+							 stmt->filename, stmt->is_program, NULL,
 							 stmt->attlist, stmt->options);
 		*processed = DoCopyTo(cstate);	/* copy from database to file */
 		EndCopyTo(cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 20e7d57d41..b486ffd641 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -317,54 +317,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/* Flush into foreign table or partition */
+		resultRelInfo->ri_FdwRoutine->ExecForeignCopy(resultRelInfo,
+														slots,
+														nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -538,13 +548,11 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -654,6 +662,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, the flag is derived from the flag of the target
+	 * table set here, combined with the partition's own properties as
+	 * checked by ExecMultiInsertAllowed().  It does not matter whether
+	 * partitions have any volatile default expressions, as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-empty list of input attributes,
+	 * so the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -671,10 +706,22 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	/*
+	 * Initialize copying into a foreign target table.  Foreign partitions
+	 * are initialized later, when the first tuple is routed to them.
+	 */
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(estate,
+																	  resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+																	resultRelInfo);
+	}
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -703,83 +750,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -787,7 +760,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -830,7 +803,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -838,7 +811,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -905,24 +877,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -952,7 +914,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -971,9 +933,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1045,7 +1004,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1124,11 +1083,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1147,14 +1103,21 @@ CopyFrom(CopyFromState cstate)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
-	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+	if (target_resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (target_resultRelInfo->ri_usesMultiInsert)
+		{
+			if (target_resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+				target_resultRelInfo->ri_FdwRoutine->EndForeignCopy(estate,
+																	target_resultRelInfo);
+		}
+		else if (target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
+	}
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 7257a54e93..378233655d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -51,6 +51,7 @@ typedef enum CopyDest
 {
 	COPY_FILE,					/* to file (or a piped program) */
 	COPY_FRONTEND,				/* to frontend */
+	COPY_CALLBACK				/* to callback function */
 } CopyDest;
 
 /*
@@ -86,6 +87,7 @@ typedef struct CopyToStateData
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 
+	copy_data_dest_cb data_dest_cb;	/* function for writing data */
 	CopyFormatOptions opts;
 	Node	   *whereClause;	/* WHERE condition (or NULL) */
 
@@ -115,7 +117,6 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
-static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 static void CopyAttributeOutText(CopyToState cstate, char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, char *string,
 								bool use_quote, bool single_attr);
@@ -248,6 +249,15 @@ CopySendEndOfRow(CopyToState cstate)
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
 			break;
+		case COPY_CALLBACK:
+			Assert(!cstate->opts.binary);
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
+			break;
 	}
 
 	/* Update the progress */
@@ -345,11 +355,12 @@ BeginCopyTo(ParseState *pstate,
 			Oid queryRelId,
 			const char *filename,
 			bool is_program,
+			copy_data_dest_cb data_dest_cb,
 			List *attnamelist,
 			List *options)
 {
 	CopyToState	cstate;
-	bool		pipe = (filename == NULL);
+	bool		pipe = (filename == NULL) && (data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	MemoryContext oldcontext;
@@ -362,7 +373,13 @@ BeginCopyTo(ParseState *pstate,
 		0
 	};
 
-	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
+	/*
+	 * Check whether we support copying data out of the specified relation,
+	 * unless the caller also passed a non-NULL data_dest_cb, in which case,
+	 * the callback will take care of it
+	 */
+	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION &&
+		data_dest_cb == NULL)
 	{
 		if (rel->rd_rel->relkind == RELKIND_VIEW)
 			ereport(ERROR,
@@ -673,6 +690,11 @@ BeginCopyTo(ParseState *pstate,
 		if (whereToSendOutput != DestRemote)
 			cstate->copy_file = stdout;
 	}
+	else if (data_dest_cb)
+	{
+		cstate->copy_dest = COPY_CALLBACK;
+		cstate->data_dest_cb = data_dest_cb;
+	}
 	else
 	{
 		cstate->filename = pstrdup(filename);
@@ -773,20 +795,17 @@ EndCopyTo(CopyToState cstate)
 }
 
 /*
- * Copy from relation or query TO file.
+ * Start a COPY TO operation: set up per-row state and emit any header.
+ * Split out of DoCopyTo() so that callers that push tuples to the
+ * destination one at a time with CopyOneRowTo() can reuse it without
+ * duplicating the setup.
  */
-uint64
-DoCopyTo(CopyToState cstate)
+void
+CopyToStart(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL);
-	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
 	ListCell   *cur;
-	uint64		processed;
-
-	if (fe_copy)
-		SendCopyBegin(cstate);
 
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
@@ -876,6 +895,39 @@ DoCopyTo(CopyToState cstate)
 			CopySendEndOfRow(cstate);
 		}
 	}
+}
+
+/*
+ * Finish COPY TO operation.
+ */
+void
+CopyToFinish(CopyToState cstate)
+{
+	if (cstate->opts.binary)
+	{
+		/* Generate trailer for a binary copy */
+		CopySendInt16(cstate, -1);
+		/* Need to flush out the trailer */
+		CopySendEndOfRow(cstate);
+	}
+
+	MemoryContextDelete(cstate->rowcontext);
+}
+
+/*
+ * Copy from relation or query TO file.
+ */
+uint64
+DoCopyTo(CopyToState cstate)
+{
+	bool		pipe = (cstate->filename == NULL) && (cstate->data_dest_cb == NULL);
+	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
+	uint64		processed;
+
+	if (fe_copy)
+		SendCopyBegin(cstate);
+
+	CopyToStart(cstate);
 
 	if (cstate->rel)
 	{
@@ -914,15 +966,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
-
-	MemoryContextDelete(cstate->rowcontext);
+	CopyToFinish(cstate);
 
 	if (fe_copy)
 		SendCopyEnd(cstate);
@@ -933,7 +977,7 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	bool		need_delim = false;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b2e2df8773..f9049cfae4 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1254,9 +1254,53 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
 	resultRelInfo->ri_ChildToRootMapValid = false;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
+/*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow the caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		rri->ri_FdwRoutine->ExecForeignCopy == NULL)
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary COPY interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
 /*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 99780ebb96..f402e13b9b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -514,6 +514,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  rootResultRelInfo,
 					  estate->es_instrument);
 
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
@@ -907,9 +915,16 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	if (partRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (partRelInfo->ri_usesMultiInsert)
+		{
+			if (partRelInfo->ri_FdwRoutine->BeginForeignCopy != NULL)
+				partRelInfo->ri_FdwRoutine->BeginForeignCopy(estate, partRelInfo);
+		}
+		else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+	}
 
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
@@ -1146,10 +1161,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
 		/* Allow any FDWs to shut down */
-		if (resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-														   resultRelInfo);
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesMultiInsert)
+			{
+				if (resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL)
+					resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+																 resultRelInfo);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+				resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+															   resultRelInfo);
+		}
 
 		/*
 		 * Close it if it's not one of the result relations borrowed from the
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8c4748e33d..3d9d187765 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -55,6 +55,7 @@ typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
 
 extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
@@ -80,10 +81,14 @@ extern DestReceiver *CreateCopyDestReceiver(void);
  */
 extern CopyToState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
 							   Oid queryRelId, const char *filename, bool is_program,
+							   copy_data_dest_cb data_dest_cb,
 							   List *attnamelist, List *options);
 extern void EndCopyTo(CopyToState cstate);
 extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
 							List *attnamelist);
+extern void CopyToStart(CopyToState cstate);
+extern void CopyToFinish(CopyToState cstate);
+extern void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
 
 #endif							/* COPY_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 858af7a717..8f61ff3d4d 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -39,16 +39,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6eae134c08..beb8e8fcd0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -203,6 +203,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 4ebbca6de9..74fe6bdf5c 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -127,6 +127,16 @@ typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node)
 
 typedef void (*EndDirectModify_function) (ForeignScanState *node);
 
+typedef void (*BeginForeignCopy_function) (EState *estate,
+										   ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+										  TupleTableSlot **slots,
+										  int nslots);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+										 ResultRelInfo *rinfo);
+
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
@@ -244,6 +254,11 @@ typedef struct FdwRoutine
 	IterateDirectModify_function IterateDirectModify;
 	EndDirectModify_function EndDirectModify;
 
+	/* Support functions for COPY into foreign tables */
+	BeginForeignCopy_function BeginForeignCopy;
+	ExecForeignCopy_function ExecForeignCopy;
+	EndForeignCopy_function EndForeignCopy;
+
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e7ae21c023..8f13c92726 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -521,7 +521,13 @@ typedef struct ResultRelInfo
 	TupleConversionMap *ri_ChildToRootMap;
 	bool		ri_ChildToRootMapValid;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.17.0

#91Zhihong Yu
zyu@yugabyte.com
In reply to: Justin Pryzby (#90)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Thu, Apr 8, 2021 at 5:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

I rebased this patch to resolve a trivial 1 line conflict from c5b7ba4e6.

--
Justin

Hi,
In src/backend/commands/copyfrom.c :

+ if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)

There are a few steps of indirection here. Adding assertions on
resultRelInfo->ri_RelationDesc, etc., before the if statement would help
catch a potentially invalid pointer.

+CopyToStart(CopyToState cstate)
...
+CopyToFinish(CopyToState cstate)

Since 'copy to' is the action, the method names would be easier to read
if they were called StartCopyTo and FinishCopyTo, respectively.
That way, the method names would be consistent with existing ones, such as:
extern uint64 DoCopyTo(CopyToState cstate);

+ * If a partition's root parent isn't allowed to use it, neither is the

In the above sentence, 'it' refers to multi-insert. It would be more
readable to mention 'multi-insert' explicitly instead of 'it'.

Cheers

#92Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Justin Pryzby (#90)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Fri, Apr 9, 2021 at 9:49 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I rebased this patch to resolve a trivial 1 line conflict from c5b7ba4e6.

--
Justin

#93Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Justin Pryzby (#90)
Re: [POC] Fast COPY FROM command for the table with foreign partitions

On Fri, Apr 9, 2021 at 9:49 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I rebased this patch to resolve a trivial 1 line conflict from c5b7ba4e6.

Thanks for rebasing!

Actually, I've started reviewing this, but I couldn't finish my
review. My apologies for not having much time on this. I'll continue
to work on it for PG15.

Sorry for the empty email.

Best regards,
Etsuro Fujita

#94Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#93)
1 attachment(s)
Fast COPY FROM based on batch insert

Hi,
We still have a slow 'COPY FROM' operation for foreign tables in the
current master.
Now that we have a foreign batch insert operation, I tried to rewrite the
patch [1] using this machinery.

The attached patch is smaller than [1] and requires no changes to the
FDW API.
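The central change in the flush path is to split the buffered tuples into chunks of at most ri_BatchSize. Each chunk is then passed to the FDW's ExecForeignBatchInsert callback. The sketch below shows that chunking in self-contained form. It assumes a hypothetical send_batch callback in place of the real FDW routine and plain ints in place of tuple slots. Each call must start at rows + sent so that every buffered row is sent exactly once.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for an FDW batch-insert callback: it receives a
 * pointer to the first row of a chunk and the number of rows in it.
 */
typedef void (*send_batch_fn) (const int *rows, int nrows, void *arg);

/* Example callback: adds every row it receives into a long accumulator. */
void
sum_rows(const int *rows, int nrows, void *arg)
{
	long	   *total = (long *) arg;

	for (int i = 0; i < nrows; i++)
		*total += rows[i];
}

/*
 * Send nused buffered rows in chunks of at most batch_size rows and return
 * the number of callback invocations (one per remote round trip).
 */
int
flush_in_batches(const int *rows, int nused, int batch_size,
				 send_batch_fn send_batch, void *arg)
{
	int			sent = 0;
	int			ncalls = 0;

	assert(batch_size > 0);

	while (sent < nused)
	{
		int			n = (batch_size < nused - sent) ? batch_size : nused - sent;

		/* Each chunk starts where the previous one ended. */
		send_batch(rows + sent, n, arg);
		sent += n;
		ncalls++;
	}
	return ncalls;
}
```

A while loop is used instead of do/while so that an empty buffer (nused == 0) makes no remote call. For example, 10 buffered rows with batch_size 3 go out as chunks of 3, 3, 3 and 1.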

Benchmarking
============
I used two data sets, with 1E6 and 1E7 tuples. To emulate a foreign
server, I used loopback FDW links.

Test table:
CREATE TABLE test(a int, payload varchar(80));

Execution time of COPY FROM into a single foreign table:
version   | 1E6 tuples | 1E7 tuples
----------+------------+-----------
master    | 64s        | 775s
Patch [1] | 5s         | 50s
Current   | 4s         | 42s
Execution time of the COPY operation into a plain table is 0.8s for 1E6
tuples and 8s for 1E7 tuples.

Execution time of COPY FROM into a table with three foreign partitions:
version   | 1E6 tuples | 1E7 tuples
----------+------------+-----------
master    | 85s        | 900s
Patch [1] | 10s        | 100s
Current   | 3.5s       | 34s
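The speedup of the current patch over master is the ratio of the two timings. Here is a small C helper over the numbers reported in the tables above. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Timings in seconds, taken from the two benchmark tables above. */
typedef struct BenchRow
{
	const char *label;
	double		master_s;		/* unpatched master */
	double		current_s;		/* current patch */
} BenchRow;

static const BenchRow bench[] = {
	{"single foreign table, 1E6", 64.0, 4.0},
	{"single foreign table, 1E7", 775.0, 42.0},
	{"3 foreign partitions, 1E6", 85.0, 3.5},
	{"3 foreign partitions, 1E7", 900.0, 34.0},
};

/* Speedup factor: how many times faster the "after" run is. */
double
speedup(double before_s, double after_s)
{
	assert(after_s > 0.0);
	return before_s / after_s;
}

void
print_speedups(void)
{
	for (size_t i = 0; i < sizeof(bench) / sizeof(bench[0]); i++)
		printf("%-28s %5.1fx\n", bench[i].label,
			   speedup(bench[i].master_s, bench[i].current_s));
}
```

print_speedups() reports 16.0x and 18.5x for the single foreign table, and 24.3x and 26.5x for the partitioned table. By the same ratio, COPY into a single foreign table with the patch is still about 5x slower than into a plain table (4s vs 0.8s, 42s vs 8s).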

However, the bulk insert execution time in the current implementation
strongly depends on the MAX_BUFFERED_TUPLES/MAX_BUFFERED_BYTES values; in my
experiments it was reduced to 50s.
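The timing is sensitive to these limits because they set how often the buffer is flushed. Below is a rough, simplified model of how many flushes and remote batch-insert calls a COPY into a single foreign table needs. It assumes the copyfrom.c limits are 1000 tuples and 65535 bytes of input per buffer (assumed here to match MAX_BUFFERED_TUPLES and MAX_BUFFERED_BYTES). It also counts the input line length as the buffered size and ignores per-partition buffer management. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed to match the copyfrom.c limits; this is a model, not the code. */
#define MAX_BUFFERED_TUPLES 1000
#define MAX_BUFFERED_BYTES  65535

static long
ceil_div(long a, long b)
{
	return (a + b - 1) / b;
}

/* Rows buffered before either limit is reached (simplified model). */
long
rows_per_flush(long row_bytes)
{
	long		by_bytes;

	assert(row_bytes > 0);
	by_bytes = ceil_div(MAX_BUFFERED_BYTES, row_bytes);
	return by_bytes < MAX_BUFFERED_TUPLES ? by_bytes : MAX_BUFFERED_TUPLES;
}

/* Estimated number of remote batch-insert calls for ntuples input rows. */
long
estimated_round_trips(long ntuples, long row_bytes, long batch_size)
{
	long		per_flush = rows_per_flush(row_bytes);
	long		full = ntuples / per_flush;	/* flushes of a full buffer */
	long		rest = ntuples % per_flush; /* rows in the final flush */

	assert(batch_size > 0);
	return full * ceil_div(per_flush, batch_size) + ceil_div(rest, batch_size);
}
```

Under this model, 1E7 rows of about 40 bytes each fill the buffer at the 1000-tuple limit, which gives 10000 flushes. With a batch size of 100, that comes to 100000 remote calls. Raising the buffer limits reduces the number of flushes, while the FDW batch size controls how many calls each flush costs.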

[1]: /messages/by-id/3d0909dc-3691-a576-208a-90986e55489f@postgrespro.ru

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

0001-Implementation-of-a-Bulk-COPY-FROM.patch (text/plain; charset=UTF-8)
From 715406ce4a98df4e0aecdfdf9d9f59cd3a13101e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 4 Jun 2021 13:21:43 +0500
Subject: [PATCH] Implementation of a Bulk 'COPY FROM ...' operation into
 foreign/distributed table.

---
 .../postgres_fdw/expected/postgres_fdw.out    |  46 +++-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 +++
 src/backend/commands/copyfrom.c               | 259 ++++++++----------
 src/backend/executor/execMain.c               |  45 +++
 src/backend/executor/execPartition.c          |   8 +
 src/include/commands/copyfrom_internal.h      |  10 -
 src/include/executor/executor.h               |   1 +
 src/include/nodes/execnodes.h                 |   8 +-
 8 files changed, 261 insertions(+), 161 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f320a7578d..cb2680c6bd 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8164,8 +8164,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8176,6 +8177,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8284,6 +8298,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: COPY public.loc2(f1, f2) FROM STDIN 
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 17dba77d7e..5576328348 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2258,6 +2258,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2358,6 +2375,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 40a54ad0bd..4f65601bac 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -317,54 +317,78 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		int sent = 0;
+
+		Assert(resultRelInfo->ri_BatchSize > 1 &&
+			   resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert != NULL &&
+			   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL);
+
+		/* Flush into foreign table or partition */
+		do {
+			int batch_size = (resultRelInfo->ri_BatchSize < nused - sent) ?
+						resultRelInfo->ri_BatchSize : (nused - sent);
+
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 NULL,
+																 &batch_size);
+			sent += batch_size;
+		} while (sent < nused);
+	}
+	else
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
-		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
-		}
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
 
-		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
-		}
+			/*
+			 * If there are any indexes, update them for all the inserted tuples,
+			 * and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false, false,
+										  NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL, cstate->transition_capture);
+			}
 
-		ExecClearTuple(slots[i]);
+			ExecClearTuple(slots[i]);
+		}
 	}
 
 	/* Mark that all slots are free */
@@ -538,13 +562,11 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -654,6 +676,33 @@ CopyFrom(CopyFromState cstate)
 	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
 	ExecInitResultRelation(estate, resultRelInfo, 1);
 
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in
+	 * the table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
+
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
@@ -676,6 +725,12 @@ CopyFrom(CopyFromState cstate)
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	if (target_resultRelInfo->ri_usesMultiInsert &&
+		resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -703,83 +758,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -787,7 +768,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -830,7 +811,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -838,7 +819,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -905,24 +885,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -952,7 +922,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -971,9 +941,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1045,7 +1012,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1124,11 +1091,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1149,12 +1113,11 @@ CopyFrom(CopyFromState cstate)
 	/* Allow the FDW to shut down */
 	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b3ce4bae53..922e80089c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1256,9 +1256,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
 	resultRelInfo->ri_ChildToRootMapValid = false;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
+/*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		(rri->ri_FdwRoutine->ExecForeignBatchInsert == NULL ||
+		rri->ri_FdwRoutine->GetForeignModifyBatchSize == NULL))
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary bulk insert interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
 /*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 606c920b06..d784692bf8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -514,6 +514,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 					  rootResultRelInfo,
 					  estate->es_instrument);
 
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
 	/*
 	 * Verify result relation is a valid target for an INSERT.  An UPDATE of a
 	 * partition-key becomes a DELETE+INSERT operation, so this check is still
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 4d68d9cceb..598a68a6f1 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -39,16 +39,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 3dc03c913e..46a79c5ad8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -203,6 +203,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7795a69490..58d5df9874 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -521,7 +521,13 @@ typedef struct ResultRelInfo
 	TupleConversionMap *ri_ChildToRootMap;
 	bool		ri_ChildToRootMapValid;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.31.1

#95tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Andrey Lepikhov (#94)
RE: Fast COPY FROM based on batch insert

From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>

'COPY FROM' into foreign tables is still slow in current master.
Now that we have a foreign batch insert operation, I tried to rewrite the patch [1]
using this machinery.

I haven't looked at the patch yet, but the performance is nice.

However, I see the following problems. What do you think about them?

1)
Users may well wonder: "Why are INSERTs run on the remote server? I ran COPY."

2)
Without an FDW API for COPY, other FDWs won't get a chance to optimize for bulk data loading. For example, oracle_fdw might use a conventional-path insert for FDW batch insert and a direct-path insert for FDW COPY.

3)
INSERT and COPY in Postgres differ in whether rules are invoked:

https://www.postgresql.org/docs/devel/sql-copy.html

"COPY FROM will invoke any triggers and check constraints on the destination table. However, it will not invoke rules."
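Point 2 is about which loading path an FDW gets to choose. The sketch below is a toy dispatch model, not PostgreSQL's FdwRoutine: the struct, the callbacks, and in particular the `copy_rows` COPY hook are hypothetical. It assumes that a COPY-specific callback, when an FDW provides one, would be preferred over the generic batch-insert callback, which in turn is preferred over row-by-row INSERTs.

```c
#include <assert.h>
#include <stddef.h>

/* Which path a bulk load into a foreign table would take (toy model). */
typedef enum LoadPath
{
	LOAD_ROW_BY_ROW,			/* one remote INSERT per row */
	LOAD_BATCH_INSERT,			/* multi-row INSERT via a batch callback */
	LOAD_FDW_COPY				/* FDW-specific bulk path, e.g. direct path */
} LoadPath;

/* Hypothetical, heavily simplified stand-in for an FDW callback table. */
typedef struct ToyFdwRoutine
{
	int			(*batch_insert) (int nrows);	/* NULL if unsupported */
	int			(*copy_rows) (int nrows);	/* hypothetical COPY hook */
	int			batch_size;		/* batch size reported by the FDW */
} ToyFdwRoutine;

static LoadPath
choose_load_path(const ToyFdwRoutine *fdw)
{
	assert(fdw != NULL);

	/* Prefer a dedicated COPY entry point when the FDW offers one. */
	if (fdw->copy_rows != NULL)
		return LOAD_FDW_COPY;

	/* Otherwise batch only if the FDW can and actually wants to. */
	if (fdw->batch_insert != NULL && fdw->batch_size > 1)
		return LOAD_BATCH_INSERT;

	return LOAD_ROW_BY_ROW;
}

/* Dummy callbacks that just report how many rows they were given. */
static int
dummy_batch(int nrows)
{
	return nrows;
}

static int
dummy_copy(int nrows)
{
	return nrows;
}
```

With only a batch callback available, as in the patch under discussion, an FDW such as oracle_fdw would have no separate entry point through which to select a direct-path load.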

Regards
Takayuki Tsunakawa

#96Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: tsunakawa.takay@fujitsu.com (#95)
Re: Fast COPY FROM based on batch insert

On 4/6/21 13:45, tsunakawa.takay@fujitsu.com wrote:

From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>

'COPY FROM' into foreign tables is still slow in current master.
Now that we have a foreign batch insert operation, I tried to rewrite the patch [1]
using this machinery.

I haven't looked at the patch yet, but the performance is nice.

However, I see the following problems. What do you think about them?

I share your concerns.
Please think of this patch as an intermediate step toward fast COPY
FROM. It contains all the logic of the previous patch except the
transport machinery (the bulk insertion API).
Once a 'COPY FROM ...' feature based on the bulk insert FDW API is
committed, it may be easier to see the advantages of the proposed 'COPY' FDW API.
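In this version the "transport" is the FDW batch-insert callback: in CopyMultiInsertBufferFlush the patch hands the buffered rows to ExecForeignBatchInsert in chunks of at most ri_BatchSize rows. The following is a standalone model of that chunking loop; `toy_batch_insert` and `FlushLog` are hypothetical stand-ins for the FDW callback, used only to record the chunk sizes.

```c
#include <assert.h>

/* Records the size of every chunk handed to the (toy) FDW callback. */
typedef struct FlushLog
{
	int			calls;
	int			sizes[64];
} FlushLog;

/* Hypothetical stand-in for ExecForeignBatchInsert. */
static void
toy_batch_insert(FlushLog *flog, int first, int count)
{
	(void) first;				/* index of the first buffered slot */
	assert(flog->calls < 64);
	flog->sizes[flog->calls++] = count;
}

/*
 * Send nused buffered rows in chunks of at most batch_size rows, following
 * the shape of the do/while loop the patch adds to the flush routine.
 */
static int
flush_in_chunks(FlushLog *flog, int nused, int batch_size)
{
	int			sent = 0;

	assert(nused > 0 && batch_size > 1);
	do
	{
		int			size = (batch_size < nused - sent) ?
			batch_size : (nused - sent);

		toy_batch_insert(flog, sent, size);
		sent += size;
	} while (sent < nused);

	return sent;
}
```

For example, 1000 buffered rows with a batch size of 300 are sent as four chunks of 300, 300, 300 and 100 rows.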

--
regards,
Andrey Lepikhov
Postgres Professional

#97Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Andrey Lepikhov (#96)
1 attachment(s)
Re: Fast COPY FROM based on batch insert

The second version of the patch fixes problems detected by the FDW
regression tests and shows the differences in error reports between the
tuple-by-tuple and batched COPY approaches.
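The difference is visible in the expected output, where `COPY rem2, line 1: "-1	xyzzy"` becomes `COPY rem2, line 2`. A plausible reading of the v2 flush code is that it marks the line buffer invalid before flushing and, on the foreign path, does not reset the current line number per row, so the context reports the line COPY had reached, without the row text. The formatter below models only that difference; its name and parameters are hypothetical, and it is not the real COPY error-context callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy model of the COPY error-context line.
 *
 * Row-by-row: the failing row is sent as soon as it is read, so the
 * context can name its own line and echo its text.
 * Batched: the row waits in a buffer until flush; the line text is no
 * longer available and the reported line is where COPY was at flush time.
 */
static void
copy_error_context(char *out, size_t outlen, const char *relname,
				   int failing_line, const char *failing_text,
				   int lineno_at_flush, bool batched)
{
	if (!batched)
		snprintf(out, outlen, "COPY %s, line %d: \"%s\"",
				 relname, failing_line, failing_text);
	else
		snprintf(out, outlen, "COPY %s, line %d",
				 relname, lineno_at_flush);
}
```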

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v2-0001-Implementation-of-a-Bulk-COPY-FROM.patch
From 68ad02038d7477e005b65bf5aeeac4efbb41073e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 4 Jun 2021 13:21:43 +0500
Subject: [PATCH] Implementation of a Bulk COPY FROM operation into foreign
 table.

---
 .../postgres_fdw/expected/postgres_fdw.out    |  45 +++-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  47 ++++
 src/backend/commands/copyfrom.c               | 216 ++++++++----------
 src/backend/executor/execMain.c               |  45 ++++
 src/backend/executor/execPartition.c          |   8 +
 src/include/commands/copyfrom_internal.h      |  10 -
 src/include/executor/executor.h               |   1 +
 src/include/nodes/execnodes.h                 |   8 +-
 8 files changed, 246 insertions(+), 134 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f320a7578d..146a3be576 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8143,6 +8143,7 @@ drop table loct2;
 -- ===================================================================
 -- test COPY FROM
 -- ===================================================================
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -8165,7 +8166,7 @@ copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8176,6 +8177,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8284,6 +8298,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2), ($3, $4)
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -8300,6 +8342,7 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 17dba77d7e..8371d16c6a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2226,6 +2226,7 @@ drop table loct2;
 -- test COPY FROM
 -- ===================================================================
 
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -2258,6 +2259,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2358,6 +2376,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -2371,6 +2417,7 @@ commit;
 select * from rem3;
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 
 -- ===================================================================
 -- test for TRUNCATE
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 40a54ad0bd..c4f858e2b7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -317,18 +317,43 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		int sent = 0;
+
+		Assert(resultRelInfo->ri_BatchSize > 1 &&
+			   resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert != NULL &&
+			   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL);
+
+		/* Flush into foreign table or partition */
+		do {
+			int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
+						resultRelInfo->ri_BatchSize : (nused - sent);
+			int inserted = size;
+
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 &slots[sent],
+																 NULL,
+																 &inserted);
+			sent += size;
+		} while (sent < nused);
+	}
+	else
+	{
+		/*
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
+		 */
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+	}
 
 	for (i = 0; i < nused; i++)
 	{
@@ -340,6 +365,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		{
 			List	   *recheckIndexes;
 
+			Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
 				ExecInsertIndexTuples(resultRelInfo,
@@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
 				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
 		{
+			/*
+			 * AFTER ROW triggers aren't allowed with the foreign bulk insert
+			 * method.
+			 */
+			Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+
 			cstate->cur_lineno = buffer->linenos[i];
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], NIL, cstate->transition_capture);
@@ -538,13 +571,11 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -671,10 +702,43 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+
+		if (resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+															 resultRelInfo);
+	}
+
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -703,83 +767,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -787,7 +777,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -830,7 +820,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -838,7 +828,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -905,24 +894,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -952,7 +931,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -971,9 +950,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1045,7 +1021,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1124,11 +1100,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1149,12 +1122,11 @@ CopyFrom(CopyFromState cstate)
 	/* Allow the FDW to shut down */
 	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b3ce4bae53..92947eaecd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1256,9 +1256,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
 	resultRelInfo->ri_ChildToRootMapValid = false;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
+/*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		(rri->ri_FdwRoutine->ExecForeignBatchInsert == NULL ||
+		rri->ri_BatchSize <= 1))
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary bulk insert interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
 /*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 606c920b06..e2ea101cd9 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -661,6 +661,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	ExecInitRoutingInfo(mtstate, estate, proute, dispatch,
 						leaf_part_rri, partidx, false);
 
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
 	/*
 	 * If there is an ON CONFLICT clause, initialize state for it.
 	 */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 4d68d9cceb..598a68a6f1 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -39,16 +39,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 3dc03c913e..46a79c5ad8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -203,6 +203,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7795a69490..58d5df9874 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -521,7 +521,13 @@ typedef struct ResultRelInfo
 	TupleConversionMap *ri_ChildToRootMap;
 	bool		ri_ChildToRootMapValid;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/*
+	 * The following fields are currently only relevant to copyfrom.c.
+	 * True if okay to use multi-insert on this relation
+	 */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;
 
-- 
2.31.1
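In v2 the decision whether to buffer rows is split across CopyFrom(), ExecMultiInsertAllowed() and ExecInitPartitionInfo(). The following is a condensed model of those checks. `ToyRel` and its boolean fields are hypothetical simplifications of the ResultRelInfo members the patch consults, and relkind tests are reduced to flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplification of the ResultRelInfo fields consulted. */
typedef struct ToyRel
{
	bool		is_partitioned;	/* relkind == RELKIND_PARTITIONED_TABLE */
	bool		is_foreign;		/* ri_FdwRoutine != NULL */
	bool		has_before_or_instead_row_trig;
	bool		has_transition_table_trig;	/* trig_insert_new_table */
	bool		fdw_has_batch_insert;	/* ExecForeignBatchInsert != NULL */
	int			batch_size;		/* ri_BatchSize */
} ToyRel;

/* Mirrors the three checks in the v2 ExecMultiInsertAllowed(). */
static bool
multi_insert_allowed(const ToyRel *r)
{
	if (r->has_before_or_instead_row_trig)
		return false;
	if (r->is_partitioned && r->has_transition_table_trig)
		return false;
	if (r->is_foreign && (!r->fdw_has_batch_insert || r->batch_size <= 1))
		return false;
	return true;
}

/* Statement-level gate applied to the COPY target in CopyFrom(). */
static bool
target_uses_multi_insert(const ToyRel *target, bool volatile_defexprs,
						 int natts, bool volatile_where)
{
	if (volatile_defexprs || natts == 0 || volatile_where)
		return false;
	return multi_insert_allowed(target);
}

/* Per-partition rule: a leaf may batch only if its root may. */
static bool
leaf_uses_multi_insert(bool root_uses, const ToyRel *leaf)
{
	return root_uses && multi_insert_allowed(leaf);
}
```

Applied to the new regression test, the root `foo` qualifies, `ffoo1` falls back to single inserts because of its BEFORE ROW trigger (hence the NOTICE), and `ffoo2` can be batched, assuming the FDW reports a batch size of 2.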

#98Andres Freund
andres@anarazel.de
In reply to: Andrey Lepikhov (#97)
Re: Fast COPY FROM based on batch insert

On 2021-06-07 16:16:58 +0500, Andrey Lepikhov wrote:

The second version of the patch fixes problems detected by the FDW regression
tests and shows the differences in error reports between the tuple-by-tuple and
batched COPY approaches.

Patch doesn't apply and likely hasn't for a while...

#99Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#94)
Re: Fast COPY FROM based on batch insert

On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

'COPY FROM' into foreign tables is still slow in current
master.
Now that we have a foreign batch insert operation, I tried to rewrite the
patch [1] using this machinery.

I’d been reviewing the previous version of the patch without noticing
this. (Gmail grouped it in a new thread due to the subject change,
and I had overlooked that whole thread.)

I agree with you that the first step for fast copy into foreign
tables/partitions is to use the foreign-batch-insert API. (Actually,
I was also thinking the same while reviewing the previous version.)
Thanks for the new version of the patch!

The patch has been rewritten into something essentially different, but
no one has reviewed it yet. (Tsunakawa-san gave some comments without
looking at it, though.) So shouldn’t the status of the patch be “Needs
review” rather than “Ready for Committer”? Anyway, here are a few review
comments from me:

* I don’t think this assumption is correct:

@@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
                 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
                  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
        {
+           /*
+            * AFTER ROW triggers aren't allowed with the foreign bulk insert
+            * method.
+            */
+           Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+

In postgres_fdw we disable foreign batch insert when the target table
has AFTER ROW triggers, but the core allows it even in that case. No?

* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed(). But I’m not sure we really need such a
change. Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?
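To make the first point concrete, here is a minimal standalone model (not PostgreSQL code; RelModel, multi_insert_allowed() and cautious_fdw_batch_size() are hypothetical names) of an eligibility check shaped like the patch's ExecMultiInsertAllowed(). It rejects BEFORE/INSTEAD OF ROW triggers and foreign tables whose FDW cannot batch, but it never looks at AFTER ROW triggers, so whether batching is disabled in that case is left entirely to the FDW. The partitioned-table transition-table check is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the ResultRelInfo fields the
 * patch's eligibility check reads. */
typedef struct RelModel
{
    bool is_foreign;       /* relkind == RELKIND_FOREIGN_TABLE */
    bool has_before_row;   /* BEFORE ROW INSERT trigger present */
    bool has_instead_row;  /* INSTEAD OF ROW INSERT trigger present */
    bool has_after_row;    /* AFTER ROW INSERT trigger present */
    bool fdw_can_batch;    /* FDW provides ExecForeignBatchInsert */
    int  batch_size;       /* value reported by GetForeignModifyBatchSize */
} RelModel;

/* Same shape as the patch's check: AFTER ROW triggers are not consulted. */
static bool
multi_insert_allowed(const RelModel *rel)
{
    if (rel->has_before_row || rel->has_instead_row)
        return false;
    if (rel->is_foreign && (!rel->fdw_can_batch || rel->batch_size <= 1))
        return false;
    return true;
}

/* Hypothetical batch-size callback for an FDW that, like postgres_fdw per
 * the review comment, turns batching off when AFTER ROW triggers exist. */
static int
cautious_fdw_batch_size(const RelModel *rel, int configured)
{
    return rel->has_after_row ? 1 : configured;
}
```

An FDW whose batch-size callback ignores AFTER ROW triggers passes this check, so the flush path can be reached with such triggers present and the quoted Assert could fire; only FDWs that behave like postgres_fdw prevent that.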

I didn’t finish my review, but I’ll mark this as “Waiting on Author”.

My apologies for the long long delay.

Best regards,
Etsuro Fujita

#100Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andres Freund (#98)
Re: Fast COPY FROM based on batch insert

On Tue, Mar 22, 2022 at 8:58 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-06-07 16:16:58 +0500, Andrey Lepikhov wrote:

Second version of the patch fixes problems detected by the FDW regression
tests and shows differences of error reports in tuple-by-tuple and batched
COPY approaches.

Patch doesn't apply and likely hasn't for a while...

Actually, it has bit-rotted due to the recent fix for cross-partition
updates (i.e., commit ba9a7e392).

Thanks!

Best regards,
Etsuro Fujita

#101Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#99)
1 attachment(s)
Re: Fast COPY FROM based on batch insert

On 3/22/22 06:54, Etsuro Fujita wrote:

On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

We still have slow 'COPY FROM' operation for foreign tables in current
master.
Now we have a foreign batch insert operation, and I tried to rewrite the
patch [1] with this machinery.

The patch has been rewritten to something essentially different, but
no one reviewed it. (Tsunakawa-san gave some comments without looking
at it, though.) So the right status of the patch is “Needs review”,
rather than “Ready for Committer”? Anyway, here are a few review
comments from me:

* I don’t think this assumption is correct:

@@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
                 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
                  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
        {
+           /*
+            * AFTER ROW triggers aren't allowed with the foreign bulk insert
+            * method.
+            */
+           Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+

In postgres_fdw we disable foreign batch insert when the target table
has AFTER ROW triggers, but the core allows it even in that case. No?

Agreed.

* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed(). But I’m not sure we really need such a
change. Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?

Of course, such an approach would look much better if we implemented it.
I'll ponder how to do it.

I didn’t finish my review, but I’ll mark this as “Waiting on Author”.

I rebased the patch onto current master. Now it works correctly. I'll
mark it as "Waiting for review".
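Regarding the suggestion quoted above to *adjust* the existing logic instead, one possible shape is sketched below as a standalone model of the if/else chain in CopyFrom() that picks a CopyInsertMethod. Only the foreign-table condition is relaxed: a foreign target falls back to CIM_SINGLE only when its FDW reports a batch size of 1 or less. CopyTarget and choose_insert_method() are hypothetical; a real change would also need the per-partition decision (driven by CIM_MULTI_CONDITIONAL) to stop excluding foreign partitions unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the existing CopyInsertMethod enum. */
typedef enum CopyInsertMethod
{
    CIM_SINGLE,             /* use table_tuple_insert or fdw routine */
    CIM_MULTI,              /* always use table_multi_insert */
    CIM_MULTI_CONDITIONAL   /* use table_multi_insert only if valid */
} CopyInsertMethod;

/* Hypothetical flattened inputs to the decision made in CopyFrom(). */
typedef struct CopyTarget
{
    bool before_or_instead_row_trig;
    bool is_partitioned;         /* proute != NULL */
    bool transition_table_trig;  /* trig_insert_new_table */
    bool is_foreign;             /* ri_FdwRoutine != NULL */
    int  fdw_batch_size;         /* from GetForeignModifyBatchSize, if any */
    bool volatile_defexprs;
    bool volatile_where;
} CopyTarget;

static CopyInsertMethod
choose_insert_method(const CopyTarget *t)
{
    if (t->before_or_instead_row_trig)
        return CIM_SINGLE;
    if (t->is_partitioned && t->transition_table_trig)
        return CIM_SINGLE;
    /* The one adjusted condition: foreign no longer implies CIM_SINGLE. */
    if ((t->is_foreign && t->fdw_batch_size <= 1) || t->volatile_defexprs)
        return CIM_SINGLE;
    if (t->volatile_where)
        return CIM_SINGLE;
    return t->is_partitioned ? CIM_MULTI_CONDITIONAL : CIM_MULTI;
}
```

The appeal of this shape is that the existing three-state enum and its comments stay in place, and no new ResultRelInfo member is needed.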

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v3-0001-Implementation-of-a-Bulk-COPY-FROM.patch (text/x-patch; charset=UTF-8)
From 2d51d0f5d94a3e4b3400714b5841228d1896fb56 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 4 Jun 2021 13:21:43 +0500
Subject: [PATCH] Implementation of a Bulk COPY FROM operation into foreign
 table.

---
 .../postgres_fdw/expected/postgres_fdw.out    |  45 +++-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  47 ++++
 src/backend/commands/copyfrom.c               | 210 ++++++++----------
 src/backend/executor/execMain.c               |  45 ++++
 src/backend/executor/execPartition.c          |   8 +
 src/include/commands/copyfrom_internal.h      |  10 -
 src/include/executor/executor.h               |   1 +
 src/include/nodes/execnodes.h                 |   5 +-
 8 files changed, 237 insertions(+), 134 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f210f91188..a803029f2f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8415,6 +8415,7 @@ drop table loct2;
 -- ===================================================================
 -- test COPY FROM
 -- ===================================================================
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -8437,7 +8438,7 @@ copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8448,6 +8449,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8556,6 +8570,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2), ($3, $4)
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -8572,6 +8614,7 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 95b6b7192e..847b869629 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2322,6 +2322,7 @@ drop table loct2;
 -- test COPY FROM
 -- ===================================================================
 
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -2354,6 +2355,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2454,6 +2472,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -2467,6 +2513,7 @@ commit;
 select * from rem3;
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 
 -- ===================================================================
 -- test for TRUNCATE
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index db6eb6fae7..ba5ccf8908 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -320,18 +320,43 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		int sent = 0;
+
+		Assert(resultRelInfo->ri_BatchSize > 1 &&
+			   resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert != NULL &&
+			   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL);
+
+		/* Flush into foreign table or partition */
+		do {
+			int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
+						resultRelInfo->ri_BatchSize : (nused - sent);
+			int inserted = size;
+
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 &slots[sent],
+																 NULL,
+																 &inserted);
+			sent += size;
+		} while (sent < nused);
+	}
+	else
+	{
+		/*
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
+		 */
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+	}
 
 	for (i = 0; i < nused; i++)
 	{
@@ -343,6 +368,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		{
 			List	   *recheckIndexes;
 
+			Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
 				ExecInsertIndexTuples(resultRelInfo,
@@ -541,13 +568,11 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -674,10 +699,43 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+
+		if (resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+															 resultRelInfo);
+	}
+
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -706,83 +764,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -790,7 +774,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -833,7 +817,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -841,7 +825,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -908,24 +891,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -955,7 +928,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -974,9 +947,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1048,7 +1018,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1127,11 +1097,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1152,12 +1119,11 @@ CopyFrom(CopyFromState cstate)
 	/* Allow the FDW to shut down */
 	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 473d2e00a2..8727c5ca89 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1257,9 +1257,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
 	resultRelInfo->ri_ChildToRootMapValid = false;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
+/*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		(rri->ri_FdwRoutine->ExecForeignBatchInsert == NULL ||
+		rri->ri_BatchSize <= 1))
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary bulk insert interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
 /*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..942bd506f8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -661,6 +661,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	ExecInitRoutingInfo(mtstate, estate, proute, dispatch,
 						leaf_part_rri, partidx, false);
 
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
 	/*
 	 * If there is an ON CONFLICT clause, initialize state for it.
 	 */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3df1c5a97c..1c733bb9a4 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -39,16 +39,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 82925b4b63..67d338920f 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -203,6 +203,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid,
 											  ResultRelInfo *rootRelInfo);
 extern List *ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 44dd73fc80..5617810279 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -528,7 +528,10 @@ typedef struct ResultRelInfo
 	TupleConversionMap *ri_ChildToRootMap;
 	bool		ri_ChildToRootMapValid;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/* True if okay to use multi-insert on this relation */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 
 	/*
-- 
2.25.1
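The partition rule in this version (the root's ri_usesMultiInsert gates each leaf, which then applies its own ExecMultiInsertAllowed() check) can be checked against the new foo/ffoo1/ffoo2 regression test with a small standalone model. Rel, rel_allows_multi_insert(), init_root() and init_leaf() are hypothetical simplifications, not executor code; batch_size stands for ri_BatchSize and is assumed to be 2 here, matching the server-level batch_size '2' that the test sets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields ExecMultiInsertAllowed() reads. */
typedef struct Rel
{
    bool before_or_instead_row_trig;
    bool is_foreign;
    bool fdw_has_batch_insert;  /* ExecForeignBatchInsert != NULL */
    int  batch_size;            /* ri_BatchSize */
    bool uses_multi_insert;     /* ri_usesMultiInsert */
} Rel;

/* Simplified per-relation check (transition-table case omitted). */
static bool
rel_allows_multi_insert(const Rel *r)
{
    if (r->before_or_instead_row_trig)
        return false;
    if (r->is_foreign && (!r->fdw_has_batch_insert || r->batch_size <= 1))
        return false;
    return true;
}

/* Root: COPY-level conditions (no volatile defaults, no volatile WHERE,
 * non-empty attribute list) gate the relation's own check. */
static void
init_root(Rel *root, bool copy_level_ok)
{
    root->uses_multi_insert = copy_level_ok && rel_allows_multi_insert(root);
}

/* Leaf: if the root can't use multi-insert, neither can the partition. */
static void
init_leaf(const Rel *root, Rel *leaf)
{
    leaf->uses_multi_insert = root->uses_multi_insert &&
        rel_allows_multi_insert(leaf);
}
```

Under these assumptions ffoo1 takes the single-row path because of its BEFORE ROW trigger, while ffoo2 is eligible for buffering.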

#102Ian Barwick
ian.barwick@enterprisedb.com
In reply to: Andrey V. Lepikhov (#101)
Re: Fast COPY FROM based on batch insert

On Thu, Mar 24, 2022 at 15:44, Andrey V. Lepikhov <a.lepikhov@postgrespro.ru> wrote:

On 3/22/22 06:54, Etsuro Fujita wrote:

On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

We still have slow 'COPY FROM' operation for foreign tables in current
master.
Now we have a foreign batch insert operation, and I tried to rewrite the
patch [1] with this machinery.

The patch has been rewritten to something essentially different, but
no one reviewed it. (Tsunakawa-san gave some comments without looking
at it, though.) So the right status of the patch is “Needs review”,
rather than “Ready for Committer”? Anyway, here are a few review
comments from me:

* I don’t think this assumption is correct:

@@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
                 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
                  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
        {
+           /*
+            * AFTER ROW triggers aren't allowed with the foreign bulk insert
+            * method.
+            */
+           Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+

In postgres_fdw we disable foreign batch insert when the target table
has AFTER ROW triggers, but the core allows it even in that case. No?

Agree

* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed(). But I’m not sure we really need such a
change. Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?

Of course, such approach would look much better, if we implemented it.
I'll ponder how to do it.

I didn’t finish my review, but I’ll mark this as “Waiting on Author”.

I rebased the patch onto current master. Now it works correctly. I'll
mark it as "Waiting for review".

I took a look at this patch as it would be a useful optimization to have.

It applies cleanly to current HEAD, but as-is, with a large data set, it
reproducibly fails like this (using postgres_fdw):

postgres=# COPY foo FROM '/tmp/fast-copy-from/test.csv' WITH (format csv);
ERROR: bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_19422" requires 6
CONTEXT: remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
COPY foo, line 17281589

This occurs because not all multi-insert buffers being flushed actually contain
tuples; the fix is simply not to call ExecForeignBatchInsert() if that's the case,
e.g.:

/* Flush into foreign table or partition */
do {
    int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
                resultRelInfo->ri_BatchSize : (nused - sent);

    if (size)
    {
        int inserted = size;

        resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
                                                             resultRelInfo,
                                                             &slots[sent],
                                                             NULL,
                                                             &inserted);
        sent += size;
    }
} while (sent < nused);

There might be a case for arguing that the respective FDW should check that it has
actually received some tuples to insert, but IMHO it's much preferable to catch
this as early as possible and avoid a superfluous call.
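The difference between the two loop shapes can be checked with a small standalone model. fake_batch_insert() is a hypothetical stand-in for ExecForeignBatchInsert() that only records the slot count of each call, and the sketch assumes, as described above, that a buffer can reach the flush with nused == 0.

```c
#include <assert.h>

#define MAX_CALLS 16

/* Hypothetical record of the calls an FDW batch routine would receive. */
typedef struct BatchLog
{
    int calls;
    int sizes[MAX_CALLS];
} BatchLog;

static void
fake_batch_insert(BatchLog *log, int nslots)
{
    assert(log->calls < MAX_CALLS);
    log->sizes[log->calls++] = nslots;
}

/* Loop shape from the v3 patch: the body always runs at least once. */
static void
flush_unguarded(BatchLog *log, int nused, int batch_size)
{
    int sent = 0;

    do
    {
        int size = (batch_size < nused - sent) ? batch_size : nused - sent;

        fake_batch_insert(log, size);
        sent += size;
    } while (sent < nused);
}

/* Pre-tested loop: equivalent to the if (size) guard, never sends 0 slots. */
static void
flush_in_batches(BatchLog *log, int nused, int batch_size)
{
    int sent = 0;

    assert(batch_size > 1);     /* the patch asserts ri_BatchSize > 1 */
    while (sent < nused)
    {
        int size = (batch_size < nused - sent) ? batch_size : nused - sent;

        fake_batch_insert(log, size);
        sent += size;
    }
}
```

A plain while loop gives the same result as wrapping the body in if (size) and states the invariant (never call the FDW with zero slots) more directly.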

FWIW, with the above fix in place, with a simple local test the patch produces a
consistent speed-up of about 8 times compared to the existing functionality.

Regards

Ian Barwick

--

EnterpriseDB - https://www.enterprisedb.com

#103Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ian Barwick (#102)
Re: Fast COPY FROM based on batch insert

On 7/7/2022 06:14, Ian Barwick wrote:

On Thu, Mar 24, 2022 at 15:44, Andrey V. Lepikhov <a.lepikhov@postgrespro.ru> wrote:

On 3/22/22 06:54, Etsuro Fujita wrote:

On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

We still have slow 'COPY FROM' operation for foreign tables in current
master.
Now we have a foreign batch insert operation, and I tried to rewrite the
patch [1] with this machinery.

The patch has been rewritten to something essentially different, but
no one reviewed it.  (Tsunakawa-san gave some comments without looking
at it, though.)  So the right status of the patch is “Needs review”,
rather than “Ready for Committer”?  Anyway, here are a few review
comments from me:

* I don’t think this assumption is correct:

@@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
                   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
                    resultRelInfo->ri_TrigDesc->trig_insert_new_table))
          {
+           /*
+            * AFTER ROW triggers aren't allowed with the foreign bulk insert
+            * method.
+            */
+           Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind !=
+                  RELKIND_FOREIGN_TABLE);
+

In postgres_fdw we disable foreign batch insert when the target table
has AFTER ROW triggers, but the core allows it even in that case.  No?

Agree

* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed().  But I’m not sure we really need such a
change.  Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?

Of course, such an approach would look much better, if we implemented it.
I'll ponder how to do it.

I didn’t finish my review, but I’ll mark this as “Waiting on Author”.

I rebased the patch onto current master. Now it works correctly. I'll
mark it as "Waiting for review".

I took a look at this patch as it would be a useful optimization to have.

It applies cleanly to current HEAD, but as-is, with a large data set, it
reproducibly fails like this (using postgres_fdw):

    postgres=# COPY foo FROM '/tmp/fast-copy-from/test.csv' WITH (format csv);
    ERROR:  bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_19422" requires 6
    CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
    COPY foo, line 17281589

This occurs because not all multi-insert buffers being flushed actually contain
tuples; the fix is simply not to call ExecForeignBatchInsert() if that's the case,
e.g.:

        /* Flush into foreign table or partition */
        do {
            int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
                        resultRelInfo->ri_BatchSize : (nused - sent);

            if (size)
            {
                int inserted = size;

                resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
                                                                     resultRelInfo,
                                                                     &slots[sent],
                                                                     NULL,
                                                                     &inserted);
                sent += size;
            }
        } while (sent < nused);

There might be a case for arguing that the respective FDW should check that it has
actually received some tuples to insert, but IMHO it's much preferable to catch
this as early as possible and avoid a superfluous call.

FWIW, with the above fix in place, with a simple local test the patch produces a
consistent speed-up of about 8 times compared to the existing functionality.

Thank you for your attention to the patch.
I have a couple of questions:
1. I'm having trouble reproducing the case you reported. Can you
give more details on the reproduction?
2. Have you tried the previous version, based on the bulk COPY machinery
rather than bulk INSERT? Which approach looks better and performs better
in your opinion?

--
regards,
Andrey Lepikhov
Postgres Professional

#104Ian Barwick
ian.barwick@enterprisedb.com
In reply to: Andrey Lepikhov (#103)
1 attachment(s)
Re: Fast COPY FROM based on batch insert

On 07/07/2022 22:51, Andrey Lepikhov wrote:

On 7/7/2022 06:14, Ian Barwick wrote:

On Thu, Mar 24, 2022 at 15:44, Andrey V. Lepikhov <a.lepikhov@postgrespro.ru> wrote:

On 3/22/22 06:54, Etsuro Fujita wrote:

On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

We still have slow 'COPY FROM' operation for foreign tables in current
master.
Now we have a foreign batch insert operation, and I tried to rewrite the
patch [1] with this machinery.

The patch has been rewritten to something essentially different, but
no one reviewed it. (Tsunakawa-san gave some comments without looking
at it, though.) So the right status of the patch is “Needs review”,
rather than “Ready for Committer”? Anyway, here are a few review
comments from me:

* I don’t think this assumption is correct:

@@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
                   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
                    resultRelInfo->ri_TrigDesc->trig_insert_new_table))
          {
+           /*
+            * AFTER ROW triggers aren't allowed with the foreign bulk insert
+            * method.
+            */
+           Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind !=
+                  RELKIND_FOREIGN_TABLE);
+

In postgres_fdw we disable foreign batch insert when the target table
has AFTER ROW triggers, but the core allows it even in that case. No?

Agree

* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed(). But I’m not sure we really need such a
change. Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?

Of course, such an approach would look much better, if we implemented it.
I'll ponder how to do it.

I didn’t finish my review, but I’ll mark this as “Waiting on Author”.

I rebased the patch onto current master. Now it works correctly. I'll
mark it as "Waiting for review".

I took a look at this patch as it would be a useful optimization to have.

It applies cleanly to current HEAD, but as-is, with a large data set, it
reproducibly fails like this (using postgres_fdw):

postgres=# COPY foo FROM '/tmp/fast-copy-from/test.csv' WITH (format csv);
ERROR: bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_19422" requires 6
CONTEXT: remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
COPY foo, line 17281589

This occurs because not all multi-insert buffers being flushed actually contain
tuples; the fix is simply not to call ExecForeignBatchInsert() if that's the case,
e.g.:

/* Flush into foreign table or partition */
do {
    int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
               resultRelInfo->ri_BatchSize : (nused - sent);

    if (size)
    {
        int inserted = size;

        resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
                                                             resultRelInfo,
                                                             &slots[sent],
                                                             NULL,
                                                             &inserted);
        sent += size;
    }
} while (sent < nused);

There might be a case for arguing that the respective FDW should check that it has
actually received some tuples to insert, but IMHO it's much preferable to catch
this as early as possible and avoid a superfluous call.

FWIW, with the above fix in place, with a simple local test the patch produces a
consistent speed-up of about 8 times compared to the existing functionality.

Thank you for your attention to the patch.
I have a couple of questions:

1. I'm having trouble reproducing the case you reported. Can you give more
details on the reproduction?

The issue seems to occur when the data spans more than one foreign partition,
probably because the accumulated data for one partition needs to be flushed
before moving on to the next partition, but not all pre-allocated multi-insert
buffers have been filled.

The reproduction method I have, which is pared down from the original bulk insert
which triggered the error, is as follows:

1. Create some data using the attached script:

perl data.pl > /tmp/data.csv

2. Create two nodes (A and B)

3. On node B, create tables as follows:

CREATE TABLE foo_part_1 (t timestamptz, v1 int, v2 int, v3 int, v4 text, v5 text);
CREATE TABLE foo_part_2 (t timestamptz, v1 int, v2 int, v3 int, v4 text, v5 text);

4. On node A, create FDW and partitioned table as follows:

-- adjust parameters as appropriate

CREATE EXTENSION postgres_fdw;

CREATE SERVER pg_fdw
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (
host 'localhost',
port '6301',
dbname 'postgres',
batch_size '100'
);

CREATE USER MAPPING FOR CURRENT_USER SERVER pg_fdw
OPTIONS(user 'postgres');

-- create partitioned table and partitions

CREATE TABLE foo (t timestamptz, v1 int, v2 int, v3 int, v4 text, v5 text) PARTITION BY RANGE(t);

CREATE FOREIGN TABLE foo_part_1
PARTITION OF foo
FOR VALUES FROM ('2022-05-19 00:00:00') TO ('2022-05-20 00:00:00')
SERVER pg_fdw;

CREATE FOREIGN TABLE foo_part_2
PARTITION OF foo
FOR VALUES FROM ('2022-05-20 00:00:00') TO ('2022-05-21 00:00:00')
SERVER pg_fdw;

5. On node A, load the previously generated data with COPY:

COPY foo FROM '/tmp/data.csv' with (format 'csv');

This will fail like this:

ERROR: bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_178" requires 6
CONTEXT: remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
COPY foo, line 88160

2. Have you tried the previous version, based on the bulk COPY machinery rather
than bulk INSERT? Which approach looks better and performs better in
your opinion?

Aha, I didn't see that, I'll take a look.

Regards

Ian Barwick

--

EnterpriseDB - https://www.enterprisedb.com

Attachments:

data.pl (application/x-perl)
#105Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ian Barwick (#104)
1 attachment(s)
Re: Fast COPY FROM based on batch insert

On 8/7/2022 05:12, Ian Barwick wrote:

    ERROR:  bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_178" requires 6
    CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
    COPY foo, line 88160

Thanks, I got it. A MultiInsertBuffer is created on the first non-zero
flush of tuples into the partition and isn't deleted from the buffers
list until the end of COPY. So on a subsequent flush, an empty buffer
triggers the error.
Your fix is correct, but I want to propose a slightly different change
(see attachment).

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

flush_nonzero_buffer.txt (text/plain; charset=UTF-8)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 245a260982..203289f7f2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -329,7 +329,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
                           resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL);
 
                /* Flush into foreign table or partition */
-               do {
+               while(sent < nused)
+               {
                        int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
                                                resultRelInfo->ri_BatchSize : (nused - sent);
                        int inserted = size;
@@ -340,7 +341,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
                                                                                                                                 NULL,
                                                                                                                                 &inserted);
                        sent += size;
-               } while (sent < nused);
+               }
        }
        else
        {
#106Ian Barwick
ian.barwick@enterprisedb.com
In reply to: Andrey Lepikhov (#105)
Re: Fast COPY FROM based on batch insert

On 09/07/2022 00:09, Andrey Lepikhov wrote:

On 8/7/2022 05:12, Ian Barwick wrote:

     ERROR:  bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_178" requires 6
     CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
     COPY foo, line 88160

Thanks, I got it. A MultiInsertBuffer is created on the first non-zero flush of tuples into the partition and isn't deleted from the buffers list until the end of COPY. So on a subsequent flush, an empty buffer triggers the error.
Your fix is correct, but I want to propose a slightly different change (see attachment).

LGTM.

Regards

Ian Barwick

--
https://www.enterprisedb.com/

#107Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ian Barwick (#106)
1 attachment(s)
Re: Fast COPY FROM based on batch insert

On 11/7/2022 04:12, Ian Barwick wrote:

On 09/07/2022 00:09, Andrey Lepikhov wrote:

On 8/7/2022 05:12, Ian Barwick wrote:

     ERROR:  bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_178" requires 6
     CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
     COPY foo, line 88160

Thanks, I got it. A MultiInsertBuffer is created on the first non-zero
flush of tuples into the partition and isn't deleted from the buffers
list until the end of COPY. So on a subsequent flush, an empty buffer
triggers the error.
Your fix is correct, but I want to propose a slightly different change
(see attachment).

LGTM.

New version (with aforementioned changes) is attached.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v4-0001-Implementation-of-a-Bulk-COPY-FROM.patch (text/plain; charset=UTF-8)
From 976560f2ad406adba1aaf58a188b44302855ee12 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 4 Jun 2021 13:21:43 +0500
Subject: [PATCH] Implementation of a Bulk COPY FROM operation into foreign
 table.

---
 .../postgres_fdw/expected/postgres_fdw.out    |  45 +++-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  47 ++++
 src/backend/commands/copyfrom.c               | 211 ++++++++----------
 src/backend/executor/execMain.c               |  45 ++++
 src/backend/executor/execPartition.c          |   8 +
 src/include/commands/copyfrom_internal.h      |  10 -
 src/include/executor/executor.h               |   1 +
 src/include/nodes/execnodes.h                 |   5 +-
 8 files changed, 238 insertions(+), 134 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 44457f930c..aced9a6428 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8435,6 +8435,7 @@ drop table loct2;
 -- ===================================================================
 -- test COPY FROM
 -- ===================================================================
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -8457,7 +8458,7 @@ copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8468,6 +8469,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8576,6 +8590,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2), ($3, $4)
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -8592,6 +8634,7 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 92d1212027..5c047ce8ee 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2330,6 +2330,7 @@ drop table loct2;
 -- test COPY FROM
 -- ===================================================================
 
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -2362,6 +2363,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2462,6 +2480,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -2475,6 +2521,7 @@ commit;
 select * from rem3;
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 
 -- ===================================================================
 -- test for TRUNCATE
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a976008b3d..08d321f176 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -320,18 +320,44 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		int sent = 0;
+
+		Assert(resultRelInfo->ri_BatchSize > 1 &&
+			   resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert != NULL &&
+			   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL);
+
+		/* Flush into foreign table or partition */
+		while (sent < nused)
+		{
+			int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
+						resultRelInfo->ri_BatchSize : (nused - sent);
+			int inserted = size;
+
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 &slots[sent],
+																 NULL,
+																 &inserted);
+			sent += size;
+		}
+	}
+	else
+	{
+		/*
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
+		 */
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+	}
 
 	for (i = 0; i < nused; i++)
 	{
@@ -343,6 +369,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		{
 			List	   *recheckIndexes;
 
+			Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
 				ExecInsertIndexTuples(resultRelInfo,
@@ -541,13 +569,11 @@ CopyFrom(CopyFromState cstate)
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
 	BulkInsertState bistate = NULL;
-	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
-	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -674,10 +700,43 @@ CopyFrom(CopyFromState cstate)
 	mtstate->resultRelInfo = resultRelInfo;
 	mtstate->rootResultRelInfo = resultRelInfo;
 
-	if (resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-														 resultRelInfo);
+	if (resultRelInfo->ri_FdwRoutine != NULL)
+	{
+		if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+
+		if (resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+			resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+															 resultRelInfo);
+	}
+
+	Assert(!target_resultRelInfo->ri_usesMultiInsert);
+
+	/*
+	 * It's generally more efficient to prepare a bunch of tuples for
+	 * insertion, and insert them in bulk, for example, with one
+	 * table_multi_insert() call than call table_tuple_insert() separately for
+	 * every tuple. However, there are a number of reasons why we might not be
+	 * able to do this.  For example, if there are any volatile expressions in the
+	 * table's default values or in the statement's WHERE clause, which may
+	 * query the table we are inserting into, buffering tuples might produce
+	 * wrong results.  Also, the relation we are trying to insert into itself
+	 * may not be amenable to buffered inserts.
+	 *
+	 * Note: For partitions, this flag is set considering the target table's
+	 * flag that is being set here and partition's own properties which are
+	 * checked by calling ExecMultiInsertAllowed().  It does not matter
+	 * whether partitions have any volatile default expressions as we use the
+	 * defaults from the target of the COPY command.
+	 * Also, the COPY command requires a non-zero input list of attributes.
+	 * Therefore, the length of the attribute list is checked here.
+	 */
+	if (!cstate->volatile_defexprs &&
+		list_length(cstate->attnumlist) > 0 &&
+		!contain_volatile_functions(cstate->whereClause))
+		target_resultRelInfo->ri_usesMultiInsert =
+					ExecMultiInsertAllowed(target_resultRelInfo);
 
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
@@ -706,83 +765,9 @@ CopyFrom(CopyFromState cstate)
 		cstate->qualexpr = ExecInitQual(castNode(List, cstate->whereClause),
 										&mtstate->ps);
 
-	/*
-	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
-	 */
-	if (resultRelInfo->ri_TrigDesc != NULL &&
-		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-		 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
-	{
-		/*
-		 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
-		 * triggers on the table. Such triggers might query the table we're
-		 * inserting into and act differently if the tuples that have already
-		 * been processed and prepared for insertion are not there.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
-			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
-	{
-		/*
-		 * For partitioned tables we can't support multi-inserts when there
-		 * are any statement level insert triggers. It might be possible to
-		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
-		 * and statement level insert triggers are on the same relation.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
-	{
-		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
-		 *
-		 * Note: It does not matter if any partitions have any volatile
-		 * default expressions as we use the defaults from the target of the
-		 * COPY command.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else if (contain_volatile_functions(cstate->whereClause))
-	{
-		/*
-		 * Can't support multi-inserts if there are any volatile function
-		 * expressions in WHERE clause.  Similarly to the trigger case above,
-		 * such expressions may query the table we're inserting into.
-		 */
-		insertMethod = CIM_SINGLE;
-	}
-	else
-	{
-		/*
-		 * For partitioned tables, we may still be able to perform bulk
-		 * inserts.  However, the possibility of this depends on which types
-		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
-		 */
-		if (proute)
-			insertMethod = CIM_MULTI_CONDITIONAL;
-		else
-			insertMethod = CIM_MULTI;
-
+	if (resultRelInfo->ri_usesMultiInsert)
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
 								estate, mycid, ti_options);
-	}
 
 	/*
 	 * If not using batch mode (which allocates slots as needed) set up a
@@ -790,7 +775,7 @@ CopyFrom(CopyFromState cstate)
 	 * one, even if we might batch insert, to read the tuple in the root
 	 * partition's form.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	if (!resultRelInfo->ri_usesMultiInsert || proute)
 	{
 		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
 									   &estate->es_tupleTable);
@@ -833,7 +818,7 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
+		if (!target_resultRelInfo->ri_usesMultiInsert || proute)
 		{
 			myslot = singleslot;
 			Assert(myslot != NULL);
@@ -841,7 +826,6 @@ CopyFrom(CopyFromState cstate)
 		else
 		{
 			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
 
 			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 													 resultRelInfo);
@@ -908,24 +892,14 @@ CopyFrom(CopyFromState cstate)
 				has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 											   resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
 
-				/*
-				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
-				 */
-				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
-					!has_before_insert_row_trig &&
-					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
-
 				/* Set the multi-insert buffer to use for this partition. */
-				if (leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
 													   resultRelInfo);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
@@ -955,7 +929,7 @@ CopyFrom(CopyFromState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_RootToPartitionMap;
-			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
+			if (!resultRelInfo->ri_usesMultiInsert)
 			{
 				/* non batch insert */
 				if (map != NULL)
@@ -974,9 +948,6 @@ CopyFrom(CopyFromState cstate)
 				 */
 				TupleTableSlot *batchslot;
 
-				/* no other path available for partitioned table */
-				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
-
 				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
 															resultRelInfo);
 
@@ -1048,7 +1019,7 @@ CopyFrom(CopyFromState cstate)
 					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/* Store the slot in the multi-insert buffer, when enabled. */
-				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
+				if (resultRelInfo->ri_usesMultiInsert)
 				{
 					/*
 					 * The slot previously might point into the per-tuple
@@ -1127,11 +1098,8 @@ CopyFrom(CopyFromState cstate)
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+	if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+		CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
@@ -1152,12 +1120,11 @@ CopyFrom(CopyFromState cstate)
 	/* Allow the FDW to shut down */
 	if (target_resultRelInfo->ri_FdwRoutine != NULL &&
 		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-		target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
-															  target_resultRelInfo);
+			target_resultRelInfo->ri_FdwRoutine->EndForeignInsert(estate,
+														target_resultRelInfo);
 
 	/* Tear down the multi-insert buffer data */
-	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+	CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..da95f2efb7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1260,9 +1260,54 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
 	resultRelInfo->ri_ChildToRootMapValid = false;
+	resultRelInfo->ri_usesMultiInsert = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
+/*
+ * ExecMultiInsertAllowed
+ *		Does this relation allow caller to use multi-insert mode when
+ *		inserting rows into it?
+ */
+bool
+ExecMultiInsertAllowed(const ResultRelInfo *rri)
+{
+	/*
+	 * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
+	 * triggers on the table. Such triggers might query the table we're
+	 * inserting into and act differently if the tuples that have already
+	 * been processed and prepared for insertion are not there.
+	 */
+	if (rri->ri_TrigDesc != NULL &&
+		(rri->ri_TrigDesc->trig_insert_before_row ||
+		 rri->ri_TrigDesc->trig_insert_instead_row))
+		return false;
+
+	/*
+	 * For partitioned tables we can't support multi-inserts when there are
+	 * any statement level insert triggers. It might be possible to allow
+	 * partitioned tables with such triggers in the future, but for now,
+	 * CopyMultiInsertInfoFlush expects that any before row insert and
+	 * statement level insert triggers are on the same relation.
+	 */
+	if (rri->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		rri->ri_TrigDesc != NULL &&
+		rri->ri_TrigDesc->trig_insert_new_table)
+		return false;
+
+	if (rri->ri_FdwRoutine != NULL &&
+		(rri->ri_FdwRoutine->ExecForeignBatchInsert == NULL ||
+		rri->ri_BatchSize <= 1))
+		/*
+		 * Foreign tables don't support multi-inserts, unless their FDW
+		 * provides the necessary bulk insert interface.
+		 */
+		return false;
+
+	/* OK, caller can use multi-insert on this relation. */
+	return true;
+}
+
 /*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..990d1fd306 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -669,6 +669,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	ExecInitRoutingInfo(mtstate, estate, proute, dispatch,
 						leaf_part_rri, partidx, false);
 
+	/*
+	 * If a partition's root parent isn't allowed to use it, neither is the
+	 * partition.
+	 */
+	if (rootResultRelInfo->ri_usesMultiInsert)
+		leaf_part_rri->ri_usesMultiInsert =
+			ExecMultiInsertAllowed(leaf_part_rri);
+
 	/*
 	 * If there is an ON CONFLICT clause, initialize state for it.
 	 */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3df1c5a97c..1c733bb9a4 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -39,16 +39,6 @@ typedef enum EolType
 	EOL_CRNL
 } EolType;
 
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
 /*
  * This struct contains all the state variables used throughout a COPY FROM
  * operation.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..b646f91883 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -203,6 +203,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  ResultRelInfo *partition_root_rri,
 							  int instrument_options);
+extern bool ExecMultiInsertAllowed(const ResultRelInfo *rri);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid,
 											  ResultRelInfo *rootRelInfo);
 extern List *ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5728801379..80d97096b8 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -548,7 +548,10 @@ typedef struct ResultRelInfo
 	TupleConversionMap *ri_ChildToRootMap;
 	bool		ri_ChildToRootMapValid;
 
-	/* for use by copyfrom.c when performing multi-inserts */
+	/* True if okay to use multi-insert on this relation */
+	bool ri_usesMultiInsert;
+
+	/* Buffer allocated to this relation when using multi-insert mode */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 
 	/*
-- 
2.37.0

#108Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey V. Lepikhov (#101)
1 attachment(s)
Re: Fast COPY FROM based on batch insert

On Thu, Mar 24, 2022 at 3:43 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 3/22/22 06:54, Etsuro Fujita wrote:

* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed(). But I’m not sure we really need such a
change. Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?

Of course, such an approach would look much better if we could implement it.

I'll ponder how to do it.

I rewrote the decision logic to something much simpler and much less
invasive, which reduces the patch size significantly. Attached is an
updated patch. What do you think about that?

While working on the patch, I fixed a few issues as well:

+       if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
+           resultRelInfo->ri_BatchSize =
+               resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

When determining the batch size, I think we should check if the
ExecForeignBatchInsert callback routine is also defined, like other
places such as execPartition.c. For consistency I fixed this by
copying-and-pasting the code from that file.
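As a minimal sketch of the rule described above, assuming simplified stand-in types (`DemoFdwRoutine`, `demo_batch_size` and the helper callbacks are hypothetical, not PostgreSQL's `FdwRoutine`), the batch size is taken from the FDW only when both callbacks are present; otherwise it falls back to 1, meaning no batching:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two FdwRoutine callbacks involved. */
typedef int (*GetBatchSizeFn) (void *rel);
typedef void (*BatchInsertFn) (void *rel);

typedef struct DemoFdwRoutine
{
	GetBatchSizeFn GetForeignModifyBatchSize;
	BatchInsertFn ExecForeignBatchInsert;
} DemoFdwRoutine;

/*
 * Ask the FDW for a batch size only if it can also execute a batch;
 * a NULL routine (plain table) or a missing callback yields 1.
 */
static int
demo_batch_size(const DemoFdwRoutine *fdw, void *rel)
{
	if (fdw != NULL &&
		fdw->GetForeignModifyBatchSize != NULL &&
		fdw->ExecForeignBatchInsert != NULL)
		return fdw->GetForeignModifyBatchSize(rel);
	return 1;
}

/* Example callbacks for a hypothetical FDW configured with batch_size 10. */
static int
size_ten(void *rel)
{
	(void) rel;
	return 10;
}

static void
batch_noop(void *rel)
{
	(void) rel;
}
```

An FDW that reports a batch size but lacks `ExecForeignBatchInsert` therefore ends up with a batch size of 1, which is the same outcome as in execPartition.c.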

+    * Also, the COPY command requires a non-zero input list of attributes.
+    * Therefore, the length of the attribute list is checked here.
+    */
+   if (!cstate->volatile_defexprs &&
+       list_length(cstate->attnumlist) > 0 &&
+       !contain_volatile_functions(cstate->whereClause))
+       target_resultRelInfo->ri_usesMultiInsert =
+                   ExecMultiInsertAllowed(target_resultRelInfo);

I think “list_length(cstate->attnumlist) > 0” in the if-test would
break COPY FROM; it currently supports multi-inserting into *plain*
tables even in the case where they have no columns, but this would
disable the multi-insertion support in that case. postgres_fdw would
not be able to batch into zero-column foreign tables due to the INSERT
syntax limitation (i.e., the syntax does not allow inserting multiple
empty rows into a zero-column table in a single INSERT statement).
Is that the reason why this was added to the if-test? But I think
some other FDWs might be able to, so I think we should let the FDW
decide whether to allow batching even in that case, when
GetForeignModifyBatchSize is called. So I removed the attnumlist test
from the patch, and modified postgresGetForeignModifyBatchSize
accordingly. I might be missing something, though.
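To illustrate the syntax limitation, here is a hypothetical deparser sketch (`build_batch_insert` and `append_str` are made up for illustration, not postgres_fdw's deparse.c) that emits remote SQL in the same shape as the regression output in the patch. With columns, several rows fit into one `VALUES` list; with zero columns, only `DEFAULT VALUES` is available, and it inserts exactly one row, so the batch size has to be 1:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append s to buf at offset *len; refuse (return false) rather than overflow. */
static bool
append_str(char *buf, size_t bufsize, size_t *len, const char *s)
{
	size_t		n = strlen(s);

	if (*len + n >= bufsize)
		return false;
	memcpy(buf + *len, s, n + 1);	/* also copies the terminating NUL */
	*len += n;
	return true;
}

/*
 * Hypothetical deparser: build one INSERT for nrows rows of ncols columns,
 * numbering parameters $1, $2, ...  Returns false when no single INSERT
 * statement can express the request (or buf is too small).
 */
static bool
build_batch_insert(char *buf, size_t bufsize, const char *table,
				   const char *const *cols, int ncols, int nrows)
{
	size_t		len = 0;
	int			param = 1;
	char		tmp[16];

	if (bufsize == 0 || nrows < 1)
		return false;
	buf[0] = '\0';
	if (!append_str(buf, bufsize, &len, "INSERT INTO ") ||
		!append_str(buf, bufsize, &len, table))
		return false;

	if (ncols == 0)
	{
		/*
		 * Zero columns: only DEFAULT VALUES exists, and it inserts exactly
		 * one row.  There is no multi-row form such as "VALUES (), ()".
		 */
		return nrows == 1 &&
			append_str(buf, bufsize, &len, " DEFAULT VALUES");
	}

	/* Column list: "(f1, f2)" */
	if (!append_str(buf, bufsize, &len, "("))
		return false;
	for (int c = 0; c < ncols; c++)
		if ((c > 0 && !append_str(buf, bufsize, &len, ", ")) ||
			!append_str(buf, bufsize, &len, cols[c]))
			return false;
	if (!append_str(buf, bufsize, &len, ") VALUES "))
		return false;

	/* One parenthesized parameter group per row: "($1, $2), ($3, $4)" */
	for (int r = 0; r < nrows; r++)
	{
		if (!append_str(buf, bufsize, &len, r > 0 ? ", (" : "("))
			return false;
		for (int c = 0; c < ncols; c++)
		{
			snprintf(tmp, sizeof(tmp), "$%d", param++);
			if ((c > 0 && !append_str(buf, bufsize, &len, ", ")) ||
				!append_str(buf, bufsize, &len, tmp))
				return false;
		}
		if (!append_str(buf, bufsize, &len, ")"))
			return false;
	}
	return true;
}
```

For two columns and two rows this yields `INSERT INTO public.loc2(f1, f2) VALUES ($1, $2), ($3, $4)`, while a request for two zero-column rows is rejected.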

Best regards,
Etsuro Fujita

Attachments:

v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-1.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index ebf9ea3598..3c50f739d0 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8481,6 +8481,7 @@ drop table loct2;
 -- ===================================================================
 -- test COPY FROM
 -- ===================================================================
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -8503,7 +8504,7 @@ copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8514,6 +8515,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8622,6 +8636,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2), ($3, $4)
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -8638,6 +8680,7 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f3b93954ee..3bee9d19a3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2057,6 +2057,15 @@ postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 		  resultRelInfo->ri_TrigDesc->trig_insert_after_row)))
 		return 1;
 
+	/*
+	 * If the foreign table has no columns, disable batching as the INSERT
+	 * syntax doesn't allow batching multiple empty rows into a zero-column
+	 * table.  This isn't needed in case of INSERT, but is in case of COPY.
+	 * Note that in the latter case fmstate must be non-NULL.
+	 */
+	if (fmstate && list_length(fmstate->target_attrs) == 0)
+		return 1;
+
 	/*
 	 * Otherwise use the batch size specified for server/table. The number of
 	 * parameters in a batch is limited to 65535 (uint16), so make sure we
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index b7817c5a41..e013d55313 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2350,6 +2350,7 @@ drop table loct2;
 -- test COPY FROM
 -- ===================================================================
 
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -2382,6 +2383,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2482,6 +2500,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -2495,6 +2541,7 @@ commit;
 select * from rem3;
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 
 -- ===================================================================
 -- test for TRUNCATE
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a976008b3d..b1cb589d60 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -320,18 +320,44 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
+	if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		int sent = 0;
+
+		Assert(resultRelInfo->ri_BatchSize > 1 &&
+			   resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert != NULL &&
+			   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL);
+
+		/* Flush into foreign table or partition */
+		while (sent < nused)
+		{
+			int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
+						resultRelInfo->ri_BatchSize : (nused - sent);
+			int inserted = size;
+
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 &slots[sent],
+																 NULL,
+																 &inserted);
+			sent += size;
+		}
+	}
+	else
+	{
+		/*
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
+		 */
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+	}
 
 	for (i = 0; i < nused; i++)
 	{
@@ -343,6 +369,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		{
 			List	   *recheckIndexes;
 
+			Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
 				ExecInsertIndexTuples(resultRelInfo,
@@ -679,6 +707,23 @@ CopyFrom(CopyFromState cstate)
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	/*
+	 * Also, if the named relation is a foreign table, determine if the FDW
+	 * supports batch insert and determine the batch size (a FDW may support
+	 * batching, but it may be disabled for the server/table).
+	 *
+	 * If the FDW does not support batching, we set the batch size to 1.
+	 */
+	if (resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
+
+	Assert(resultRelInfo->ri_BatchSize >= 1);
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -725,6 +770,15 @@ CopyFrom(CopyFromState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL &&
+			 resultRelInfo->ri_BatchSize == 1)
+	{
+		/*
+		 * Can't support multi-inserts to foreign tables if the FDW does not
+		 * support batching.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
 			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
 	{
@@ -737,14 +791,12 @@ CopyFrom(CopyFromState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs)
 	{
 		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
+		 * Can't support multi-inserts if there are any volatile default
+		 * expressions in the table.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
 		 *
 		 * Note: It does not matter if any partitions have any volatile
 		 * default expressions as we use the defaults from the target of the
@@ -910,12 +962,14 @@ CopyFrom(CopyFromState cstate)
 
 				/*
 				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
+				 * OF triggers, or if the partition is a foreign partition
+				 * that can't use batching.
 				 */
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
 					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					(resultRelInfo->ri_FdwRoutine == NULL ||
+					 resultRelInfo->ri_BatchSize > 1);
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
#109Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#108)
Re: Fast COPY FROM based on batch insert

On 18/7/2022 13:22, Etsuro Fujita wrote:

On Thu, Mar 24, 2022 at 3:43 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 3/22/22 06:54, Etsuro Fujita wrote:

* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed(). But I’m not sure we really need such a
change. Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?

Of course, such an approach would look much better if we could implement it.

I'll ponder how to do it.

I rewrote the decision logic to something much simpler and much less
invasive, which reduces the patch size significantly. Attached is an
updated patch. What do you think about that?

While working on the patch, I fixed a few issues as well:

+       if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
+           resultRelInfo->ri_BatchSize =
+               resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

When determining the batch size, I think we should check if the
ExecForeignBatchInsert callback routine is also defined, like other
places such as execPartition.c. For consistency I fixed this by
copying-and-pasting the code from that file.

+    * Also, the COPY command requires a non-zero input list of attributes.
+    * Therefore, the length of the attribute list is checked here.
+    */
+   if (!cstate->volatile_defexprs &&
+       list_length(cstate->attnumlist) > 0 &&
+       !contain_volatile_functions(cstate->whereClause))
+       target_resultRelInfo->ri_usesMultiInsert =
+                   ExecMultiInsertAllowed(target_resultRelInfo);

I think “list_length(cstate->attnumlist) > 0” in the if-test would
break COPY FROM; it currently supports multi-inserting into *plain*
tables even in the case where they have no columns, but this would
disable the multi-insertion support in that case. postgres_fdw would
not be able to batch into zero-column foreign tables due to the INSERT
syntax limitation (i.e., the syntax does not allow inserting multiple
empty rows into a zero-column table in a single INSERT statement).
Is that the reason why this was added to the if-test? But I think
some other FDWs might be able to, so I think we should let the FDW
decide whether to allow batching even in that case, when
GetForeignModifyBatchSize is called. So I removed the attnumlist test
from the patch, and modified postgresGetForeignModifyBatchSize
accordingly. I might be missing something, though.

Thanks a lot,
maybe you forgot this code:
/*
 * If a partition's root parent isn't allowed to use it, neither is the
 * partition.
 */
if (rootResultRelInfo->ri_usesMultiInsert)
	leaf_part_rri->ri_usesMultiInsert =
		ExecMultiInsertAllowed(leaf_part_rri);

Also, maybe the documentation should mention that the
ExecForeignBatchInsert routine can be called when batch_size is more than 1?

--
regards,
Andrey Lepikhov
Postgres Professional

#110Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#109)
Re: Fast COPY FROM based on batch insert

On Tue, Jul 19, 2022 at 6:35 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 18/7/2022 13:22, Etsuro Fujita wrote:

I rewrote the decision logic to something much simpler and much less
invasive, which reduces the patch size significantly. Attached is an
updated patch. What do you think about that?

maybe you forgot this code:
/*
 * If a partition's root parent isn't allowed to use it, neither is the
 * partition.
 */
if (rootResultRelInfo->ri_usesMultiInsert)
	leaf_part_rri->ri_usesMultiInsert =
		ExecMultiInsertAllowed(leaf_part_rri);

I think the patch accounts for that. Consider this bit to determine
whether to use batching for the partition chosen by
ExecFindPartition():

@@ -910,12 +962,14 @@ CopyFrom(CopyFromState cstate)

                /*
                 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-                * OF triggers, or if the partition is a foreign partition.
+                * OF triggers, or if the partition is a foreign partition
+                * that can't use batching.
                 */
                leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
                    !has_before_insert_row_trig &&
                    !has_instead_insert_row_trig &&
-                   resultRelInfo->ri_FdwRoutine == NULL;
+                   (resultRelInfo->ri_FdwRoutine == NULL ||
+                    resultRelInfo->ri_BatchSize > 1);

If the root parent isn't allowed to use batching, then we have
insertMethod=CIM_SINGLE for the parent before we get here. So in that
case we have leafpart_use_multi_insert=false for the chosen partition,
meaning that the partition isn't allowed to use batching, either.
(The patch just extends the existing decision logic to the
foreign-partition case.)
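The decision for a chosen partition can be condensed into a standalone predicate. This is a sketch assuming simplified, hypothetical types (`DemoInsertMethod`, `DemoPartition`); it mirrors the quoted `leafpart_use_multi_insert` expression and shows that a root parent with `CIM_SINGLE` disables batching for every partition:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the CopyInsertMethod values. */
typedef enum DemoInsertMethod
{
	DEMO_CIM_SINGLE,
	DEMO_CIM_MULTI,
	DEMO_CIM_MULTI_CONDITIONAL
} DemoInsertMethod;

/* Hypothetical summary of the partition properties the decision uses. */
typedef struct DemoPartition
{
	bool		has_before_insert_row_trig;
	bool		has_instead_insert_row_trig;
	bool		is_foreign;
	int			batch_size;		/* 1 when the FDW cannot batch */
} DemoPartition;

/* Same shape as the patched leafpart_use_multi_insert expression. */
static bool
demo_leafpart_use_multi_insert(DemoInsertMethod root_method,
							   const DemoPartition *p)
{
	return root_method == DEMO_CIM_MULTI_CONDITIONAL &&
		!p->has_before_insert_row_trig &&
		!p->has_instead_insert_row_trig &&
		(!p->is_foreign || p->batch_size > 1);
}
```

Because the first conjunct checks the root's method, no property of the partition can re-enable batching once the parent has been forced to `CIM_SINGLE`.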

Also, maybe the documentation should mention that the
ExecForeignBatchInsert routine can be called when batch_size is more than 1?

Yeah, but I think that is the existing behavior, and that the patch
doesn't change the behavior, so I would leave that for another patch.

Thanks for reviewing!

Best regards,
Etsuro Fujita

#111Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#110)
Re: Fast COPY FROM based on batch insert

On 7/20/22 13:10, Etsuro Fujita wrote:

On Tue, Jul 19, 2022 at 6:35 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 18/7/2022 13:22, Etsuro Fujita wrote:

I rewrote the decision logic to something much simpler and much less
invasive, which reduces the patch size significantly. Attached is an
updated patch. What do you think about that?

maybe you forgot this code:
/*
 * If a partition's root parent isn't allowed to use it, neither is the
 * partition.
 */
if (rootResultRelInfo->ri_usesMultiInsert)
	leaf_part_rri->ri_usesMultiInsert =
		ExecMultiInsertAllowed(leaf_part_rri);

I think the patch accounts for that. Consider this bit to determine
whether to use batching for the partition chosen by
ExecFindPartition():

Agreed.

While analyzing multi-level heterogeneous partitioned configurations, I
realized that a single write into a partition with a trigger will flush
the buffers for all other partitions of the parent table, even if the
parent doesn't have any triggers.
It relates to this code:
else if (insertMethod == CIM_MULTI_CONDITIONAL &&
		 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
{
	/*
	 * Flush pending inserts if this partition can't use
	 * batching, so rows are visible to triggers etc.
	 */
	CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
}

Why is such a cascading flush really necessary, especially for BEFORE and
INSTEAD OF triggers? An AFTER trigger should see all rows of the table,
but if the parent has no such trigger, I think we are not obliged to
guarantee the order of COPY into two different tables.

--
Regards
Andrey Lepikhov
Postgres Professional

#112Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#111)
Re: Fast COPY FROM based on batch insert

On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

While analyzing multi-level heterogeneous partitioned configurations, I
realized that a single write into a partition with a trigger will flush
the buffers for all other partitions of the parent table, even if the
parent doesn't have any triggers.
It relates to this code:
else if (insertMethod == CIM_MULTI_CONDITIONAL &&
		 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
{
	/*
	 * Flush pending inserts if this partition can't use
	 * batching, so rows are visible to triggers etc.
	 */
	CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
}

Why is such a cascading flush really necessary, especially for BEFORE and
INSTEAD OF triggers?

BEFORE triggers on the chosen partition might query the parent table,
not just the partition, so I think we need to do this so that such
triggers can see all the rows that have been inserted into the parent
table until then.
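A toy model of why the flush matters, assuming a parent with two partitions where only partition 0 can batch and partition 1 has a BEFORE ROW trigger that counts rows in the parent. All names are hypothetical, and `visible` stands in for what a trigger querying the parent could see; this is not how copyfrom.c is structured:

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NPARTS 2
#define DEMO_MAX_CALLS 8

/* Hypothetical state: counts only, no real tuples. */
typedef struct DemoCopyState
{
	int			buffered[DEMO_NPARTS];	/* rows waiting in each buffer */
	int			visible;		/* rows actually stored under the parent */
	int			seen_by_trigger[DEMO_MAX_CALLS];	/* visible count per trigger call */
	int			ntrigger_calls;
} DemoCopyState;

/* Move every partition's pending rows into the parent's visible set. */
static void
demo_flush_all(DemoCopyState *st)
{
	for (int i = 0; i < DEMO_NPARTS; i++)
	{
		st->visible += st->buffered[i];
		st->buffered[i] = 0;
	}
}

/* Route one row; batchable partitions buffer it, others insert at once. */
static void
demo_copy_row(DemoCopyState *st, int part, const bool *can_batch,
			  bool flush_first)
{
	if (can_batch[part])
	{
		st->buffered[part]++;
		return;
	}
	/* Non-batchable partition: optionally flush all pending rows first, */
	if (flush_first)
		demo_flush_all(st);
	/* then run its BEFORE ROW trigger, which counts rows in the parent, */
	assert(st->ntrigger_calls < DEMO_MAX_CALLS);
	st->seen_by_trigger[st->ntrigger_calls++] = st->visible;
	/* and finally store the row itself. */
	st->visible++;
}
```

Routing three rows to partition 0 and then one to partition 1, the trigger sees 3 rows when the buffers are flushed first and 0 rows otherwise, although both runs end with 4 rows stored.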

Best regards,
Etsuro Fujita

#113Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#112)
Re: Fast COPY FROM based on batch insert

On 7/22/22 13:14, Etsuro Fujita wrote:

On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov

Why is such a cascading flush really necessary, especially for BEFORE and
INSTEAD OF triggers?

BEFORE triggers on the chosen partition might query the parent table,
not just the partition, so I think we need to do this so that such
triggers can see all the rows that have been inserted into the parent
table until then.

Thanks for explaining your point of view. So, maybe we can switch the
status of this patch to 'Ready for committer'?

--
Regards
Andrey Lepikhov
Postgres Professional

#114Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#113)
Re: Fast COPY FROM based on batch insert

On Fri, Jul 22, 2022 at 5:42 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

So, maybe we can switch the
status of this patch to 'Ready for committer'?

Yeah, I think the patch is getting better, but I noticed some issues,
so I'm working on them. I think I can post a new version in the next
few days.

Best regards,
Etsuro Fujita

#115Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#112)
Re: Fast COPY FROM based on batch insert

On 7/22/22 13:14, Etsuro Fujita wrote:

On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

While analyzing multi-level heterogeneous partitioned configurations, I
realized that a single write into a partition with a trigger will flush
the buffers for all other partitions of the parent table, even if the
parent doesn't have any triggers.
It relates to this code:
else if (insertMethod == CIM_MULTI_CONDITIONAL &&
		 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
{
	/*
	 * Flush pending inserts if this partition can't use
	 * batching, so rows are visible to triggers etc.
	 */
	CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
}

Why is such a cascading flush really necessary, especially for BEFORE and
INSTEAD OF triggers?

BEFORE triggers on the chosen partition might query the parent table,
not just the partition, so I think we need to do this so that such
triggers can see all the rows that have been inserted into the parent
table until then.

If you'll excuse me, I will add one more argument.
It wasn't clear to me, so I ran an experiment: the result of a SELECT in
an INSERT trigger function shows only the data that existed in the
parent table before the start of COPY.
So we have no means to access newly inserted rows in a neighboring
partition, and we don't need to flush the tuple buffers immediately.
Where am I wrong?

--
Regards
Andrey Lepikhov
Postgres Professional

#116Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#115)
Re: Fast COPY FROM based on batch insert

On Wed, Jul 27, 2022 at 2:42 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 7/22/22 13:14, Etsuro Fujita wrote:

On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

While analyzing multi-level heterogeneous partitioned configurations, I
realized that a single write into a partition with a trigger will flush
the buffers for all other partitions of the parent table, even if the
parent doesn't have any triggers.
It relates to this code:
else if (insertMethod == CIM_MULTI_CONDITIONAL &&
		 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
{
	/*
	 * Flush pending inserts if this partition can't use
	 * batching, so rows are visible to triggers etc.
	 */
	CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
}

Why is such a cascading flush really necessary, especially for BEFORE and
INSTEAD OF triggers?

BEFORE triggers on the chosen partition might query the parent table,
not just the partition, so I think we need to do this so that such
triggers can see all the rows that have been inserted into the parent
table until then.

If you'll excuse me, I will add one more argument.
It wasn't clear to me, so I ran an experiment: the result of a SELECT in
an INSERT trigger function shows only the data that existed in the
parent table before the start of COPY.

Is the trigger function declared VOLATILE? If so, the trigger should
see modifications to the parent table as well. See:

https://www.postgresql.org/docs/15/trigger-datachanges.html

Best regards,
Etsuro Fujita

#117Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Etsuro Fujita (#114)
1 attachment(s)
Re: Fast COPY FROM based on batch insert

On Tue, Jul 26, 2022 at 7:19 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

Yeah, I think the patch is getting better, but I noticed some issues,
so I'm working on them. I think I can post a new version in the next
few days.

* When running AFTER ROW triggers in CopyMultiInsertBufferFlush(), the
patch uses the slots passed to ExecForeignBatchInsert(), not the ones
returned by the callback function, but I don't think that that is
always correct, as the documentation about the callback function says:

The return value is an array of slots containing the data that was
actually inserted (this might differ from the data supplied, for
example as a result of trigger actions.)
The passed-in <literal>slots</literal> can be re-used for this purpose.

postgres_fdw re-uses the passed-in slots, but other FDWs might not, so
I modified the patch to reference the returned slots when running the
AFTER ROW triggers. I also modified the patch to initialize the
tts_tableOid. Attached is an updated patch, in which I made some
minor adjustments to CopyMultiInsertBufferFlush() as well.
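A small sketch of why the returned slots matter. The callback below is a hypothetical stand-in loosely modeled on ExecForeignBatchInsert(): it does not reuse the passed-in slots, it copies accepted rows into its own array, and it drops rows that a simulated remote BEFORE trigger suppressed (here, rows with f1 = 0). The "AFTER ROW trigger" is reduced to summing f1. All names are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for TupleTableSlot. */
typedef struct { int f1; } Slot;

static Slot fdw_result[64];

/*
 * Toy FDW batch insert: returns an array of the rows actually inserted and
 * updates *num_slots.  It does not reuse the caller's slots.
 */
static Slot **
toy_batch_insert(Slot **slots, int *num_slots)
{
    static Slot *ret[64];
    int n = 0;

    assert(*num_slots <= 64);
    for (int i = 0; i < *num_slots; i++)
    {
        if (slots[i]->f1 == 0)
            continue;               /* remote side ignored this row */
        fdw_result[n] = *slots[i];
        ret[n] = &fdw_result[n];
        n++;
    }
    *num_slots = n;
    return ret;
}

/* "AFTER ROW triggers": iterate over the RETURNED slots, not the input. */
static int
after_row_sum(Slot **slots, int nslots)
{
    int inserted = nslots;
    Slot **rslots = toy_batch_insert(slots, &inserted);
    int sum = 0;

    for (int i = 0; i < inserted; i++)
        sum += rslots[i]->f1;
    return sum;
}
```

For input rows (0, 5, 7) this sums 12. Iterating over the first `inserted` entries of the input array instead would sum 0 + 5 and fire the trigger for a row that was never inserted.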

* The patch produces incorrect error context information:

create extension postgres_fdw;
create server loopback foreign data wrapper postgres_fdw options
(dbname 'postgres');
create user mapping for current_user server loopback;
create table t1 (f1 int, f2 text);
create foreign table ft1 (f1 int, f2 text) server loopback options
(table_name 't1');
alter table t1 add constraint t1_f1positive check (f1 >= 0);
alter foreign table ft1 add constraint ft1_f1positive check (f1 >= 0);

-- single insert
copy ft1 from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

-1 foo
1 bar
\.

ERROR: new row for relation "t1" violates check constraint "t1_f1positive"
DETAIL: Failing row contains (-1, foo).
CONTEXT: remote SQL command: INSERT INTO public.t1(f1, f2) VALUES ($1, $2)
COPY ft1, line 1: "-1 foo"

-- batch insert
alter server loopback options (add batch_size '2');
copy ft1 from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

-1 foo
1 bar
\.

ERROR: new row for relation "t1" violates check constraint "t1_f1positive"
DETAIL: Failing row contains (-1, foo).
CONTEXT: remote SQL command: INSERT INTO public.t1(f1, f2) VALUES
($1, $2), ($3, $4)
COPY ft1, line 3

In single-insert mode the error context information is correct, but in
batch-insert mode it isn’t (i.e., the line number isn’t correct).

The error occurs on the remote side, so I'm not sure if there is a
simple fix. What I came up with is to just suppress error context
information other than the relation name, like the attached. What do
you think about that?

(In CopyMultiInsertBufferFlush() your patch sets cstate->cur_lineno to
buffer->linenos[i] even when running AFTER ROW triggers for the i-th
row returned by ExecForeignBatchInsert(), but that wouldn’t always be
correct, as the i-th returned row might not correspond to the i-th row
originally stored in the buffer as the callback function returns only
the rows that were actually inserted on the remote side. I think the
proposed fix would address this issue as well.)
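A minimal sketch of the proposed suppression, using a cut-down, hypothetical copy state whose field names mirror the patch (cur_relname, cur_lineno, relname_only). While a foreign batch is being flushed, the remote error can't be tied to a single input line, because cur_lineno already points past the buffered rows. So only the relation name is reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down copy state. */
typedef struct
{
    const char   *cur_relname;
    unsigned long cur_lineno;
    bool          relname_only;  /* set while flushing a foreign batch */
} ToyCopyState;

/* Format the context line roughly as CopyFromErrorCallback() would. */
static void
toy_error_context(const ToyCopyState *cstate, char *buf, size_t len)
{
    if (cstate->relname_only)
        snprintf(buf, len, "COPY %s", cstate->cur_relname);
    else
        snprintf(buf, len, "COPY %s, line %lu",
                 cstate->cur_relname, cstate->cur_lineno);
}

/* Simulate a failing remote batch INSERT during a buffer flush. */
static void
toy_flush_foreign_batch(ToyCopyState *cstate, char *errbuf, size_t len)
{
    assert(!cstate->relname_only);
    cstate->relname_only = true;
    toy_error_context(cstate, errbuf, len);  /* error raised here */
    cstate->relname_only = false;
}
```

In the batch-insert example above, cur_lineno is 3 when the buffer is flushed, which is where the misleading "line 3" comes from. With the flag set, the context becomes just "COPY ft1".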

* The patch produces incorrect row count in cases where some/all of
the rows passed to ExecForeignBatchInsert() weren’t inserted on the
remote side:

create function trig_null() returns trigger as $$ begin return NULL;
end $$ language plpgsql;
create trigger trig_null before insert on t1 for each row execute
function trig_null();

-- single insert
alter server loopback options (drop batch_size);
copy ft1 from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

0 foo
1 bar
\.

COPY 0

-- batch insert
alter server loopback options (add batch_size '2');
copy ft1 from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

0 foo
1 bar
\.

COPY 2

The row count is correct in single-insert mode, but isn’t in batch-insert mode.

The reason is that in batch-insert mode the row counter is updated
immediately after adding the row to the buffer, not after doing
ExecForeignBatchInsert(), which might ignore the row. To fix, I
modified the patch to delay updating the row counter (and the progress
of the COPY command) until after doing the callback function. For
consistency, I also modified the patch to delay it even when batching
into plain tables. IMO that would be more consistent
with the single-insert mode, as in that mode we update them after
writing the tuple out to the table or sending it to the remote side.
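A sketch of the delayed counting, assuming a hypothetical remote insert that skips rows a remote BEFORE trigger turned into NULL (modeled as value 0). The buffer is sent in chunks of at most batch_size rows, the same chunking CopyMultiInsertBufferFlush() does in the patch, and the counter grows only by what each chunk reports as inserted.

```c
#include <assert.h>
#include <stdint.h>

/* Toy remote insert: rows equal to 0 are silently skipped. */
static int
toy_remote_insert(const int *rows, int n)
{
    int inserted = 0;

    for (int i = 0; i < n; i++)
        if (rows[i] != 0)
            inserted++;
    return inserted;
}

/* Flush 'nused' buffered rows in chunks, counting only inserted rows. */
static void
toy_buffer_flush(const int *rows, int nused, int batch_size,
                 int64_t *processed)
{
    int sent = 0;

    assert(batch_size > 1);
    while (sent < nused)
    {
        int size = (batch_size < nused - sent) ? batch_size : nused - sent;
        int inserted = toy_remote_insert(&rows[sent], size);

        sent += size;
        *processed += inserted;  /* updated after the batch, not before */
    }
}
```

Counting at buffering time would report every buffered row, which is the "COPY 2" result above. Counting after the callback reports 0 when every row was suppressed.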

* I modified the patch so that when batching into foreign tables we
skip useless steps in CopyMultiInsertBufferInit() and
CopyMultiInsertBufferCleanup().

That’s all I have for now. Sorry for the delay.

Best regards,
Etsuro Fujita

Attachments:

v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-2.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 7bf35602b0..32c2eacf7a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8533,6 +8533,7 @@ drop table loct2;
 -- ===================================================================
 -- test COPY FROM
 -- ===================================================================
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -8555,7 +8556,7 @@ copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+COPY rem2
 select * from rem2;
  f1 | f2  
 ----+-----
@@ -8566,6 +8567,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8674,6 +8688,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2), ($3, $4)
+COPY rem2
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -8690,6 +8732,7 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 16320170ce..7d18c70f1a 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2059,6 +2059,15 @@ postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 		  resultRelInfo->ri_TrigDesc->trig_insert_after_row)))
 		return 1;
 
+	/*
+	 * If the foreign table has no columns, disable batching as the INSERT
+	 * syntax doesn't allow batching multiple empty rows into a zero-column
+	 * table.  This isn't needed in case of INSERT, but is in case of COPY.
+	 * Note that in the latter case fmstate must be non-NULL.
+	 */
+	if (fmstate && list_length(fmstate->target_attrs) == 0)
+		return 1;
+
 	/*
 	 * Otherwise use the batch size specified for server/table. The number of
 	 * parameters in a batch is limited to 65535 (uint16), so make sure we
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 42735ae78a..b0743098d4 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2368,6 +2368,7 @@ drop table loct2;
 -- test COPY FROM
 -- ===================================================================
 
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -2400,6 +2401,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.
+
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -2500,6 +2518,34 @@ drop trigger loc2_trig_row_before_insert on loc2;
 
 delete from rem2;
 
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+1	foo
+2	bar
+\.
+
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+
+
+\.
+select * from rem2;
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -2513,6 +2559,7 @@ commit;
 select * from rem3;
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 
 -- ===================================================================
 -- test for TRUNCATE
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a976008b3d..ced3b53191 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -116,6 +116,13 @@ CopyFromErrorCallback(void *arg)
 {
 	CopyFromState cstate = (CopyFromState) arg;
 
+	if (cstate->relname_only)
+	{
+		errcontext("COPY %s",
+				   cstate->cur_relname);
+		return;
+	}
+
 	if (cstate->opts.binary)
 	{
 		/* can't usefully display the data */
@@ -222,7 +229,7 @@ CopyMultiInsertBufferInit(ResultRelInfo *rri)
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
 	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
 	buffer->resultRelInfo = rri;
-	buffer->bistate = GetBulkInsertState();
+	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -299,83 +306,162 @@ CopyMultiInsertInfoIsEmpty(CopyMultiInsertInfo *miinfo)
  */
 static inline void
 CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
-						   CopyMultiInsertBuffer *buffer)
+						   CopyMultiInsertBuffer *buffer,
+						   int64 *processed)
 {
-	MemoryContext oldcontext;
-	int			i;
-	uint64		save_cur_lineno;
 	CopyFromState cstate = miinfo->cstate;
 	EState	   *estate = miinfo->estate;
-	CommandId	mycid = miinfo->mycid;
-	int			ti_options = miinfo->ti_options;
-	bool		line_buf_valid = cstate->line_buf_valid;
 	int			nused = buffer->nused;
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
+	int			i;
 
-	/*
-	 * Print error context information correctly, if one of the operations
-	 * below fails.
-	 */
-	cstate->line_buf_valid = false;
-	save_cur_lineno = cstate->cur_lineno;
+	if (resultRelInfo->ri_FdwRoutine)
+	{
+		int			batch_size = resultRelInfo->ri_BatchSize;
+		int			sent = 0;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
+		Assert(batch_size > 1);
 
-	for (i = 0; i < nused; i++)
-	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * We suppress error context information other than the relation name,
+		 * if one of the operations below fails.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		Assert(!cstate->relname_only);
+		cstate->relname_only = true;
+
+		while (sent < nused)
 		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
+			int			size = (batch_size < nused - sent) ? batch_size : (nused - sent);
+			int			inserted = size;
+			TupleTableSlot **rslots;
+
+			/* Batch insert into foreign table */
+			Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
+			rslots =
+				resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																	 resultRelInfo,
+																	 &slots[sent],
+																	 NULL,
+																	 &inserted);
+
+			/* If any rows were inserted, run AFTER ROW INSERT triggers. */
+			if (inserted > 0 &&
+				resultRelInfo->ri_TrigDesc != NULL &&
+				(resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+				 resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				for (i = 0; i < inserted; i++)
+				{
+					TupleTableSlot *slot = rslots[i];
+
+					/*
+					 * AFTER ROW Triggers might reference the tableoid column,
+					 * so (re-)initialize tts_tableOid before evaluating them.
+					 */
+					slot->tts_tableOid =
+						RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+					ExecARInsertTriggers(estate, resultRelInfo,
+										 slot, NIL,
+										 cstate->transition_capture);
+				}
+			}
+
+			sent += size;
+
+			/* Update the row counter and progress of the COPY command */
+			if (inserted > 0)
+			{
+				*processed += inserted;
+				pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+											 *processed);
+			}
 		}
 
+		for (i = 0; i < nused; i++)
+			ExecClearTuple(slots[i]);
+
+		/* reset relname_only */
+		cstate->relname_only = false;
+	}
+	else
+	{
+		CommandId	mycid = miinfo->mycid;
+		int			ti_options = miinfo->ti_options;
+		bool		line_buf_valid = cstate->line_buf_valid;
+		uint64		save_cur_lineno = cstate->cur_lineno;
+		MemoryContext oldcontext;
+
 		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
+		 * Print error context information correctly, if one of the operations
+		 * below fails.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		cstate->line_buf_valid = false;
+
+		/*
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
+		 */
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
+			/*
+			 * If there are any indexes, update them for all the inserted
+			 * tuples, and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false,
+										  false, NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL,
+									 cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
 		}
 
-		ExecClearTuple(slots[i]);
+		/* Update the row counter and progress of the COPY command */
+		*processed += nused;
+		pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+									 *processed);
+
+		/* reset cur_lineno and line_buf_valid to what they were */
+		cstate->line_buf_valid = line_buf_valid;
+		cstate->cur_lineno = save_cur_lineno;
 	}
 
 	/* Mark that all slots are free */
 	buffer->nused = 0;
-
-	/* reset cur_lineno and line_buf_valid to what they were */
-	cstate->line_buf_valid = line_buf_valid;
-	cstate->cur_lineno = save_cur_lineno;
 }
 
 /*
@@ -387,22 +473,25 @@ static inline void
 CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 							 CopyMultiInsertBuffer *buffer)
 {
+	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	int			i;
 
 	/* Ensure buffer was flushed */
 	Assert(buffer->nused == 0);
 
 	/* Remove back-link to ourself */
-	buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
-	FreeBulkInsertState(buffer->bistate);
+	if (resultRelInfo->ri_FdwRoutine == NULL)
+		FreeBulkInsertState(buffer->bistate);
 
 	/* Since we only create slots on demand, just drop the non-null ones. */
 	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
 		ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	table_finish_bulk_insert(buffer->resultRelInfo->ri_RelationDesc,
-							 miinfo->ti_options);
+	if (resultRelInfo->ri_FdwRoutine == NULL)
+		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
+								 miinfo->ti_options);
 
 	pfree(buffer);
 }
@@ -418,7 +507,8 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
  * 'curr_rri'.
  */
 static inline void
-CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
+CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri,
+						 int64 *processed)
 {
 	ListCell   *lc;
 
@@ -426,7 +516,7 @@ CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
 	{
 		CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) lfirst(lc);
 
-		CopyMultiInsertBufferFlush(miinfo, buffer);
+		CopyMultiInsertBufferFlush(miinfo, buffer, processed);
 	}
 
 	miinfo->bufferedTuples = 0;
@@ -679,6 +769,23 @@ CopyFrom(CopyFromState cstate)
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	/*
+	 * Also, if the named relation is a foreign table, determine if the FDW
+	 * supports batch insert and determine the batch size (a FDW may support
+	 * batching, but it may be disabled for the server/table).
+	 *
+	 * If the FDW does not support batching, we set the batch size to 1.
+	 */
+	if (resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
+
+	Assert(resultRelInfo->ri_BatchSize >= 1);
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -725,6 +832,15 @@ CopyFrom(CopyFromState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL &&
+			 resultRelInfo->ri_BatchSize == 1)
+	{
+		/*
+		 * Can't support multi-inserts to foreign tables if the FDW does not
+		 * support batching.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
 			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
 	{
@@ -737,14 +853,12 @@ CopyFrom(CopyFromState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs)
 	{
 		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
+		 * Can't support multi-inserts if there are any volatile default
+		 * expressions in the table.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
 		 *
 		 * Note: It does not matter if any partitions have any volatile
 		 * default expressions as we use the defaults from the target of the
@@ -910,12 +1024,14 @@ CopyFrom(CopyFromState cstate)
 
 				/*
 				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
+				 * OF triggers, or if the partition is a foreign partition
+				 * that can't use batching.
 				 */
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
 					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					(resultRelInfo->ri_FdwRoutine == NULL ||
+					 resultRelInfo->ri_BatchSize > 1);
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -931,7 +1047,9 @@ CopyFrom(CopyFromState cstate)
 					 * Flush pending inserts if this partition can't use
 					 * batching, so rows are visible to triggers etc.
 					 */
-					CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMultiInsertInfoFlush(&multiInsertInfo,
+											 resultRelInfo,
+											 &processed);
 				}
 
 				if (bistate != NULL)
@@ -1067,7 +1185,17 @@ CopyFrom(CopyFromState cstate)
 					 * buffers out to their tables.
 					 */
 					if (CopyMultiInsertInfoIsFull(&multiInsertInfo))
-						CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+						CopyMultiInsertInfoFlush(&multiInsertInfo,
+												 resultRelInfo,
+												 &processed);
+
+					/*
+					 * We delay updating the row counter and the progress of
+					 * the COPY command until after writing the tuples stored
+					 * in the buffer out to the table.  See
+					 * CopyMultiInsertBufferFlush().
+					 */
+					continue;	/* next tuple please */
 				}
 				else
 				{
@@ -1130,7 +1258,7 @@ CopyFrom(CopyFromState cstate)
 	if (insertMethod != CIM_SINGLE)
 	{
 		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
+			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL, &processed);
 	}
 
 	/* Done, clean up */
@@ -1349,6 +1477,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_lineno = 0;
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
+	cstate->relname_only = false;
 
 	/*
 	 * Allocate buffers for the input pipeline.
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37c6032ae..21e8b89baa 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -81,6 +81,7 @@ typedef struct CopyFromStateData
 	uint64		cur_lineno;		/* line number for error messages */
 	const char *cur_attname;	/* current att for error messages */
 	const char *cur_attval;		/* current att value for error messages */
+	bool		relname_only;	/* don't output line number, att, etc. */
 
 	/*
 	 * Working state
#118Zhihong Yu
zyu@yugabyte.com
In reply to: Etsuro Fujita (#117)
Re: Fast COPY FROM based on batch insert

On Tue, Aug 9, 2022 at 4:45 AM Etsuro Fujita <etsuro.fujita@gmail.com>
wrote:

Hi,

+           /* If any rows were inserted, run AFTER ROW INSERT triggers. */
...
+               for (i = 0; i < inserted; i++)
+               {
+                   TupleTableSlot *slot = rslots[i];
...
+                   slot->tts_tableOid =
+                       RelationGetRelid(resultRelInfo->ri_RelationDesc);

It seems the return value of
`RelationGetRelid(resultRelInfo->ri_RelationDesc)` can be stored in a
variable outside the for loop.
Inside the for loop, assign this variable to slot->tts_tableOid.
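The suggested change, sketched with hypothetical toy types. Oid is `unsigned int`, as in PostgreSQL, and the macro mirrors RelationGetRelid(), which just reads `rd_id`. Because that lookup is a cheap field access, the gain is mainly readability.

```c
#include <assert.h>

typedef unsigned int Oid;

/* Hypothetical toy stand-ins for TupleTableSlot and Relation. */
typedef struct { Oid tts_tableOid; } ToySlot;
typedef struct { Oid rd_id; } ToyRelation;

/* Mirrors RelationGetRelid(), which reads rel->rd_id. */
#define ToyRelationGetRelid(rel) ((rel)->rd_id)

static void
set_table_oids(ToySlot **rslots, int inserted, const ToyRelation *rel)
{
    /* Loop-invariant: compute the relation OID once, outside the loop. */
    Oid relid = ToyRelationGetRelid(rel);

    for (int i = 0; i < inserted; i++)
        rslots[i]->tts_tableOid = relid;
}
```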

Cheers

#119Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Zhihong Yu (#118)
Re: Fast COPY FROM based on batch insert

Hi,

On Wed, Aug 10, 2022 at 1:06 AM Zhihong Yu <zyu@yugabyte.com> wrote:

On Tue, Aug 9, 2022 at 4:45 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

* When running AFTER ROW triggers in CopyMultiInsertBufferFlush(), the
patch uses the slots passed to ExecForeignBatchInsert(), not the ones
returned by the callback function, but I don't think that that is
always correct, as the documentation about the callback function says:

The return value is an array of slots containing the data that was
actually inserted (this might differ from the data supplied, for
example as a result of trigger actions.)
The passed-in <literal>slots</literal> can be re-used for this purpose.

postgres_fdw re-uses the passed-in slots, but other FDWs might not, so
I modified the patch to reference the returned slots when running the
AFTER ROW triggers.
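To illustrate why the returned array matters, here is a standalone sketch with a hypothetical `Slot` type and a hypothetical FDW callback that hands back its own slots (holding the values after a simulated remote trigger adds 100) instead of re-using the passed-in ones. Code that ran AFTER ROW triggers on the input slots would see the pre-insert values.

```c
#include <assert.h>

/* Hypothetical stand-in for TupleTableSlot */
typedef struct { int value; } Slot;

/* Slots owned by the hypothetical FDW, returned instead of the inputs */
static Slot fdw_slots[8];
static Slot *fdw_result[8];

/*
 * Hypothetical FDW batch-insert callback: the remote side modifies each row
 * (here, a simulated trigger adds 100), and the FDW returns its own slots
 * holding the data actually inserted rather than re-using the input slots.
 */
static Slot **
fdw_batch_insert(Slot **slots, int *num)
{
	assert(*num <= 8);
	for (int i = 0; i < *num; i++)
	{
		fdw_slots[i].value = slots[i]->value + 100;
		fdw_result[i] = &fdw_slots[i];
	}
	return fdw_result;
}

/* Sum of the values an AFTER ROW trigger would see */
static int
run_after_row_triggers(Slot **slots, int n)
{
	int			inserted = n;
	Slot	  **rslots = fdw_batch_insert(slots, &inserted);
	int			sum = 0;

	for (int i = 0; i < inserted; i++)
		sum += rslots[i]->value;	/* not slots[i]->value */
	return sum;
}
```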

I noticed that my explanation was not correct. Let me explain.
Before commit 82593b9a3, when batching into a view referencing a
postgres_fdw foreign table that has WCO constraints, postgres_fdw used
the passed-in slots to store the first tuple that was actually
inserted into the remote table. But that commit disabled batching in
that case, so postgres_fdw wouldn’t use the passed-in slots (until we
support batching when there are WCO constraints from the parent views
and/or AFTER ROW triggers on the foreign table).

+           /* If any rows were inserted, run AFTER ROW INSERT triggers. */
...
+               for (i = 0; i < inserted; i++)
+               {
+                   TupleTableSlot *slot = rslots[i];
...
+                   slot->tts_tableOid =
+                       RelationGetRelid(resultRelInfo->ri_RelationDesc);

It seems the return value of `RelationGetRelid(resultRelInfo->ri_RelationDesc)` can be stored in a variable outside the for loop.
Inside the for loop, assign this variable to slot->tts_tableOid.

Actually, I did this to match the code in ExecBatchInsert(), but that
seems like a good idea, so I’ll update the patch as such in the next
version.

Thanks for reviewing!

Best regards,
Etsuro Fujita

#120Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#117)
Re: Fast COPY FROM based on batch insert

On 8/9/22 16:44, Etsuro Fujita wrote:

-1 foo
1 bar
\.

ERROR: new row for relation "t1" violates check constraint "t1_f1positive"
DETAIL: Failing row contains (-1, foo).
CONTEXT: remote SQL command: INSERT INTO public.t1(f1, f2) VALUES
($1, $2), ($3, $4)
COPY ft1, line 3

In single-insert mode the error context information is correct, but in
batch-insert mode it isn’t (i.e., the line number isn’t correct).
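In batch mode a single remote statement carries rows from several input lines, so the remote error refers to the whole statement. The local line number only reflects how far the reader had got when the buffer was flushed. In the example above the failing row (-1, foo) comes from the first data line, yet the context reports line 3. The sketch below builds the parameterized VALUES list seen in that error for a batch of rows; `build_batch_values()` is a hypothetical helper written for illustration, not postgres_fdw's deparsing code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append "($1, $2), ($3, $4), ..." for nrows rows of
 * ncols columns to the NUL-terminated string in buf.  Returns 0 on success,
 * -1 if buf is too small.
 */
static int
build_batch_values(char *buf, size_t bufsize, int nrows, int ncols)
{
	size_t		len = strlen(buf);
	int			param = 1;

	for (int r = 0; r < nrows; r++)
	{
		for (int c = 0; c < ncols; c++)
		{
			int			n = snprintf(buf + len, bufsize - len, "%s%s$%d%s",
									 (c == 0 && r > 0) ? ", " : "",
									 c == 0 ? "(" : ", ",
									 param++,
									 c == ncols - 1 ? ")" : "");

			if (n < 0 || (size_t) n >= bufsize - len)
				return -1;		/* output would be truncated */
			len += (size_t) n;
		}
	}
	return 0;
}
```

With `ncols == 0` the helper emits nothing, which hints at why a multi-row VALUES list cannot express a batch of empty rows.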

The error occurs on the remote side, so I'm not sure if there is a
simple fix. What I came up with is to just suppress error context
information other than the relation name, like the attached. What do
you think about that?

I've spent a lot of effort on this problem too. Your solution has a
rationale and looks fine.
I only think we should add a bit of info to the error report, to
make it clear why it doesn't point to a specific line. For example:
'COPY %s (buffered)'
or
'COPY FOREIGN TABLE %s'

or, if instead of the relname_only field we saved a MultiInsertBuffer
pointer, we could add min/max linenos to the report:
'COPY %s, line between %llu and %llu'
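Here is a sketch of how the min/max variant might compute its range. It assumes the buffer keeps one input line number per buffered row, as the real `CopyMultiInsertBuffer` does in its `linenos` array. `FakeMultiInsertBuffer` and `format_buffer_context()` are hypothetical, and `snprintf()` stands in for `errcontext()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for CopyMultiInsertBuffer */
typedef struct
{
	uint64_t	linenos[16];	/* input line of each buffered row */
	int			nused;			/* number of buffered rows */
} FakeMultiInsertBuffer;

/* Format the proposed context message for a failed batch flush */
static void
format_buffer_context(char *out, size_t outsize, const char *relname,
					  const FakeMultiInsertBuffer *buffer)
{
	uint64_t	lo,
				hi;

	assert(buffer->nused >= 1);
	lo = hi = buffer->linenos[0];
	for (int i = 1; i < buffer->nused; i++)
	{
		if (buffer->linenos[i] < lo)
			lo = buffer->linenos[i];
		if (buffer->linenos[i] > hi)
			hi = buffer->linenos[i];
	}
	snprintf(out, outsize, "COPY %s, line between %llu and %llu",
			 relname, (unsigned long long) lo, (unsigned long long) hi);
}
```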

--
Regards
Andrey Lepikhov
Postgres Professional

#121Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#120)
Re: Fast COPY FROM based on batch insert

On Mon, Aug 15, 2022 at 2:29 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 8/9/22 16:44, Etsuro Fujita wrote:

-1 foo
1 bar
\.

ERROR: new row for relation "t1" violates check constraint "t1_f1positive"
DETAIL: Failing row contains (-1, foo).
CONTEXT: remote SQL command: INSERT INTO public.t1(f1, f2) VALUES
($1, $2), ($3, $4)
COPY ft1, line 3

In single-insert mode the error context information is correct, but in
batch-insert mode it isn’t (i.e., the line number isn’t correct).

The error occurs on the remote side, so I'm not sure if there is a
simple fix. What I came up with is to just suppress error context
information other than the relation name, like the attached. What do
you think about that?

I've spent a lot of effort on this problem too. Your solution has a
rationale and looks fine.
I only think we should add a bit of info to the error report, to
make it clear why it doesn't point to a specific line. For example:
'COPY %s (buffered)'
or
'COPY FOREIGN TABLE %s'

or, if instead of the relname_only field we saved a MultiInsertBuffer
pointer, we could add min/max linenos to the report:
'COPY %s, line between %llu and %llu'

I think the latter is more consistent with the existing error context
information when in CopyMultiInsertBufferFlush(). Actually, I thought
this too, and I think this would be useful when the COPY FROM command
is executed on a foreign table. My concern, however, is the case when
the command is executed on a partitioned table containing foreign
partitions; in that case the input data would not always be sorted in
the partition order, so the range for an error-occurring foreign
partition might contain many lines with rows from other partitions,
which I think makes the range information less useful. Maybe I'm too
worried about that, though.
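The concern can be made concrete with a standalone sketch. Given the partition each input line is routed to (a hypothetical `part_of_line` array), it computes the line range a partition's rows span and how many lines inside that range actually belong to other partitions. For simplicity it treats all of a partition's rows as one buffer and ignores intermediate flushes.

```c
#include <assert.h>

/*
 * part_of_line[i] is the partition that input line i+1 is routed to.
 * For partition 'part', report the first and last line of its rows
 * (0 if it has none) and how many lines in that range belong to other
 * partitions.
 */
static void
partition_line_range(const int *part_of_line, int nlines, int part,
					 int *first, int *last, int *other_lines)
{
	*first = *last = 0;
	*other_lines = 0;

	for (int i = 0; i < nlines; i++)
	{
		if (part_of_line[i] != part)
			continue;
		if (*first == 0)
			*first = i + 1;
		*last = i + 1;
	}

	/* Count lines inside [first, last] routed elsewhere */
	for (int line = *first; line > 0 && line <= *last; line++)
		if (part_of_line[line - 1] != part)
			(*other_lines)++;
}
```

For the routing 1, 1, 2, 2, 1, 2, partition 1 spans lines 1 to 5, two of which belong to partition 2.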

Best regards,
Etsuro Fujita

#122Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#121)
Re: Fast COPY FROM based on batch insert

On 22/8/2022 11:44, Etsuro Fujita wrote:

I think the latter is more consistent with the existing error context
information when in CopyMultiInsertBufferFlush(). Actually, I thought
this too, and I think this would be useful when the COPY FROM command
is executed on a foreign table. My concern, however, is the case when
the command is executed on a partitioned table containing foreign
partitions; in that case the input data would not always be sorted in
the partition order, so the range for an error-occurring foreign
partition might contain many lines with rows from other partitions,
which I think makes the range information less useful. Maybe I'm too
worried about that, though.

I see your point. Indeed, perhaps such info doesn't really need to be
included in core, at least for now.

--
regards,
Andrey Lepikhov
Postgres Professional

#123Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#122)
Re: Fast COPY FROM based on batch insert

On Tue, Aug 23, 2022 at 2:58 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

On 22/8/2022 11:44, Etsuro Fujita wrote:

I think the latter is more consistent with the existing error context
information when in CopyMultiInsertBufferFlush(). Actually, I thought
this too, and I think this would be useful when the COPY FROM command
is executed on a foreign table. My concern, however, is the case when
the command is executed on a partitioned table containing foreign
partitions; in that case the input data would not always be sorted in
the partition order, so the range for an error-occurring foreign
partition might contain many lines with rows from other partitions,
which I think makes the range information less useful. Maybe I'm too
worried about that, though.

I see your point. Indeed, perhaps such info doesn't really need to be
included in core, at least for now.

Ok. Sorry for the late response.

Best regards,
Etsuro Fujita

#124Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Etsuro Fujita (#119)
1 attachment(s)
Re: Fast COPY FROM based on batch insert

On Wed, Aug 10, 2022 at 5:30 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Wed, Aug 10, 2022 at 1:06 AM Zhihong Yu <zyu@yugabyte.com> wrote:

+           /* If any rows were inserted, run AFTER ROW INSERT triggers. */
...
+               for (i = 0; i < inserted; i++)
+               {
+                   TupleTableSlot *slot = rslots[i];
...
+                   slot->tts_tableOid =
+                       RelationGetRelid(resultRelInfo->ri_RelationDesc);

It seems the return value of `RelationGetRelid(resultRelInfo->ri_RelationDesc)` can be stored in a variable outside the for loop.
Inside the for loop, assign this variable to slot->tts_tableOid.

Actually, I did this to match the code in ExecBatchInsert(), but that
seems like a good idea, so I’ll update the patch as such in the next
version.

Done. I also adjusted the code in CopyMultiInsertBufferFlush() a bit
further, with no functional changes. I moved the assertion ensuring
the FDW supports batching back to its original position. Sorry for the
back and forth. Attached is an updated version of the patch.

Other changes are:

* The previous patch modified postgres_fdw.sql so that the existing
test cases for COPY FROM were tested in batch-insert mode. But I
think we should keep them as-is to test the default behavior, so I
added test cases for this feature by copying-and-pasting some of the
existing test cases. Also, the previous patch added this:

+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+   server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+   server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+   begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+   for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.

Rather than doing so, I think it would be better to use a partitioned
table defined in the above section “test tuple routing for
foreign-table partitions”, to save cycles. So I changed this as such.

* I modified comments a bit further and updated docs.

That is it. I will review the patch a bit more, but I feel that it is
in good shape.

Best regards,
Etsuro Fujita

Attachments:

v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-3.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2e4e82a94f..d24e050f41 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8605,6 +8605,39 @@ select tableoid::regclass, * FROM remp1;
  remp1    | 1 | bar
 (2 rows)
 
+delete from ctrtest;
+-- Test copy tuple routing with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+copy ctrtest from stdin;
+select tableoid::regclass, * FROM ctrtest;
+ tableoid | a |   b   
+----------+---+-------
+ remp1    | 1 | foo
+ remp1    | 1 | bar
+ remp1    | 1 | test1
+ remp2    | 2 | baz
+ remp2    | 2 | qux
+ remp2    | 2 | test2
+(6 rows)
+
+select tableoid::regclass, * FROM remp1;
+ tableoid | a |   b   
+----------+---+-------
+ remp1    | 1 | foo
+ remp1    | 1 | bar
+ remp1    | 1 | test1
+(3 rows)
+
+select tableoid::regclass, * FROM remp2;
+ tableoid |   b   | a 
+----------+-------+---
+ remp2    | baz   | 2
+ remp2    | qux   | 2
+ remp2    | test2 | 2
+(3 rows)
+
+delete from ctrtest;
+alter server loopback options (drop batch_size);
 drop table ctrtest;
 drop table loct1;
 drop table loct2;
@@ -8768,6 +8801,78 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+-- Test COPY FROM with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+-- Test basic functionality
+copy rem2 from stdin;
+select * from rem2;
+ f1 | f2  
+----+-----
+  1 | foo
+  2 | bar
+  3 | baz
+(3 rows)
+
+delete from rem2;
+-- Test check constraints
+alter table loc2 add constraint loc2_f1positive check (f1 >= 0);
+alter foreign table rem2 add constraint rem2_f1positive check (f1 >= 0);
+-- check constraint is enforced on the remote side, not locally
+copy rem2 from stdin;
+copy rem2 from stdin; -- ERROR
+ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
+DETAIL:  Failing row contains (-1, xyzzy).
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
+COPY rem2
+select * from rem2;
+ f1 | f2  
+----+-----
+  1 | foo
+  2 | bar
+  3 | baz
+(3 rows)
+
+alter foreign table rem2 drop constraint rem2_f1positive;
+alter table loc2 drop constraint loc2_f1positive;
+delete from rem2;
+-- Test remote triggers
+create trigger trig_row_before_insert before insert on loc2
+	for each row execute procedure trig_row_before_insupdate();
+-- The new values are concatenated with ' triggered !'
+copy rem2 from stdin;
+select * from rem2;
+ f1 |       f2        
+----+-----------------
+  1 | foo triggered !
+  2 | bar triggered !
+  3 | baz triggered !
+(3 rows)
+
+drop trigger trig_row_before_insert on loc2;
+delete from rem2;
+create trigger trig_null before insert on loc2
+	for each row execute procedure trig_null();
+-- Nothing happens
+copy rem2 from stdin;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+drop trigger trig_null on loc2;
+delete from rem2;
+-- Check with zero-columns foreign table; batch insert will be disabled
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(3 rows)
+
+delete from rem2;
+alter server loopback options (drop batch_size);
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index dd858aba03..b0b548d30e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2059,6 +2059,15 @@ postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 		  resultRelInfo->ri_TrigDesc->trig_insert_after_row)))
 		return 1;
 
+	/*
+	 * If the foreign table has no columns, disable batching as the INSERT
+	 * syntax doesn't allow batching multiple empty rows into a zero-column
+	 * table in a single statement.  This is needed for COPY FROM, in which
+	 * case fmstate must be non-NULL.
+	 */
+	if (fmstate && list_length(fmstate->target_attrs) == 0)
+		return 1;
+
 	/*
 	 * Otherwise use the batch size specified for server/table. The number of
 	 * parameters in a batch is limited to 65535 (uint16), so make sure we
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e48ccd286b..8c886d0b29 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2373,6 +2373,28 @@ copy remp1 from stdin;
 
 select tableoid::regclass, * FROM remp1;
 
+delete from ctrtest;
+
+-- Test copy tuple routing with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+
+copy ctrtest from stdin;
+1	foo
+1	bar
+2	baz
+2	qux
+1	test1
+2	test2
+\.
+
+select tableoid::regclass, * FROM ctrtest;
+select tableoid::regclass, * FROM remp1;
+select tableoid::regclass, * FROM remp2;
+
+delete from ctrtest;
+
+alter server loopback options (drop batch_size);
+
 drop table ctrtest;
 drop table loct1;
 drop table loct2;
@@ -2527,6 +2549,86 @@ select * from rem3;
 drop foreign table rem3;
 drop table loc3;
 
+-- Test COPY FROM with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+
+-- Test basic functionality
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+select * from rem2;
+
+delete from rem2;
+
+-- Test check constraints
+alter table loc2 add constraint loc2_f1positive check (f1 >= 0);
+alter foreign table rem2 add constraint rem2_f1positive check (f1 >= 0);
+
+-- check constraint is enforced on the remote side, not locally
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+copy rem2 from stdin; -- ERROR
+-1	xyzzy
+\.
+select * from rem2;
+
+alter foreign table rem2 drop constraint rem2_f1positive;
+alter table loc2 drop constraint loc2_f1positive;
+
+delete from rem2;
+
+-- Test remote triggers
+create trigger trig_row_before_insert before insert on loc2
+	for each row execute procedure trig_row_before_insupdate();
+
+-- The new values are concatenated with ' triggered !'
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+select * from rem2;
+
+drop trigger trig_row_before_insert on loc2;
+
+delete from rem2;
+
+create trigger trig_null before insert on loc2
+	for each row execute procedure trig_null();
+
+-- Nothing happens
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+select * from rem2;
+
+drop trigger trig_null on loc2;
+
+delete from rem2;
+
+-- Check with zero-columns foreign table; batch insert will be disabled
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+
+\.
+select * from rem2;
+
+delete from rem2;
+
+alter server loopback options (drop batch_size);
+
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index d0b5951019..94263c628f 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -665,7 +665,9 @@ ExecForeignBatchInsert(EState *estate,
 
     <para>
      Note that this function is also called when inserting routed tuples into
-     a foreign-table partition.  See the callback functions
+     a foreign-table partition or executing <command>COPY FROM</command> on
+     a foreign table, in which case it is called in a different way than it
+     is in the <command>INSERT</command> case.  See the callback functions
      described below that allow the FDW to support that.
     </para>
 
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index bfd344cdc0..527f4deaaa 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -398,6 +398,10 @@ OPTIONS (ADD password_required 'false');
        exceeds the limit, the <literal>batch_size</literal> will be adjusted to
        avoid an error.
       </para>
+
+      <para>
+       This option also applies when copying into foreign tables.
+      </para>
      </listitem>
     </varlistentry>
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 175aa837f2..2fe4e0067e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -116,6 +116,12 @@ CopyFromErrorCallback(void *arg)
 {
 	CopyFromState cstate = (CopyFromState) arg;
 
+	if (cstate->relname_only)
+	{
+		errcontext("COPY %s",
+				   cstate->cur_relname);
+		return;
+	}
 	if (cstate->opts.binary)
 	{
 		/* can't usefully display the data */
@@ -222,7 +228,7 @@ CopyMultiInsertBufferInit(ResultRelInfo *rri)
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
 	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
 	buffer->resultRelInfo = rri;
-	buffer->bistate = GetBulkInsertState();
+	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -299,83 +305,164 @@ CopyMultiInsertInfoIsEmpty(CopyMultiInsertInfo *miinfo)
  */
 static inline void
 CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
-						   CopyMultiInsertBuffer *buffer)
+						   CopyMultiInsertBuffer *buffer,
+						   int64 *processed)
 {
-	MemoryContext oldcontext;
-	int			i;
-	uint64		save_cur_lineno;
 	CopyFromState cstate = miinfo->cstate;
 	EState	   *estate = miinfo->estate;
-	CommandId	mycid = miinfo->mycid;
-	int			ti_options = miinfo->ti_options;
-	bool		line_buf_valid = cstate->line_buf_valid;
 	int			nused = buffer->nused;
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
+	int			i;
 
-	/*
-	 * Print error context information correctly, if one of the operations
-	 * below fails.
-	 */
-	cstate->line_buf_valid = false;
-	save_cur_lineno = cstate->cur_lineno;
+	if (resultRelInfo->ri_FdwRoutine)
+	{
+		int			batch_size = resultRelInfo->ri_BatchSize;
+		int			sent = 0;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
+		/* Ensure that the FDW supports batching and it's enabled */
+		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
+		Assert(batch_size > 1);
 
-	for (i = 0; i < nused; i++)
-	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * We suppress error context information other than the relation name,
+		 * if one of the operations below fails.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		Assert(!cstate->relname_only);
+		cstate->relname_only = true;
+
+		while (sent < nused)
 		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
+			int			size = (batch_size < nused - sent) ? batch_size : (nused - sent);
+			int			inserted = size;
+			TupleTableSlot **rslots;
+
+			/* Batch insert into foreign table */
+			rslots =
+				resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																	 resultRelInfo,
+																	 &slots[sent],
+																	 NULL,
+																	 &inserted);
+
+			sent += size;
+
+			/* No need to do anything if there are no rows inserted */
+			if (inserted <= 0)
+				continue;
+
+			/* Run AFTER ROW INSERT triggers */
+			if (resultRelInfo->ri_TrigDesc != NULL &&
+				(resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+				 resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+				for (i = 0; i < inserted; i++)
+				{
+					TupleTableSlot *slot = rslots[i];
+
+					/*
+					 * AFTER ROW Triggers might reference the tableoid column,
+					 * so (re-)initialize tts_tableOid before evaluating them.
+					 */
+					slot->tts_tableOid = relid;
+
+					ExecARInsertTriggers(estate, resultRelInfo,
+										 slot, NIL,
+										 cstate->transition_capture);
+				}
+			}
+
+			/* Update the row counter and progress of the COPY command */
+			*processed += inserted;
+			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+										 *processed);
 		}
 
+		for (i = 0; i < nused; i++)
+			ExecClearTuple(slots[i]);
+
+		/* reset relname_only */
+		cstate->relname_only = false;
+	}
+	else
+	{
+		CommandId	mycid = miinfo->mycid;
+		int			ti_options = miinfo->ti_options;
+		bool		line_buf_valid = cstate->line_buf_valid;
+		uint64		save_cur_lineno = cstate->cur_lineno;
+		MemoryContext oldcontext;
+
 		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
+		 * Print error context information correctly, if one of the operations
+		 * below fails.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		cstate->line_buf_valid = false;
+
+		/*
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
+		 */
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
+			/*
+			 * If there are any indexes, update them for all the inserted
+			 * tuples, and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false,
+										  false, NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL,
+									 cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
 		}
 
-		ExecClearTuple(slots[i]);
+		/* Update the row counter and progress of the COPY command */
+		*processed += nused;
+		pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+									 *processed);
+
+		/* reset cur_lineno and line_buf_valid to what they were */
+		cstate->line_buf_valid = line_buf_valid;
+		cstate->cur_lineno = save_cur_lineno;
 	}
 
 	/* Mark that all slots are free */
 	buffer->nused = 0;
-
-	/* reset cur_lineno and line_buf_valid to what they were */
-	cstate->line_buf_valid = line_buf_valid;
-	cstate->cur_lineno = save_cur_lineno;
 }
 
 /*
@@ -387,22 +474,25 @@ static inline void
 CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 							 CopyMultiInsertBuffer *buffer)
 {
+	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	int			i;
 
 	/* Ensure buffer was flushed */
 	Assert(buffer->nused == 0);
 
 	/* Remove back-link to ourself */
-	buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
-	FreeBulkInsertState(buffer->bistate);
+	if (resultRelInfo->ri_FdwRoutine == NULL)
+		FreeBulkInsertState(buffer->bistate);
 
 	/* Since we only create slots on demand, just drop the non-null ones. */
 	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
 		ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	table_finish_bulk_insert(buffer->resultRelInfo->ri_RelationDesc,
-							 miinfo->ti_options);
+	if (resultRelInfo->ri_FdwRoutine == NULL)
+		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
+								 miinfo->ti_options);
 
 	pfree(buffer);
 }
@@ -418,7 +508,8 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
  * 'curr_rri'.
  */
 static inline void
-CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
+CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri,
+						 int64 *processed)
 {
 	ListCell   *lc;
 
@@ -426,7 +517,7 @@ CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
 	{
 		CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) lfirst(lc);
 
-		CopyMultiInsertBufferFlush(miinfo, buffer);
+		CopyMultiInsertBufferFlush(miinfo, buffer, processed);
 	}
 
 	miinfo->bufferedTuples = 0;
@@ -679,6 +770,23 @@ CopyFrom(CopyFromState cstate)
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	/*
+	 * Also, if the named relation is a foreign table, determine if the FDW
+	 * supports batch insert and determine the batch size (a FDW may support
+	 * batching, but it may be disabled for the server/table).
+	 *
+	 * If the FDW does not support batching, we set the batch size to 1.
+	 */
+	if (resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
+
+	Assert(resultRelInfo->ri_BatchSize >= 1);
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -708,10 +816,11 @@ CopyFrom(CopyFromState cstate)
 
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
+	 * insertion, and insert them in one
+	 * table_multi_insert()/ExecForeignBatchInsert() call, than call
+	 * table_tuple_insert()/ExecForeignInsert() separately for every tuple.
+	 * However, there are a number of reasons why we might not be able to do
+	 * this.  These are explained below.
 	 */
 	if (resultRelInfo->ri_TrigDesc != NULL &&
 		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
@@ -725,6 +834,15 @@ CopyFrom(CopyFromState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL &&
+			 resultRelInfo->ri_BatchSize == 1)
+	{
+		/*
+		 * Can't support multi-inserts to a foreign table if the FDW does not
+		 * support batching, or it's disabled for the server or foreign table.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
 			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
 	{
@@ -737,14 +855,12 @@ CopyFrom(CopyFromState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs)
 	{
 		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
+		 * Can't support multi-inserts if there are any volatile default
+		 * expressions in the table.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
 		 *
 		 * Note: It does not matter if any partitions have any volatile
 		 * default expressions as we use the defaults from the target of the
@@ -767,13 +883,14 @@ CopyFrom(CopyFromState cstate)
 		 * For partitioned tables, we may still be able to perform bulk
 		 * inserts.  However, the possibility of this depends on which types
 		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
+		 * if the partition is a foreign table that can't use batching or it
+		 * has any before row insert or insert instead triggers (same as we
+		 * checked above for the parent table).  Since the partition's
+		 * resultRelInfos are initialized only when we actually need to insert
+		 * the first tuple into them, we must have the intermediate insert
+		 * method of CIM_MULTI_CONDITIONAL to flag that we must later
+		 * determine if we can use bulk-inserts for the partition being
+		 * inserted into.
 		 */
 		if (proute)
 			insertMethod = CIM_MULTI_CONDITIONAL;
@@ -910,12 +1027,14 @@ CopyFrom(CopyFromState cstate)
 
 				/*
 				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
+				 * OF triggers, or if the partition is a foreign table that
+				 * can't use batching.
 				 */
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
 					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					(resultRelInfo->ri_FdwRoutine == NULL ||
+					 resultRelInfo->ri_BatchSize > 1);
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -931,7 +1050,9 @@ CopyFrom(CopyFromState cstate)
 					 * Flush pending inserts if this partition can't use
 					 * batching, so rows are visible to triggers etc.
 					 */
-					CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMultiInsertInfoFlush(&multiInsertInfo,
+											 resultRelInfo,
+											 &processed);
 				}
 
 				if (bistate != NULL)
@@ -1067,7 +1188,17 @@ CopyFrom(CopyFromState cstate)
 					 * buffers out to their tables.
 					 */
 					if (CopyMultiInsertInfoIsFull(&multiInsertInfo))
-						CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+						CopyMultiInsertInfoFlush(&multiInsertInfo,
+												 resultRelInfo,
+												 &processed);
+
+					/*
+					 * We delay updating the row counter and progress of the
+					 * COPY command until after writing the tuples stored in
+					 * the buffer out to the table.  See
+					 * CopyMultiInsertBufferFlush().
+					 */
+					continue;	/* next tuple please */
 				}
 				else
 				{
@@ -1130,7 +1261,7 @@ CopyFrom(CopyFromState cstate)
 	if (insertMethod != CIM_SINGLE)
 	{
 		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
+			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL, &processed);
 	}
 
 	/* Done, clean up */
@@ -1348,6 +1479,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_lineno = 0;
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
+	cstate->relname_only = false;
 
 	/*
 	 * Allocate buffers for the input pipeline.
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37c6032ae..21e8b89baa 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -81,6 +81,7 @@ typedef struct CopyFromStateData
 	uint64		cur_lineno;		/* line number for error messages */
 	const char *cur_attname;	/* current att for error messages */
 	const char *cur_attval;		/* current att value for error messages */
+	bool		relname_only;	/* don't output line number, att, etc. */
 
 	/*
 	 * Working state
#125 Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Etsuro Fujita (#124)
1 attachment(s)
Re: Fast COPY FROM based on batch insert

On Tue, Sep 27, 2022 at 6:03 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

I will review the patch a bit more, but I feel that it is
in good shape.

One thing I noticed is this bit added to CopyMultiInsertBufferFlush()
to run triggers on the foreign table.

+           /* Run AFTER ROW INSERT triggers */
+           if (resultRelInfo->ri_TrigDesc != NULL &&
+               (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+                resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+           {
+               Oid         relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+               for (i = 0; i < inserted; i++)
+               {
+                   TupleTableSlot *slot = rslots[i];
+
+                   /*
+                    * AFTER ROW Triggers might reference the tableoid column,
+                    * so (re-)initialize tts_tableOid before evaluating them.
+                    */
+                   slot->tts_tableOid = relid;
+
+                   ExecARInsertTriggers(estate, resultRelInfo,
+                                        slot, NIL,
+                                        cstate->transition_capture);
+               }
+           }

Since foreign tables cannot have transition tables, we have
trig_insert_new_table=false. So I simplified the if test and added an
assertion ensuring trig_insert_new_table=false. Attached is a new
version of the patch. I tweaked some comments a bit as well. I think
the patch is committable. So I plan on committing it next week if
there are no objections.

Best regards,
Etsuro Fujita

Attachments:

v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-4.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2e4e82a94f..ffff8d066a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8605,6 +8605,39 @@ select tableoid::regclass, * FROM remp1;
  remp1    | 1 | bar
 (2 rows)
 
+delete from ctrtest;
+-- Test copy tuple routing with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+copy ctrtest from stdin;
+select tableoid::regclass, * FROM ctrtest;
+ tableoid | a |   b   
+----------+---+-------
+ remp1    | 1 | foo
+ remp1    | 1 | bar
+ remp1    | 1 | test1
+ remp2    | 2 | baz
+ remp2    | 2 | qux
+ remp2    | 2 | test2
+(6 rows)
+
+select tableoid::regclass, * FROM remp1;
+ tableoid | a |   b   
+----------+---+-------
+ remp1    | 1 | foo
+ remp1    | 1 | bar
+ remp1    | 1 | test1
+(3 rows)
+
+select tableoid::regclass, * FROM remp2;
+ tableoid |   b   | a 
+----------+-------+---
+ remp2    | baz   | 2
+ remp2    | qux   | 2
+ remp2    | test2 | 2
+(3 rows)
+
+delete from ctrtest;
+alter server loopback options (drop batch_size);
 drop table ctrtest;
 drop table loct1;
 drop table loct2;
@@ -8768,6 +8801,78 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+-- Test COPY FROM with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+-- Test basic functionality
+copy rem2 from stdin;
+select * from rem2;
+ f1 | f2  
+----+-----
+  1 | foo
+  2 | bar
+  3 | baz
+(3 rows)
+
+delete from rem2;
+-- Test check constraints
+alter table loc2 add constraint loc2_f1positive check (f1 >= 0);
+alter foreign table rem2 add constraint rem2_f1positive check (f1 >= 0);
+-- check constraint is enforced on the remote side, not locally
+copy rem2 from stdin;
+copy rem2 from stdin; -- ERROR
+ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
+DETAIL:  Failing row contains (-1, xyzzy).
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
+COPY rem2
+select * from rem2;
+ f1 | f2  
+----+-----
+  1 | foo
+  2 | bar
+  3 | baz
+(3 rows)
+
+alter foreign table rem2 drop constraint rem2_f1positive;
+alter table loc2 drop constraint loc2_f1positive;
+delete from rem2;
+-- Test remote triggers
+create trigger trig_row_before_insert before insert on loc2
+	for each row execute procedure trig_row_before_insupdate();
+-- The new values are concatenated with ' triggered !'
+copy rem2 from stdin;
+select * from rem2;
+ f1 |       f2        
+----+-----------------
+  1 | foo triggered !
+  2 | bar triggered !
+  3 | baz triggered !
+(3 rows)
+
+drop trigger trig_row_before_insert on loc2;
+delete from rem2;
+create trigger trig_null before insert on loc2
+	for each row execute procedure trig_null();
+-- Nothing happens
+copy rem2 from stdin;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+drop trigger trig_null on loc2;
+delete from rem2;
+-- Check with zero-column foreign table; batch insert will be disabled
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(3 rows)
+
+delete from rem2;
+alter server loopback options (drop batch_size);
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index dd858aba03..b0b548d30e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2059,6 +2059,15 @@ postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 		  resultRelInfo->ri_TrigDesc->trig_insert_after_row)))
 		return 1;
 
+	/*
+	 * If the foreign table has no columns, disable batching as the INSERT
+	 * syntax doesn't allow batching multiple empty rows into a zero-column
+	 * table in a single statement.  This is needed for COPY FROM, in which
+	 * case fmstate must be non-NULL.
+	 */
+	if (fmstate && list_length(fmstate->target_attrs) == 0)
+		return 1;
+
 	/*
 	 * Otherwise use the batch size specified for server/table. The number of
 	 * parameters in a batch is limited to 65535 (uint16), so make sure we
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e48ccd286b..1962051e54 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2373,6 +2373,28 @@ copy remp1 from stdin;
 
 select tableoid::regclass, * FROM remp1;
 
+delete from ctrtest;
+
+-- Test copy tuple routing with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+
+copy ctrtest from stdin;
+1	foo
+1	bar
+2	baz
+2	qux
+1	test1
+2	test2
+\.
+
+select tableoid::regclass, * FROM ctrtest;
+select tableoid::regclass, * FROM remp1;
+select tableoid::regclass, * FROM remp2;
+
+delete from ctrtest;
+
+alter server loopback options (drop batch_size);
+
 drop table ctrtest;
 drop table loct1;
 drop table loct2;
@@ -2527,6 +2549,86 @@ select * from rem3;
 drop foreign table rem3;
 drop table loc3;
 
+-- Test COPY FROM with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+
+-- Test basic functionality
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+select * from rem2;
+
+delete from rem2;
+
+-- Test check constraints
+alter table loc2 add constraint loc2_f1positive check (f1 >= 0);
+alter foreign table rem2 add constraint rem2_f1positive check (f1 >= 0);
+
+-- check constraint is enforced on the remote side, not locally
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+copy rem2 from stdin; -- ERROR
+-1	xyzzy
+\.
+select * from rem2;
+
+alter foreign table rem2 drop constraint rem2_f1positive;
+alter table loc2 drop constraint loc2_f1positive;
+
+delete from rem2;
+
+-- Test remote triggers
+create trigger trig_row_before_insert before insert on loc2
+	for each row execute procedure trig_row_before_insupdate();
+
+-- The new values are concatenated with ' triggered !'
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+select * from rem2;
+
+drop trigger trig_row_before_insert on loc2;
+
+delete from rem2;
+
+create trigger trig_null before insert on loc2
+	for each row execute procedure trig_null();
+
+-- Nothing happens
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+select * from rem2;
+
+drop trigger trig_null on loc2;
+
+delete from rem2;
+
+-- Check with zero-column foreign table; batch insert will be disabled
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+
+\.
+select * from rem2;
+
+delete from rem2;
+
+alter server loopback options (drop batch_size);
+
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index d0b5951019..94263c628f 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -665,7 +665,9 @@ ExecForeignBatchInsert(EState *estate,
 
     <para>
      Note that this function is also called when inserting routed tuples into
-     a foreign-table partition.  See the callback functions
+     a foreign-table partition or executing <command>COPY FROM</command> on
+     a foreign table, in which case it is called in a different way than it
+     is in the <command>INSERT</command> case.  See the callback functions
      described below that allow the FDW to support that.
     </para>
 
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index bfd344cdc0..527f4deaaa 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -398,6 +398,10 @@ OPTIONS (ADD password_required 'false');
        exceeds the limit, the <literal>batch_size</literal> will be adjusted to
        avoid an error.
       </para>
+
+      <para>
+       This option also applies when copying into foreign tables.
+      </para>
      </listitem>
     </varlistentry>
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 175aa837f2..12de3255da 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -116,6 +116,12 @@ CopyFromErrorCallback(void *arg)
 {
 	CopyFromState cstate = (CopyFromState) arg;
 
+	if (cstate->relname_only)
+	{
+		errcontext("COPY %s",
+				   cstate->cur_relname);
+		return;
+	}
 	if (cstate->opts.binary)
 	{
 		/* can't usefully display the data */
@@ -222,7 +228,7 @@ CopyMultiInsertBufferInit(ResultRelInfo *rri)
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
 	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
 	buffer->resultRelInfo = rri;
-	buffer->bistate = GetBulkInsertState();
+	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -299,83 +305,167 @@ CopyMultiInsertInfoIsEmpty(CopyMultiInsertInfo *miinfo)
  */
 static inline void
 CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
-						   CopyMultiInsertBuffer *buffer)
+						   CopyMultiInsertBuffer *buffer,
+						   int64 *processed)
 {
-	MemoryContext oldcontext;
-	int			i;
-	uint64		save_cur_lineno;
 	CopyFromState cstate = miinfo->cstate;
 	EState	   *estate = miinfo->estate;
-	CommandId	mycid = miinfo->mycid;
-	int			ti_options = miinfo->ti_options;
-	bool		line_buf_valid = cstate->line_buf_valid;
 	int			nused = buffer->nused;
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
+	int			i;
 
-	/*
-	 * Print error context information correctly, if one of the operations
-	 * below fails.
-	 */
-	cstate->line_buf_valid = false;
-	save_cur_lineno = cstate->cur_lineno;
+	if (resultRelInfo->ri_FdwRoutine)
+	{
+		int			batch_size = resultRelInfo->ri_BatchSize;
+		int			sent = 0;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
+		/* Ensure that the FDW supports batching and it's enabled */
+		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
+		Assert(batch_size > 1);
 
-	for (i = 0; i < nused; i++)
-	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * We suppress error context information other than the relation name,
+		 * if one of the operations below fails.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		Assert(!cstate->relname_only);
+		cstate->relname_only = true;
+
+		while (sent < nused)
 		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
+			int			size = (batch_size < nused - sent) ? batch_size : (nused - sent);
+			int			inserted = size;
+			TupleTableSlot **rslots;
+
+			/* insert into foreign table: let the FDW do it */
+			rslots =
+				resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																	 resultRelInfo,
+																	 &slots[sent],
+																	 NULL,
+																	 &inserted);
+
+			sent += size;
+
+			/* No need to do anything if there are no rows inserted */
+			if (inserted <= 0)
+				continue;
+
+			/* Triggers on foreign tables should not have transition tables */
+			Assert(resultRelInfo->ri_TrigDesc == NULL ||
+				   resultRelInfo->ri_TrigDesc->trig_insert_new_table == false);
+
+			/* Run AFTER ROW INSERT triggers */
+			if (resultRelInfo->ri_TrigDesc != NULL &&
+				resultRelInfo->ri_TrigDesc->trig_insert_after_row)
+			{
+				Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+				for (i = 0; i < inserted; i++)
+				{
+					TupleTableSlot *slot = rslots[i];
+
+					/*
+					 * AFTER ROW Triggers might reference the tableoid column,
+					 * so (re-)initialize tts_tableOid before evaluating them.
+					 */
+					slot->tts_tableOid = relid;
+
+					ExecARInsertTriggers(estate, resultRelInfo,
+										 slot, NIL,
+										 cstate->transition_capture);
+				}
+			}
+
+			/* Update the row counter and progress of the COPY command */
+			*processed += inserted;
+			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+										 *processed);
 		}
 
+		for (i = 0; i < nused; i++)
+			ExecClearTuple(slots[i]);
+
+		/* reset relname_only */
+		cstate->relname_only = false;
+	}
+	else
+	{
+		CommandId	mycid = miinfo->mycid;
+		int			ti_options = miinfo->ti_options;
+		bool		line_buf_valid = cstate->line_buf_valid;
+		uint64		save_cur_lineno = cstate->cur_lineno;
+		MemoryContext oldcontext;
+
+		/*
+		 * Print error context information correctly, if one of the operations
+		 * below fails.
+		 */
+		cstate->line_buf_valid = false;
+
 		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
+			/*
+			 * If there are any indexes, update them for all the inserted
+			 * tuples, and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false,
+										  false, NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL,
+									 cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
 		}
 
-		ExecClearTuple(slots[i]);
+		/* Update the row counter and progress of the COPY command */
+		*processed += nused;
+		pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+									 *processed);
+
+		/* reset cur_lineno and line_buf_valid to what they were */
+		cstate->line_buf_valid = line_buf_valid;
+		cstate->cur_lineno = save_cur_lineno;
 	}
 
 	/* Mark that all slots are free */
 	buffer->nused = 0;
-
-	/* reset cur_lineno and line_buf_valid to what they were */
-	cstate->line_buf_valid = line_buf_valid;
-	cstate->cur_lineno = save_cur_lineno;
 }
 
 /*
@@ -387,22 +477,25 @@ static inline void
 CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 							 CopyMultiInsertBuffer *buffer)
 {
+	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	int			i;
 
 	/* Ensure buffer was flushed */
 	Assert(buffer->nused == 0);
 
 	/* Remove back-link to ourself */
-	buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
-	FreeBulkInsertState(buffer->bistate);
+	if (resultRelInfo->ri_FdwRoutine == NULL)
+		FreeBulkInsertState(buffer->bistate);
 
 	/* Since we only create slots on demand, just drop the non-null ones. */
 	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
 		ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	table_finish_bulk_insert(buffer->resultRelInfo->ri_RelationDesc,
-							 miinfo->ti_options);
+	if (resultRelInfo->ri_FdwRoutine == NULL)
+		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
+								 miinfo->ti_options);
 
 	pfree(buffer);
 }
@@ -418,7 +511,8 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
  * 'curr_rri'.
  */
 static inline void
-CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
+CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri,
+						 int64 *processed)
 {
 	ListCell   *lc;
 
@@ -426,7 +520,7 @@ CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
 	{
 		CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) lfirst(lc);
 
-		CopyMultiInsertBufferFlush(miinfo, buffer);
+		CopyMultiInsertBufferFlush(miinfo, buffer, processed);
 	}
 
 	miinfo->bufferedTuples = 0;
@@ -679,6 +773,23 @@ CopyFrom(CopyFromState cstate)
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	/*
+	 * Also, if the named relation is a foreign table, determine if the FDW
+	 * supports batch insert and determine the batch size (a FDW may support
+	 * batching, but it may be disabled for the server/table).
+	 *
+	 * If the FDW does not support batching, we set the batch size to 1.
+	 */
+	if (resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
+
+	Assert(resultRelInfo->ri_BatchSize >= 1);
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -708,10 +819,11 @@ CopyFrom(CopyFromState cstate)
 
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
+	 * insertion, and insert them in one
+	 * table_multi_insert()/ExecForeignBatchInsert() call, than call
+	 * table_tuple_insert()/ExecForeignInsert() separately for every tuple.
+	 * However, there are a number of reasons why we might not be able to do
+	 * this.  These are explained below.
 	 */
 	if (resultRelInfo->ri_TrigDesc != NULL &&
 		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
@@ -725,6 +837,15 @@ CopyFrom(CopyFromState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL &&
+			 resultRelInfo->ri_BatchSize == 1)
+	{
+		/*
+		 * Can't support multi-inserts to a foreign table if the FDW does not
+		 * support batching, or it's disabled for the server or foreign table.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
 			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
 	{
@@ -737,14 +858,12 @@ CopyFrom(CopyFromState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs)
 	{
 		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
+		 * Can't support multi-inserts if there are any volatile default
+		 * expressions in the table.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
 		 *
 		 * Note: It does not matter if any partitions have any volatile
 		 * default expressions as we use the defaults from the target of the
@@ -767,13 +886,14 @@ CopyFrom(CopyFromState cstate)
 		 * For partitioned tables, we may still be able to perform bulk
 		 * inserts.  However, the possibility of this depends on which types
 		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
+		 * if the partition is a foreign table that can't use batching or it
+		 * has any before row insert or insert instead triggers (same as we
+		 * checked above for the parent table).  Since the partition's
+		 * resultRelInfos are initialized only when we actually need to insert
+		 * the first tuple into them, we must have the intermediate insert
+		 * method of CIM_MULTI_CONDITIONAL to flag that we must later
+		 * determine if we can use bulk-inserts for the partition being
+		 * inserted into.
 		 */
 		if (proute)
 			insertMethod = CIM_MULTI_CONDITIONAL;
@@ -910,12 +1030,14 @@ CopyFrom(CopyFromState cstate)
 
 				/*
 				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
+				 * OF triggers, or if the partition is a foreign table that
+				 * can't use batching.
 				 */
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
 					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					(resultRelInfo->ri_FdwRoutine == NULL ||
+					 resultRelInfo->ri_BatchSize > 1);
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -931,7 +1053,9 @@ CopyFrom(CopyFromState cstate)
 					 * Flush pending inserts if this partition can't use
 					 * batching, so rows are visible to triggers etc.
 					 */
-					CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMultiInsertInfoFlush(&multiInsertInfo,
+											 resultRelInfo,
+											 &processed);
 				}
 
 				if (bistate != NULL)
@@ -1067,7 +1191,17 @@ CopyFrom(CopyFromState cstate)
 					 * buffers out to their tables.
 					 */
 					if (CopyMultiInsertInfoIsFull(&multiInsertInfo))
-						CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+						CopyMultiInsertInfoFlush(&multiInsertInfo,
+												 resultRelInfo,
+												 &processed);
+
+					/*
+					 * We delay updating the row counter and progress of the
+					 * COPY command until after writing the tuples stored in
+					 * the buffer out to the table, as in single insert mode.
+					 * See CopyMultiInsertBufferFlush().
+					 */
+					continue;	/* next tuple please */
 				}
 				else
 				{
@@ -1130,7 +1264,7 @@ CopyFrom(CopyFromState cstate)
 	if (insertMethod != CIM_SINGLE)
 	{
 		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
+			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL, &processed);
 	}
 
 	/* Done, clean up */
@@ -1348,6 +1482,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_lineno = 0;
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
+	cstate->relname_only = false;
 
 	/*
 	 * Allocate buffers for the input pipeline.
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37c6032ae..21e8b89baa 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -81,6 +81,7 @@ typedef struct CopyFromStateData
 	uint64		cur_lineno;		/* line number for error messages */
 	const char *cur_attname;	/* current att for error messages */
 	const char *cur_attval;		/* current att value for error messages */
+	bool		relname_only;	/* don't output line number, att, etc. */
 
 	/*
 	 * Working state
#126 Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#125)
Re: Fast COPY FROM based on batch insert

On 10/7/22 11:18, Etsuro Fujita wrote:

On Tue, Sep 27, 2022 at 6:03 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

I will review the patch a bit more, but I feel that it is
in good shape.

One thing I noticed is this bit added to CopyMultiInsertBufferFlush()
to run triggers on the foreign table.

+           /* Run AFTER ROW INSERT triggers */
+           if (resultRelInfo->ri_TrigDesc != NULL &&
+               (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+                resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+           {
+               Oid         relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+               for (i = 0; i < inserted; i++)
+               {
+                   TupleTableSlot *slot = rslots[i];
+
+                   /*
+                    * AFTER ROW Triggers might reference the tableoid column,
+                    * so (re-)initialize tts_tableOid before evaluating them.
+                    */
+                   slot->tts_tableOid = relid;
+
+                   ExecARInsertTriggers(estate, resultRelInfo,
+                                        slot, NIL,
+                                        cstate->transition_capture);
+               }
+           }

Since foreign tables cannot have transition tables, we have
trig_insert_new_table=false. So I simplified the if test and added an
assertion ensuring trig_insert_new_table=false. Attached is a new
version of the patch. I tweaked some comments a bit as well. I think
the patch is committable. So I plan on committing it next week if
there are no objections.

I reviewed the patch one more time. Only one question: bistate and
ri_FdwRoutine are tightly coupled. Maybe we should add an assertion on
(ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.

--
Regards
Andrey Lepikhov
Postgres Professional

#127 Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#126)
1 attachment(s)
Re: Fast COPY FROM based on batch insert

On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

I reviewed the patch one more time. Only one question: bistate and
ri_FdwRoutine are tightly coupled. Maybe we should add an assertion on
(ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.

You mean the bistate member of CopyMultiInsertBuffer?

We do not use that member at all for foreign tables, so the patch
avoids initializing that member in CopyMultiInsertBufferInit() when
called for a foreign table. So we have bistate = NULL for foreign
tables (and bistate != NULL for plain tables), as you mentioned above.
I think it is a good idea to add such assertions. How about adding
them to CopyMultiInsertBufferFlush() and
CopyMultiInsertBufferCleanup(), as in the attached patch? In the attached
patch, I also updated the comments a bit further.

Thanks for reviewing!

Best regards,
Etsuro Fujita

Attachments:

v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-5.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index cc9e39c4a5..9746998751 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8608,6 +8608,39 @@ select tableoid::regclass, * FROM remp1;
  remp1    | 1 | bar
 (2 rows)
 
+delete from ctrtest;
+-- Test copy tuple routing with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+copy ctrtest from stdin;
+select tableoid::regclass, * FROM ctrtest;
+ tableoid | a |   b   
+----------+---+-------
+ remp1    | 1 | foo
+ remp1    | 1 | bar
+ remp1    | 1 | test1
+ remp2    | 2 | baz
+ remp2    | 2 | qux
+ remp2    | 2 | test2
+(6 rows)
+
+select tableoid::regclass, * FROM remp1;
+ tableoid | a |   b   
+----------+---+-------
+ remp1    | 1 | foo
+ remp1    | 1 | bar
+ remp1    | 1 | test1
+(3 rows)
+
+select tableoid::regclass, * FROM remp2;
+ tableoid |   b   | a 
+----------+-------+---
+ remp2    | baz   | 2
+ remp2    | qux   | 2
+ remp2    | test2 | 2
+(3 rows)
+
+delete from ctrtest;
+alter server loopback options (drop batch_size);
 drop table ctrtest;
 drop table loct1;
 drop table loct2;
@@ -8771,6 +8804,78 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+-- Test COPY FROM with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+-- Test basic functionality
+copy rem2 from stdin;
+select * from rem2;
+ f1 | f2  
+----+-----
+  1 | foo
+  2 | bar
+  3 | baz
+(3 rows)
+
+delete from rem2;
+-- Test check constraints
+alter table loc2 add constraint loc2_f1positive check (f1 >= 0);
+alter foreign table rem2 add constraint rem2_f1positive check (f1 >= 0);
+-- check constraint is enforced on the remote side, not locally
+copy rem2 from stdin;
+copy rem2 from stdin; -- ERROR
+ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
+DETAIL:  Failing row contains (-1, xyzzy).
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
+COPY rem2
+select * from rem2;
+ f1 | f2  
+----+-----
+  1 | foo
+  2 | bar
+  3 | baz
+(3 rows)
+
+alter foreign table rem2 drop constraint rem2_f1positive;
+alter table loc2 drop constraint loc2_f1positive;
+delete from rem2;
+-- Test remote triggers
+create trigger trig_row_before_insert before insert on loc2
+	for each row execute procedure trig_row_before_insupdate();
+-- The new values are concatenated with ' triggered !'
+copy rem2 from stdin;
+select * from rem2;
+ f1 |       f2        
+----+-----------------
+  1 | foo triggered !
+  2 | bar triggered !
+  3 | baz triggered !
+(3 rows)
+
+drop trigger trig_row_before_insert on loc2;
+delete from rem2;
+create trigger trig_null before insert on loc2
+	for each row execute procedure trig_null();
+-- Nothing happens
+copy rem2 from stdin;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+drop trigger trig_null on loc2;
+delete from rem2;
+-- Check with zero-column foreign table; batch insert will be disabled
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(3 rows)
+
+delete from rem2;
+alter server loopback options (drop batch_size);
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 8d013f5b1a..d98709e5e8 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2057,6 +2057,15 @@ postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 		  resultRelInfo->ri_TrigDesc->trig_insert_after_row)))
 		return 1;
 
+	/*
+	 * If the foreign table has no columns, disable batching as the INSERT
+	 * syntax doesn't allow batching multiple empty rows into a zero-column
+	 * table in a single statement.  This is needed for COPY FROM, in which
+	 * case fmstate must be non-NULL.
+	 */
+	if (fmstate && list_length(fmstate->target_attrs) == 0)
+		return 1;
+
 	/*
 	 * Otherwise use the batch size specified for server/table. The number of
 	 * parameters in a batch is limited to 65535 (uint16), so make sure we
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e48ccd286b..1962051e54 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2373,6 +2373,28 @@ copy remp1 from stdin;
 
 select tableoid::regclass, * FROM remp1;
 
+delete from ctrtest;
+
+-- Test copy tuple routing with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+
+copy ctrtest from stdin;
+1	foo
+1	bar
+2	baz
+2	qux
+1	test1
+2	test2
+\.
+
+select tableoid::regclass, * FROM ctrtest;
+select tableoid::regclass, * FROM remp1;
+select tableoid::regclass, * FROM remp2;
+
+delete from ctrtest;
+
+alter server loopback options (drop batch_size);
+
 drop table ctrtest;
 drop table loct1;
 drop table loct2;
@@ -2527,6 +2549,86 @@ select * from rem3;
 drop foreign table rem3;
 drop table loc3;
 
+-- Test COPY FROM with the batch_size option enabled
+alter server loopback options (add batch_size '2');
+
+-- Test basic functionality
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+select * from rem2;
+
+delete from rem2;
+
+-- Test check constraints
+alter table loc2 add constraint loc2_f1positive check (f1 >= 0);
+alter foreign table rem2 add constraint rem2_f1positive check (f1 >= 0);
+
+-- check constraint is enforced on the remote side, not locally
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+copy rem2 from stdin; -- ERROR
+-1	xyzzy
+\.
+select * from rem2;
+
+alter foreign table rem2 drop constraint rem2_f1positive;
+alter table loc2 drop constraint loc2_f1positive;
+
+delete from rem2;
+
+-- Test remote triggers
+create trigger trig_row_before_insert before insert on loc2
+	for each row execute procedure trig_row_before_insupdate();
+
+-- The new values are concatenated with ' triggered !'
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+select * from rem2;
+
+drop trigger trig_row_before_insert on loc2;
+
+delete from rem2;
+
+create trigger trig_null before insert on loc2
+	for each row execute procedure trig_null();
+
+-- Nothing happens
+copy rem2 from stdin;
+1	foo
+2	bar
+3	baz
+\.
+select * from rem2;
+
+drop trigger trig_null on loc2;
+
+delete from rem2;
+
+-- Check with zero-column foreign table; batch insert will be disabled
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+
+
+
+\.
+select * from rem2;
+
+delete from rem2;
+
+alter server loopback options (drop batch_size);
+
 -- ===================================================================
 -- test for TRUNCATE
 -- ===================================================================
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index d0b5951019..94263c628f 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -665,7 +665,9 @@ ExecForeignBatchInsert(EState *estate,
 
     <para>
      Note that this function is also called when inserting routed tuples into
-     a foreign-table partition.  See the callback functions
+     a foreign-table partition or executing <command>COPY FROM</command> on
+     a foreign table, in which case it is called in a different way than it
+     is in the <command>INSERT</command> case.  See the callback functions
      described below that allow the FDW to support that.
     </para>
 
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index bfd344cdc0..527f4deaaa 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -398,6 +398,10 @@ OPTIONS (ADD password_required 'false');
        exceeds the limit, the <literal>batch_size</literal> will be adjusted to
        avoid an error.
       </para>
+
+      <para>
+       This option also applies when copying into foreign tables.
+      </para>
      </listitem>
     </varlistentry>
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 175aa837f2..f5af46166a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -78,7 +78,8 @@ typedef struct CopyMultiInsertBuffer
 {
 	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel */
+	BulkInsertState bistate;	/* BulkInsertState for this rel if plain table
+								 * or NULL if foreign table */
 	int			nused;			/* number of 'slots' containing tuples */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
@@ -116,6 +117,12 @@ CopyFromErrorCallback(void *arg)
 {
 	CopyFromState cstate = (CopyFromState) arg;
 
+	if (cstate->relname_only)
+	{
+		errcontext("COPY %s",
+				   cstate->cur_relname);
+		return;
+	}
 	if (cstate->opts.binary)
 	{
 		/* can't usefully display the data */
@@ -222,7 +229,7 @@ CopyMultiInsertBufferInit(ResultRelInfo *rri)
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
 	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
 	buffer->resultRelInfo = rri;
-	buffer->bistate = GetBulkInsertState();
+	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -299,83 +306,171 @@ CopyMultiInsertInfoIsEmpty(CopyMultiInsertInfo *miinfo)
  */
 static inline void
 CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
-						   CopyMultiInsertBuffer *buffer)
+						   CopyMultiInsertBuffer *buffer,
+						   int64 *processed)
 {
-	MemoryContext oldcontext;
-	int			i;
-	uint64		save_cur_lineno;
 	CopyFromState cstate = miinfo->cstate;
 	EState	   *estate = miinfo->estate;
-	CommandId	mycid = miinfo->mycid;
-	int			ti_options = miinfo->ti_options;
-	bool		line_buf_valid = cstate->line_buf_valid;
 	int			nused = buffer->nused;
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
+	int			i;
 
-	/*
-	 * Print error context information correctly, if one of the operations
-	 * below fails.
-	 */
-	cstate->line_buf_valid = false;
-	save_cur_lineno = cstate->cur_lineno;
+	if (resultRelInfo->ri_FdwRoutine)
+	{
+		int			batch_size = resultRelInfo->ri_BatchSize;
+		int			sent = 0;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
+		Assert(buffer->bistate == NULL);
+
+		/* Ensure that the FDW supports batching and it's enabled */
+		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
+		Assert(batch_size > 1);
 
-	for (i = 0; i < nused; i++)
-	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
+		 * We suppress error context information other than the relation name,
+		 * if one of the operations below fails.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		Assert(!cstate->relname_only);
+		cstate->relname_only = true;
+
+		while (sent < nused)
 		{
-			List	   *recheckIndexes;
-
-			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
-			list_free(recheckIndexes);
+			int			size = (batch_size < nused - sent) ? batch_size : (nused - sent);
+			int			inserted = size;
+			TupleTableSlot **rslots;
+
+			/* insert into foreign table: let the FDW do it */
+			rslots =
+				resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																	 resultRelInfo,
+																	 &slots[sent],
+																	 NULL,
+																	 &inserted);
+
+			sent += size;
+
+			/* No need to do anything if there are no inserted rows */
+			if (inserted <= 0)
+				continue;
+
+			/* Triggers on foreign tables should not have transition tables */
+			Assert(resultRelInfo->ri_TrigDesc == NULL ||
+				   resultRelInfo->ri_TrigDesc->trig_insert_new_table == false);
+
+			/* Run AFTER ROW INSERT triggers */
+			if (resultRelInfo->ri_TrigDesc != NULL &&
+				resultRelInfo->ri_TrigDesc->trig_insert_after_row)
+			{
+				Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+				for (i = 0; i < inserted; i++)
+				{
+					TupleTableSlot *slot = rslots[i];
+
+					/*
+					 * AFTER ROW Triggers might reference the tableoid column,
+					 * so (re-)initialize tts_tableOid before evaluating them.
+					 */
+					slot->tts_tableOid = relid;
+
+					ExecARInsertTriggers(estate, resultRelInfo,
+										 slot, NIL,
+										 cstate->transition_capture);
+				}
+			}
+
+			/* Update the row counter and progress of the COPY command */
+			*processed += inserted;
+			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+										 *processed);
 		}
 
+		for (i = 0; i < nused; i++)
+			ExecClearTuple(slots[i]);
+
+		/* reset relname_only */
+		cstate->relname_only = false;
+	}
+	else
+	{
+		CommandId	mycid = miinfo->mycid;
+		int			ti_options = miinfo->ti_options;
+		bool		line_buf_valid = cstate->line_buf_valid;
+		uint64		save_cur_lineno = cstate->cur_lineno;
+		MemoryContext oldcontext;
+
+		Assert(buffer->bistate != NULL);
+
 		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
+		 * Print error context information correctly, if one of the operations
+		 * below fails.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		cstate->line_buf_valid = false;
+
+		/*
+		 * table_multi_insert may leak memory, so switch to short-lived memory
+		 * context before calling it.
+		 */
+		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		table_multi_insert(resultRelInfo->ri_RelationDesc,
+						   slots,
+						   nused,
+						   mycid,
+						   ti_options,
+						   buffer->bistate);
+		MemoryContextSwitchTo(oldcontext);
+
+		for (i = 0; i < nused; i++)
 		{
-			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
+			/*
+			 * If there are any indexes, update them for all the inserted
+			 * tuples, and run AFTER ROW INSERT triggers.
+			 */
+			if (resultRelInfo->ri_NumIndices > 0)
+			{
+				List	   *recheckIndexes;
+
+				cstate->cur_lineno = buffer->linenos[i];
+				recheckIndexes =
+					ExecInsertIndexTuples(resultRelInfo,
+										  buffer->slots[i], estate, false,
+										  false, NULL, NIL);
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], recheckIndexes,
+									 cstate->transition_capture);
+				list_free(recheckIndexes);
+			}
+
+			/*
+			 * There's no indexes, but see if we need to run AFTER ROW INSERT
+			 * triggers anyway.
+			 */
+			else if (resultRelInfo->ri_TrigDesc != NULL &&
+					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+			{
+				cstate->cur_lineno = buffer->linenos[i];
+				ExecARInsertTriggers(estate, resultRelInfo,
+									 slots[i], NIL,
+									 cstate->transition_capture);
+			}
+
+			ExecClearTuple(slots[i]);
 		}
 
-		ExecClearTuple(slots[i]);
+		/* Update the row counter and progress of the COPY command */
+		*processed += nused;
+		pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+									 *processed);
+
+		/* reset cur_lineno and line_buf_valid to what they were */
+		cstate->line_buf_valid = line_buf_valid;
+		cstate->cur_lineno = save_cur_lineno;
 	}
 
 	/* Mark that all slots are free */
 	buffer->nused = 0;
-
-	/* reset cur_lineno and line_buf_valid to what they were */
-	cstate->line_buf_valid = line_buf_valid;
-	cstate->cur_lineno = save_cur_lineno;
 }
 
 /*
@@ -387,22 +482,30 @@ static inline void
 CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 							 CopyMultiInsertBuffer *buffer)
 {
+	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	int			i;
 
 	/* Ensure buffer was flushed */
 	Assert(buffer->nused == 0);
 
 	/* Remove back-link to ourself */
-	buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
+	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
-	FreeBulkInsertState(buffer->bistate);
+	if (resultRelInfo->ri_FdwRoutine == NULL)
+	{
+		Assert(buffer->bistate != NULL);
+		FreeBulkInsertState(buffer->bistate);
+	}
+	else
+		Assert(buffer->bistate == NULL);
 
 	/* Since we only create slots on demand, just drop the non-null ones. */
 	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
 		ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	table_finish_bulk_insert(buffer->resultRelInfo->ri_RelationDesc,
-							 miinfo->ti_options);
+	if (resultRelInfo->ri_FdwRoutine == NULL)
+		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
+								 miinfo->ti_options);
 
 	pfree(buffer);
 }
@@ -418,7 +521,8 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
  * 'curr_rri'.
  */
 static inline void
-CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
+CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri,
+						 int64 *processed)
 {
 	ListCell   *lc;
 
@@ -426,7 +530,7 @@ CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
 	{
 		CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) lfirst(lc);
 
-		CopyMultiInsertBufferFlush(miinfo, buffer);
+		CopyMultiInsertBufferFlush(miinfo, buffer, processed);
 	}
 
 	miinfo->bufferedTuples = 0;
@@ -679,6 +783,23 @@ CopyFrom(CopyFromState cstate)
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
 														 resultRelInfo);
 
+	/*
+	 * Also, if the named relation is a foreign table, determine if the FDW
+	 * supports batch insert and determine the batch size (a FDW may support
+	 * batching, but it may be disabled for the server/table).
+	 *
+	 * If the FDW does not support batching, we set the batch size to 1.
+	 */
+	if (resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
+
+	Assert(resultRelInfo->ri_BatchSize >= 1);
+
 	/* Prepare to catch AFTER triggers. */
 	AfterTriggerBeginQuery();
 
@@ -708,10 +829,11 @@ CopyFrom(CopyFromState cstate)
 
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
-	 * insertion, and insert them in one table_multi_insert() call, than call
-	 * table_tuple_insert() separately for every tuple. However, there are a
-	 * number of reasons why we might not be able to do this.  These are
-	 * explained below.
+	 * insertion, and insert them in one
+	 * table_multi_insert()/ExecForeignBatchInsert() call, than call
+	 * table_tuple_insert()/ExecForeignInsert() separately for every tuple.
+	 * However, there are a number of reasons why we might not be able to do
+	 * this.  These are explained below.
 	 */
 	if (resultRelInfo->ri_TrigDesc != NULL &&
 		(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
@@ -725,6 +847,15 @@ CopyFrom(CopyFromState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (resultRelInfo->ri_FdwRoutine != NULL &&
+			 resultRelInfo->ri_BatchSize == 1)
+	{
+		/*
+		 * Can't support multi-inserts to a foreign table if the FDW does not
+		 * support batching, or it's disabled for the server or foreign table.
+		 */
+		insertMethod = CIM_SINGLE;
+	}
 	else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
 			 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
 	{
@@ -737,14 +868,12 @@ CopyFrom(CopyFromState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (resultRelInfo->ri_FdwRoutine != NULL ||
-			 cstate->volatile_defexprs)
+	else if (cstate->volatile_defexprs)
 	{
 		/*
-		 * Can't support multi-inserts to foreign tables or if there are any
-		 * volatile default expressions in the table.  Similarly to the
-		 * trigger case above, such expressions may query the table we're
-		 * inserting into.
+		 * Can't support multi-inserts if there are any volatile default
+		 * expressions in the table.  Similarly to the trigger case above,
+		 * such expressions may query the table we're inserting into.
 		 *
 		 * Note: It does not matter if any partitions have any volatile
 		 * default expressions as we use the defaults from the target of the
@@ -767,13 +896,14 @@ CopyFrom(CopyFromState cstate)
 		 * For partitioned tables, we may still be able to perform bulk
 		 * inserts.  However, the possibility of this depends on which types
 		 * of triggers exist on the partition.  We must disable bulk inserts
-		 * if the partition is a foreign table or it has any before row insert
-		 * or insert instead triggers (same as we checked above for the parent
-		 * table).  Since the partition's resultRelInfos are initialized only
-		 * when we actually need to insert the first tuple into them, we must
-		 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
-		 * flag that we must later determine if we can use bulk-inserts for
-		 * the partition being inserted into.
+		 * if the partition is a foreign table that can't use batching or it
+		 * has any before row insert or insert instead triggers (same as we
+		 * checked above for the parent table).  Since the partition's
+		 * resultRelInfos are initialized only when we actually need to insert
+		 * the first tuple into them, we must have the intermediate insert
+		 * method of CIM_MULTI_CONDITIONAL to flag that we must later
+		 * determine if we can use bulk-inserts for the partition being
+		 * inserted into.
 		 */
 		if (proute)
 			insertMethod = CIM_MULTI_CONDITIONAL;
@@ -910,12 +1040,14 @@ CopyFrom(CopyFromState cstate)
 
 				/*
 				 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-				 * OF triggers, or if the partition is a foreign partition.
+				 * OF triggers, or if the partition is a foreign table that
+				 * can't use batching.
 				 */
 				leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
 					!has_before_insert_row_trig &&
 					!has_instead_insert_row_trig &&
-					resultRelInfo->ri_FdwRoutine == NULL;
+					(resultRelInfo->ri_FdwRoutine == NULL ||
+					 resultRelInfo->ri_BatchSize > 1);
 
 				/* Set the multi-insert buffer to use for this partition. */
 				if (leafpart_use_multi_insert)
@@ -931,7 +1063,9 @@ CopyFrom(CopyFromState cstate)
 					 * Flush pending inserts if this partition can't use
 					 * batching, so rows are visible to triggers etc.
 					 */
-					CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMultiInsertInfoFlush(&multiInsertInfo,
+											 resultRelInfo,
+											 &processed);
 				}
 
 				if (bistate != NULL)
@@ -1067,7 +1201,17 @@ CopyFrom(CopyFromState cstate)
 					 * buffers out to their tables.
 					 */
 					if (CopyMultiInsertInfoIsFull(&multiInsertInfo))
-						CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+						CopyMultiInsertInfoFlush(&multiInsertInfo,
+												 resultRelInfo,
+												 &processed);
+
+					/*
+					 * We delay updating the row counter and progress of the
+					 * COPY command until after writing the tuples stored in
+					 * the buffer out to the table, as in single insert mode.
+					 * See CopyMultiInsertBufferFlush().
+					 */
+					continue;	/* next tuple please */
 				}
 				else
 				{
@@ -1130,7 +1274,7 @@ CopyFrom(CopyFromState cstate)
 	if (insertMethod != CIM_SINGLE)
 	{
 		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
+			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL, &processed);
 	}
 
 	/* Done, clean up */
@@ -1348,6 +1492,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_lineno = 0;
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
+	cstate->relname_only = false;
 
 	/*
 	 * Allocate buffers for the input pipeline.
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37c6032ae..8d9cc5accd 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -40,13 +40,16 @@ typedef enum EolType
 } EolType;
 
 /*
- * Represents the heap insert method to be used during COPY FROM.
+ * Represents the insert method to be used during COPY FROM.
  */
 typedef enum CopyInsertMethod
 {
-	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
-	CIM_MULTI,					/* always use table_multi_insert */
-	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
+	CIM_SINGLE,					/* use table_tuple_insert or
+								 * ExecForeignInsert */
+	CIM_MULTI,					/* always use table_multi_insert or
+								 * ExecForeignBatchInsert */
+	CIM_MULTI_CONDITIONAL		/* use table_multi_insert or
+								 * ExecForeignBatchInsert only if valid */
 } CopyInsertMethod;
 
 /*
@@ -81,6 +84,7 @@ typedef struct CopyFromStateData
 	uint64		cur_lineno;		/* line number for error messages */
 	const char *cur_attname;	/* current att for error messages */
 	const char *cur_attval;		/* current att value for error messages */
+	bool		relname_only;	/* don't output line number, att, etc. */
 
 	/*
 	 * Working state
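To make the foreign-table branch of CopyMultiInsertBufferFlush() in the patch above easier to follow, here is a small standalone model of its chunking loop. It is a sketch, not PostgreSQL code: effective_batch_size() and flush_in_batches() are hypothetical helpers, and capping the batch size at 65535 divided by the number of columns is an assumption based on the 65535-parameter comment in postgresGetForeignModifyBatchSize() (the rest of that code is not shown in the hunk), taking one bind parameter per column per row.

```c
#include <assert.h>

/* Assumed cap on bind parameters per remote statement (uint16 limit) */
#define MAX_BATCH_PARAMS 65535

/*
 * Hypothetical helper: batch size a COPY into a foreign table would use,
 * given the configured batch_size and the number of target columns.
 */
static int
effective_batch_size(int configured, int ncolumns)
{
	/* Zero-column table: INSERT can't batch empty rows, so disable batching */
	if (ncolumns == 0)
		return 1;
	/* Keep configured * ncolumns within the parameter limit */
	if (configured > MAX_BATCH_PARAMS / ncolumns)
		return MAX_BATCH_PARAMS / ncolumns;
	return configured;
}

/*
 * Hypothetical model of the flush loop: send 'nused' buffered rows in
 * chunks of at most 'batch_size' rows.  Chunk sizes are stored in 'sizes'
 * (room for at least nused entries); returns the number of chunks, i.e.
 * the number of ExecForeignBatchInsert() calls the real loop would make.
 */
static int
flush_in_batches(int nused, int batch_size, int *sizes)
{
	int			sent = 0;
	int			ncalls = 0;

	assert(batch_size > 1);		/* the patch asserts the same precondition */

	while (sent < nused)
	{
		int			size = (batch_size < nused - sent) ? batch_size : (nused - sent);

		/* The patch calls ExecForeignBatchInsert() on &slots[sent] here */
		sizes[ncalls++] = size;
		sent += size;
	}
	assert(sent == nused);
	return ncalls;
}
```

With batch_size '2' and the three rows copied in the regression test, the loop issues two batches of 2 and 1 rows. A zero-column table gets a batch size of 1, which makes CopyFrom() fall back to CIM_SINGLE, as in the patch.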
#128 Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#127)
Re: Fast COPY FROM based on batch insert

On 10/12/22 07:56, Etsuro Fujita wrote:

> On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>
>> I reviewed the patch one more time. Only one question: bistate and
>> ri_FdwRoutine are tightly coupled. Maybe add an assertion on
>> (ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.
>
> You mean the bistate member of CopyMultiInsertBuffer?

Yes

> We do not use that member at all for foreign tables, so the patch
> avoids initializing that member in CopyMultiInsertBufferInit() when
> called for a foreign table. So we have bistate = NULL for foreign
> tables (and bistate != NULL for plain tables), as you mentioned above.
> I think it is a good idea to add such assertions. How about adding
> them to CopyMultiInsertBufferFlush() and
> CopyMultiInsertBufferCleanup() like the attached? In the attached I
> updated comments a bit further as well.

Yes, quite enough.

--
Regards
Andrey Lepikhov
Postgres Professional

#129 Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#128)
Re: Fast COPY FROM based on batch insert

On Thu, Oct 13, 2022 at 1:38 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

> On 10/12/22 07:56, Etsuro Fujita wrote:
>
>> On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov
>> <a.lepikhov@postgrespro.ru> wrote:
>>
>>> I reviewed the patch one more time. Only one question: bistate and
>>> ri_FdwRoutine are tightly coupled. Maybe add an assertion on
>>> (ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.
>>
>> You mean the bistate member of CopyMultiInsertBuffer?
>
> Yes
>
>> We do not use that member at all for foreign tables, so the patch
>> avoids initializing that member in CopyMultiInsertBufferInit() when
>> called for a foreign table. So we have bistate = NULL for foreign
>> tables (and bistate != NULL for plain tables), as you mentioned above.
>> I think it is a good idea to add such assertions. How about adding
>> them to CopyMultiInsertBufferFlush() and
>> CopyMultiInsertBufferCleanup() like the attached? In the attached I
>> updated comments a bit further as well.
>
> Yes, quite enough.

I have committed the patch after tweaking comments a little bit further.

Best regards,
Etsuro Fujita

#130 Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Etsuro Fujita (#129)
Re: Fast COPY FROM based on batch insert

On Thu, Oct 13, 2022 at 6:58 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

> I have committed the patch after tweaking comments a little bit further.

I think there is another patch that improves performance of COPY FROM
for foreign tables using COPY FROM STDIN, but if Andrey (or anyone
else) wants to work on it again, I think it would be better to create a
new CF entry for it (and start a new thread for it). So I plan to
close this in the November CF unless they think otherwise.

Anyway, thanks for the patch, Andrey! Thanks for reviewing, Ian and Zhihong!

Best regards,
Etsuro Fujita

#131 Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Etsuro Fujita (#130)
Re: Fast COPY FROM based on batch insert

On 28/10/2022 16:12, Etsuro Fujita wrote:

> On Thu, Oct 13, 2022 at 6:58 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>
>> I have committed the patch after tweaking comments a little bit further.
>
> I think there is another patch that improves performance of COPY FROM
> for foreign tables using COPY FROM STDIN, but if Andrey (or anyone
> else) wants to work on it again, I think it would be better to create a
> new CF entry for it (and start a new thread for it). So I plan to
> close this in the November CF unless they think otherwise.
>
> Anyway, thanks for the patch, Andrey! Thanks for reviewing, Ian and Zhihong!

Thanks,

I studied the performance of this code in comparison to bulk INSERTs.
This patch seems to improve insertion speed by about 20%. Also, this
patch is very invasive, so I don't have any plans to work on it now.

--
regards,
Andrey Lepikhov
Postgres Professional

#132 Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andrey Lepikhov (#131)
Re: Fast COPY FROM based on batch insert

On Fri, Oct 28, 2022 at 7:53 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

> On 28/10/2022 16:12, Etsuro Fujita wrote:
>
>> I think there is another patch that improves performance of COPY FROM
>> for foreign tables using COPY FROM STDIN, but if Andrey (or anyone
>> else) wants to work on it again, I think it would be better to create a
>> new CF entry for it (and start a new thread for it). So I plan to
>> close this in the November CF unless they think otherwise.

> I studied the performance of this code in comparison to bulk INSERTs.
> This patch seems to improve insertion speed by about 20%. Also, this
> patch is very invasive, so I don't have any plans to work on it now.

Ok, let's leave that for future work. I closed this entry in the November CF.

Best regards,
Etsuro Fujita